Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Gregory Haskins <gregory.haskins@xxxxxxxxx>
- Date: Tue, 06 Oct 2009 12:58:06 -0400
Avi Kivity wrote:
On 10/06/2009 03:31 PM, Gregory Haskins wrote:
slots would be one implementation, if you can think of others then you'dI'm more interested in *how* you'd add them more than "if" we would add
add them.
them. What I am getting at are the logistics of such a beast.
Add alternative ioctls, or have one ioctl with a 'type' field.
For instance, would I have /dev/slots-vas with ioctls for adding slots,
and /dev/foo-vas for adding foos? And each one would instantiate a
different vas_struct object with its own vas_struct->ops? Or were you
thinking of something different.
I think a single /dev/foo is sufficient, unless some of those address
spaces are backed by real devices.
If you can't, I think it indicates that the whole thing isn't necessaryI'm not sure if we are talking about the same thing yet, but if we are,
and we're better off with slots and virtual memory.
there are uses of a generalized interface outside of slots/virtual
memory (Ira's physical box being a good example).
I'm not sure Ira's case is not best supported by virtual memory.
Perhaps, but there are surely some cases where the memory is not
pageable, but is accessible indirectly through some DMA controller. So
if we require it to be pagable we will limit the utility of the
interface, though admittedly it will probably cover most cases.
In any case, I think the best approach is what I already proposed.
KVM's arrangement of memory is going to tend to be KVM specific, and
what better place to implement the interface than close to the kvm.ko
core.
The only thing missing is dma, which you don't deal with anyway.Afaict I do support dma in the generalized vbus::memctx, though I do not
use it on anything related to KVM or xinterface. Can you elaborate on
the problem here? Does the SG interface in 4/4 help get us closer to
what you envision?
There's no dma method in there. Mapping to physical addresses is
equivalent to get_user_pages(), so it doesn't add anything over virtual
memory slots.
I am not following you at all. What kind of interface do we need for
DMA? Since the KVM guest is pagable, the support for DMA would come
from building a scatterlist (patch 4/4) and passing the resulting pages
out using standard sg mechanisms, like sg_dma_address(). This is what I
do today to support zero-copy in AlacrityVM.
What more do we need?
You'd have to copy the entire range since you don't know what the guest? The vmap is presumably part of an ABI between guest and host, so the
might put there. I guess it's acceptable for small areas.
host should always know what structure is present within the region, and
what is relevant from within that structure to migrate once that state
is "frozen".
I was thinking of the case where the page is shared with some other
(guest-private) data. But dirtying that data would be tracked by kvm,
so it isn't a problem.
Ok.
These regions (for vbus, anyway) equate to things like virtqueue
metadata, and presumably the same problem exists for virtio-net in
userspace as it does here, since that is another form of a "vmap". So
whatever solution works for virtio-net migrating its virtqueues in
userspace should be close to what will work here. The primary
difference is the location of the serializer.
Actually, virtio doesn't serialize this data, instead it marks the page
dirty and lets normal RAM migration move it.
Ok, so its effectively serializing the entire region indirectly. That
works too.
rmb()s are only needed if an external agent can issue writes, otherwiseI was following lessons learned here:
you'd need one after every statement.
http://lkml.org/lkml/2009/7/7/175
Perhaps mb() or barrier() are more appropriate than rmb()? I'm CC'ing
David Howells in case he has more insight.
I'm not sure I understand the reference. Callers and callees don't need
memory barriers since they're guaranteed program order. You only need
memory barriers when you have an external agent (a device, or another
cpu). What external agent can touch things during ->release()?
We already have a different subthread started on this, so I'll pick this
up there.
It primarily assumes a low _migration_ rate, since you do not typicallyThis just assumes a low context switch rate.A simple per-vcpu cache (in struct kvm_vcpu) is likely to give betterper-vcpu will not work well here, unfortunately, since this is an
results.
external interface mechanism. The callers will generally be from a
kthread or some other non-vcpu related context. Even if we could
figure
out a vcpu to use as a basis, we would require some kind of
heavier-weight synchronization which would not be as desirable.
Therefore, I opted to go per-cpu and use the presumably lighterweight
get_cpu/put_cpu() instead.
have two contexts on the same cpu pounding on the memslots.
Why not? If you're overcommitted, you will, especially if you have
multiple guests doing request/reply interaction (perhaps with each other).
Right, but you can probably do a *lot* of gfn_to_memslot() calls within
your timeslice to still make it highly worthwhile. We are talking
microseconds per lookup vs milliseconds per context switch.
And even if
you did, there's a good chance for locality between the threads, since
the IO activity is likely related. For the odd times where locality
fails to yield a hit, the time-slice or migration rate should be
sufficiently infrequent enough to still yield high 90s hit rates for
when it matters. For low-volume, the lookup is in the noise so I am not
as concerned.
IOW: Where the lookup hurts the most is trying to walk an SG list, since
each pointer is usually within the same slot. For this case, at least,
this cache helps immensely, at least according to profiles.
I'm wary of adding per-cpu data where it's easier to have a per-caller
or per-vcpu cache.
I like your per-caller suggestion. That makes the most sense to me.
How about a gfn_to_pfn_cached(..., struct gfn_to_pfn_cache *cache)?Sounds good. I will incorporate this into the split patch.
Each user can place it in a natural place.
Note that for slots, typically most accesses hit just one slot (the
largest). For guests below 4G, there's only one large slot, for guests
above 4G there are two.
Right, I find that the cache misses are pretty infrequent so that jives
with your statement.
I am not an mm expert, but iiuc you cannot call switch_to() fromStill, why can't you switch temporarily?Thats actually what I do for the fast-path (use_mm() does a switch_to()+static unsigned longCan't you switch the mm temporarily instead of this?
+xinterface_copy_to(struct kvm_xinterface *intf, unsigned long gpa,
+ const void *src, unsigned long n)
+{
+ struct _xinterface *_intf = to_intf(intf);
+ unsigned long dst;
+ bool kthread = !current->mm;
+
+ down_read(&_intf->kvm->slots_lock);
+
+ dst = gpa_to_hva(_intf, gpa);
+ if (!dst)
+ goto out;
+
+ if (kthread)
+ use_mm(_intf->mm);
+
+ if (kthread || _intf->mm == current->mm)
+ n = copy_to_user((void *)dst, src, n);
+ else
+ n = _slow_copy_to_user(_intf, dst, src, n);
internally).
The slow-path is only there for completeness for when switching is not
possible (such as if called with an mm already active i.e.
process-context).
anything other than kthread context. Thats what the doc says, anyway.
Can you provide a pointer? I don't see why this limitation exists.
See the comments for "use_mm()"
http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=mm/mmu_context.c;h=ded9081f40219d888e48e41d4b665a712a188d28;hb=HEAD
Actually, no. Before Michael enlightened me recently regardingIn practice, however, this doesnt happen. VirtuallySo you have 100% untested code here.
100% of the calls in vbus hit the fast-path here, and I suspect most
xinterface clients would find the same conditions as well.
switch_to/use_mm, the '_slow_xx' functions were my _only_ path. So they
have indeed multiple months (and multiple GB) of testing, as it turns
out. I only recently optimized them away to "backup" duty.
I meant, unexercised. This will bitrot quickly.
Ok, since I don't need it, should I just fail the cases that do not
qualify to use switch_to() (if any)?
Attachment:
signature.asc
Description: OpenPGP digital signature
- Follow-Ups:
- References:
- [PATCH v2 0/4] KVM: xinterface
- From: Gregory Haskins
- [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Gregory Haskins
- Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Avi Kivity
- Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Gregory Haskins
- Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Avi Kivity
- Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Gregory Haskins
- Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- From: Avi Kivity
- [PATCH v2 0/4] KVM: xinterface
- Prev by Date: Re: [PATCH 1/3 pci#linux-next] pci: determine CLS more intelligently
- Next by Date: Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- Previous by thread: Re: [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- Next by thread: Re: [Alacrityvm-devel] [PATCH v2 2/4] KVM: introduce "xinterface" API for external interaction with guests
- Index(es):
Relevant Pages
|
Loading