Re: A set of "standard" virtual devices?



On Tuesday 03 April 2007, Jeremy Fitzhardinge wrote:
Arnd Bergmann wrote:
I think we need to separate two problems here:

1. Probing:
That's really what triggered the discussion. PCI probing is well understood
and implemented on _most_ platforms, so there is some value in reusing it.
When you talk about 'very simple probing', I'm not sure what the simplest
approach could be.

Is probing an interesting problem to consider on its own? If there's
some hypervisor-agnostic device driver in Linux, then obviously it needs
some way to find the corresponding (virtual) hardware for it to talk
to. But that probing mechanism will depend on the actual interface
structure, and is just one of the many problems that need to be solved.
There's no point in overloading PCI to probe for the device unless
you're actually using PCI to talk to the device.

We already have device drivers for physical devices that can be attached
to different buses. The EHCI USB driver is one example: it can sit on,
for instance, a PCI, OF or on-chip device. Moreover, you can have an
abstracted device behind it that does not need to know about the transport,
like the SCSI disk driver does not care if it is talking to an ATA,
parallel SCSI or SAS chip, or even which controller that is.
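
To make the layering concrete, here is a minimal user-space sketch, not
actual kernel code. All names in it (xport_ops, core_dev, core_enable,
mem_ops, the register layout) are made up for illustration. The core logic
only ever goes through the transport operations, so the same core could be
bound to PCI glue, OF glue or a hypervisor-specific glue layer.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical transport operations: how register accesses reach the device. */
struct xport_ops {
    uint32_t (*read32)(void *ctx, size_t reg);
    void (*write32)(void *ctx, size_t reg, uint32_t val);
};

/* A device as seen by the transport-independent core. */
struct core_dev {
    const struct xport_ops *ops;
    void *ctx;                      /* owned by the glue layer */
};

/* Made-up register layout for the example. */
enum { REG_CTRL = 0 };
#define CTRL_ENABLE 0x1u

/* Core logic: never touches the bus directly, only the ops table. */
static int core_enable(struct core_dev *dev)
{
    uint32_t v = dev->ops->read32(dev->ctx, REG_CTRL);

    dev->ops->write32(dev->ctx, REG_CTRL, v | CTRL_ENABLE);
    /* Read back to confirm the device accepted the enable bit. */
    return (dev->ops->read32(dev->ctx, REG_CTRL) & CTRL_ENABLE) ? 0 : -1;
}

/* One possible glue layer: "registers" that live in ordinary memory. */
static uint32_t mem_read32(void *ctx, size_t reg)
{
    return ((uint32_t *)ctx)[reg];
}

static void mem_write32(void *ctx, size_t reg, uint32_t val)
{
    ((uint32_t *)ctx)[reg] = val;
}

static const struct xport_ops mem_ops = { mem_read32, mem_write32 };
```

A second glue layer would only need to supply its own xport_ops; nothing in
core_enable() changes.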

Let me say up front that I'm skeptical that we can come up with a single
bus-like abstraction which can be both a simple and an efficient interface
to all the virtual architectures. I think a more fruitful path is to
find what pieces of functionality can be made common, with the aim of
having small, simple and self-contained hypervisor-specific backends.

I think this needs to be considered on a class by class basis. This
thread started with a discussion about entropy sources. In theory you
could implement it as simply as exposing a mmaped ringbuffer. There are
some extra complexities deriving from the security requirements though;
for example, all the entropy needs to be kept strictly private to the
domain that consumes it.

But beyond that, there are 3 other important classes of device:

* console
* disk
* networking

(There are obviously more, but these are the must-have.)

Console already provides us with a model to work on, in the form of
hvc-console. The hvc-console code itself has the bulk of the common
console code, along with a set of very small hypervisor-specific
backends. The Xen console implementation shrunk considerably when we
switched to using it.

Console is also the least problematic interface; you can do it over
practically anything.
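
As a rough illustration of that split, here is a simplified user-space
sketch. It is only loosely modelled on the idea behind hvc-console and is
not its real interface; con_backend_ops, con_write and the toy backend are
hypothetical names. The common code handles partial writes, and the backend
is reduced to a single "push some characters" hook.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical backend hook: the only hypervisor-specific part. */
struct con_backend_ops {
    /* Returns the number of chars accepted (may be fewer than count),
     * 0 if the backend cannot take any more right now, <0 on error. */
    int (*put_chars)(void *priv, const char *buf, int count);
};

/* Common code: retry until everything is written or the backend stalls. */
static int con_write(const struct con_backend_ops *ops, void *priv,
                     const char *buf, int count)
{
    int done = 0;

    while (done < count) {
        int n = ops->put_chars(priv, buf + done, count - done);

        if (n <= 0)
            break;              /* report partial progress to the caller */
        done += n;
    }
    return done;
}

/* Toy backend standing in for a hypervisor channel:
 * it accepts at most 4 characters per call. */
struct toy_port {
    char out[64];
    int len;
};

static int toy_put_chars(void *priv, const char *buf, int count)
{
    struct toy_port *p = priv;
    int room = (int)sizeof(p->out) - p->len;
    int n = count < 4 ? count : 4;

    if (n > room)
        n = room;
    memcpy(p->out + p->len, buf, (size_t)n);
    p->len += n;
    return n;
}

static const struct con_backend_ops toy_ops = { toy_put_chars };
```

The backend is a handful of lines; everything that is not specific to the
transport stays in the common part.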

If we could do the same thing with disk and net, I would be very happy.

For example, if we wanted to change the Xen frontend/backend disk
interface, we could use SCSI as the basic protocol, and then convert
blkfront into a relatively simple SCSI driver. There would still be a
Xen-specific piece, but it should be fairly small and have a clean
interface. Though the existing interface is a pretty simple
shove-this-block-there affair.

Doing a SCSI driver has been tried before, with ibmvscsi. Not good.
The interesting question about block devices is how to handle concurrency
and interrupt mitigation. An efficient interface should:

- have asynchronous notification, not sleep until the transfer is complete
- allow multiple blocks to be in flight simultaneously, so the host can
reorder the requests if it is smart enough
- give only a single interrupt when multiple transfers have completed

Minor optimizations could be:
- give an interrupt early when some transfers are complete
- allow I/O barriers to be inserted in the stream
- allow marking blocks as more or less important (readahead vs. read)
- provide passthrough of SG_IO or similar for optical media
(e.g. DVD writer)
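
The three main requirements above can be sketched in a small single-threaded
model. This is only an illustration under assumptions: the names (blk_queue,
blk_submit, blk_complete, blk_reap) and the fixed slot table are invented
here, and a real interface would live in shared memory with proper memory
barriers. Submission does not wait, several requests can be in flight and
complete in any order, and the host raises a new interrupt only when none is
already pending.

```c
#include <stdbool.h>
#include <stdint.h>

#define BLK_SLOTS 8

enum blk_state { SLOT_FREE, SLOT_SUBMITTED, SLOT_DONE };

struct blk_req {
    uint64_t sector;
    bool write;
    enum blk_state state;
};

struct blk_queue {
    struct blk_req slot[BLK_SLOTS];
    bool irq_pending;       /* an interrupt is raised but not yet handled */
    unsigned irq_count;     /* total interrupts raised toward the guest */
};

/* Guest side: queue a request and return at once.
 * Returns the slot tag, or -1 if every slot is in flight. */
static int blk_submit(struct blk_queue *q, uint64_t sector, bool write)
{
    for (int i = 0; i < BLK_SLOTS; i++) {
        if (q->slot[i].state == SLOT_FREE) {
            q->slot[i] = (struct blk_req){ sector, write, SLOT_SUBMITTED };
            return i;
        }
    }
    return -1;
}

/* Host side: complete requests in whatever order it likes. An interrupt
 * is raised only if the guest has not been notified already. */
static void blk_complete(struct blk_queue *q, int tag)
{
    q->slot[tag].state = SLOT_DONE;
    if (!q->irq_pending) {
        q->irq_pending = true;
        q->irq_count++;
    }
}

/* Guest interrupt handler: reap every finished request in one pass. */
static int blk_reap(struct blk_queue *q)
{
    int reaped = 0;

    q->irq_pending = false;
    for (int i = 0; i < BLK_SLOTS; i++) {
        if (q->slot[i].state == SLOT_DONE) {
            q->slot[i].state = SLOT_FREE;
            reaped++;
        }
    }
    return reaped;
}
```

With three requests submitted and two of them completed out of order before
the guest runs its handler, the guest sees one interrupt and reaps both.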

I'm not sure what similar common code could be extracted for network
devices. I haven't looked into it all that closely.

One way to do networking would be to simply provide a shared memory area
that everyone can write to, then use a ring buffer and atomic operations
to synchronize between the guests, and a method to send interrupts to the
others for flow control.
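
A minimal sketch of the ring part, using C11 atomics, could look like the
following. It makes simplifying assumptions that go beyond what is described
above: one ring carries packets in one direction between exactly two guests
(one producer, one consumer), the struct is assumed to sit in the shared
memory area, and the interrupt path is left out entirely. net_ring,
ring_send and ring_recv are hypothetical names.

```c
#include <stdatomic.h>
#include <stdint.h>
#include <string.h>

#define RING_SLOTS 4            /* must be a power of two */
#define PKT_MAX    64

struct pkt {
    uint16_t len;
    uint8_t data[PKT_MAX];
};

/* Assumed to live in memory shared by both guests. */
struct net_ring {
    _Atomic uint32_t head;      /* next slot to fill; written by producer */
    _Atomic uint32_t tail;      /* next slot to read; written by consumer */
    struct pkt slot[RING_SLOTS];
};

/* Producer: returns 0 on success, -1 if the ring is full or len too big. */
static int ring_send(struct net_ring *r, const void *buf, uint16_t len)
{
    uint32_t head = atomic_load_explicit(&r->head, memory_order_relaxed);
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_acquire);

    if (len > PKT_MAX || head - tail == RING_SLOTS)
        return -1;

    struct pkt *p = &r->slot[head % RING_SLOTS];
    p->len = len;
    memcpy(p->data, buf, len);
    /* Release: the packet contents become visible before the new head. */
    atomic_store_explicit(&r->head, head + 1, memory_order_release);
    return 0;
}

/* Consumer: copies one packet into buf (at least PKT_MAX bytes).
 * Returns the packet length, or -1 if the ring is empty. */
static int ring_recv(struct net_ring *r, void *buf)
{
    uint32_t tail = atomic_load_explicit(&r->tail, memory_order_relaxed);
    uint32_t head = atomic_load_explicit(&r->head, memory_order_acquire);

    if (head == tail)
        return -1;

    const struct pkt *p = &r->slot[tail % RING_SLOTS];
    int len = p->len;
    memcpy(buf, p->data, (size_t)len);
    /* Release: the slot is handed back only after it has been copied out. */
    atomic_store_explicit(&r->tail, tail + 1, memory_order_release);
    return len;
}
```

A full ring is what gives the producer back-pressure here; deciding when to
interrupt the peer, and supporting more than two guests, would need more
machinery than this sketch shows.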

Arnd <><