Re: arch/xen is a bad idea

From: Andi Kleen (ak_at_suse.de)
Date: 12/15/04

    Date:	Wed, 15 Dec 2004 05:49:27 +0100
    To: Ian Pratt <Ian.Pratt@cl.cam.ac.uk>
    
    

    On Tue, Dec 14, 2004 at 10:40:20PM +0000, Ian Pratt wrote:
    > I really think the best approach is to get arch xen into
    > mainstream Linux, and then work toward integrating i386, x86_64
    > and xen. From our point of view, the first stage of this is to

    I think that's definitely the wrong way. Also in Linux
    the standard strategy is to first clean up and restructure/refactor
    code and then merge, not the other way round.

    > increase the number of files that are shared unmodified between
    > i386 and xen/i386 (i.e. linked from xen into i386). There's
    > already many such files, but with a few relatively simple changes
    > to i386 we could get quite a few more.

    That will still have most of the disadvantages I listed.

    >
    > > Currently it's already difficult enough to get people to
    > > add fixes to both i386 and x86-64, adding fixes to three
    > > or rather four (xen32 and xen64) architectures will be quite bad.
    > > In practice we'll likely get much worse code drift and missing
    > > fixes. Also I still suspect Ian is underestimating how much
    > > work it is long term to keep a Linux architecture up to date.
    >
    > We're actually very well set up to handle this, having been doing
    > it for some time. Whenever Linus issues a new mainstream patch,
    > we have a script that rewrites the patch to duplicate the hunks
    > that apply to i386 such that they also hit files that we've
    > modified in xen/i386. This way we keep arch xen/i386 in perfect
    > sync with i386. It takes discipline, but we're pretty good at it
    > now.

    That won't work anymore at some point. I found this out on x86-64.
    It works at the beginning, but eventually the code drift gets
    so much that it's near impossible to apply any hunk and you
    have to redo everything. One reason for that is that in Linux
    code often gets restructured, which makes it very difficult
    for such mechanized merging procedures to work long term.

    Also you have to review every change anyways.
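
    A minimal sketch of the mechanical step such a hunk-duplicating
    script depends on: retargeting a diff header from the i386 tree to
    its mirrored copy. The arch/xen/i386/ layout and the name
    mirror_header() are assumptions for illustration, not the Xen
    team's actual script.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed layout: files under arch/i386/ are mirrored under arch/xen/i386/. */
#define SRC_PREFIX "arch/i386/"
#define DST_PREFIX "arch/xen/i386/"

/*
 * Rewrite one diff header line ("--- a/arch/i386/..." or
 * "+++ b/arch/i386/...") so that it names the mirrored file.
 * Returns 1 if a rewritten line was stored in out, 0 if the line is
 * not a header for a file under SRC_PREFIX or out is too small.
 */
int mirror_header(const char *line, char *out, size_t outlen)
{
    assert(line != NULL && out != NULL);

    /* Only file header lines are candidates; hunk bodies are copied as-is. */
    if (strncmp(line, "--- ", 4) != 0 && strncmp(line, "+++ ", 4) != 0)
        return 0;

    const char *hit = strstr(line, SRC_PREFIX);
    if (hit == NULL)
        return 0;

    /* Keep everything before the prefix, swap the prefix, keep the rest. */
    int n = snprintf(out, outlen, "%.*s%s%s",
                     (int)(hit - line), line,
                     DST_PREFIX, hit + strlen(SRC_PREFIX));
    return n >= 0 && (size_t)n < outlen;
}
```

    Only the file name changes; the hunk's context lines are copied
    unchanged, so the duplicated hunk applies only while the mirrored
    file still contains that context. Once the code has been
    restructured, patch rejects it.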

    > > I cannot imagine the virtualization hooks are intrusive anyways. The
    > > only things it needs to hook are idle and the page table updates,
    > > right?
    >
    > It's rather more complicated than that if you want a clean
    > interface that gives good performance. We've taken a very
    > benchmark-driven approach to minimise the overhead of
    > virtualization. The aim is to have such a small overhead that
    > people are happy running virtualized the whole time. I think this
    > is a really important aim.

    Can you please be a bit more precise on that? What exactly do you
    need?

    >
    > > Doing that cleanly in the existing architectures shouldn't be that
    > > hard.
    >
    > I've appended a list of some of the areas we need to modify. I
    > think you may have underestimated what needs to happen.

    Ok that's a start.

    >
    > > I suspect xen64 will be rather different from xen32 anyways
    > > because as far as I can see the tricks Xen32 uses to be
    > > fast (segment limits) just plain don't work on 64bit
    > > because the segments don't extend into 64bit space.
    > > So having both in one architecture may also end up messy.
    >
    > We have subdirectories for the i386 and x86_64 specific files,
    > along with a common directory for stuff which is shared between
    > the two e.g. virtual interrupt control etc.

    Ok, but I think there will be significant differences in the
    64bit part and meshing both together will get extremely messy.

    What's your strategy for example to merge changes from arch/x86_64
    to xen64? I don't think the way you described above will work
    in any way.

    >
    > > Also the other thing I'm worried about is that there is no clear
    > > specification on how the Xen<->Linux interface works.
    >
    > We have an interface manual in the Xen bk repo, but I acknowledge
    > that we haven't always been totally prompt in keeping it up to
    > date and fully detailed. Now the Xen2 interface is frozen, that
    > should be fixable. Even so, it hasn't prevented other independent
    > groups porting OSes to Xen e.g. NetBSD, FreeBSD, Plan9.
    >
    > Ian

    Comments on some issues:

    You are saying you ran benchmarks on each of them and it turns
    out to be too slow to emulate them in the standard architectural
    way? At least for some of them it's hard to believe.

    >
    > Differences between i386 and xen/i386 files:
    > - irq setup

    Already two different ways (MSI and non MSI). Adding a third
    is probably feasible.

    > - pci bus scanning

    We already have at least three different ways to do that, adding
    a fourth wouldn't be very messy I guess.

    > - gdt/ldt must be in dedicated read-only pages

    That can probably be done in the standard kernel
    (without the actual read protection bit)

    > - gdt/ldt install/updates

    gdt load could be just trapped, it only happens once
    at startup.

    For the LDT I can see some effort is needed, but it's already
    confined to a single function in a header file.

    > - debug register updates

    This shouldn't be performance critical. Why is it not
    simply emulated by the hypervisor?

    > - pte quicklist/cache

    i386 had a pte quicklist some time ago. I think there were
    plans to re-add it anyway.

    > - pmd/pgd are read-only pages

    I suspect this just needs another include layer like the existing
    PAE/non PAE layer.

    > - highmem pte mapping
    > - dma memory allocation
    > - idle loop

    This is already pluggable (e.g. ACPI has its own)
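
    A minimal user-space sketch of that kind of pluggable idle hook,
    modelled on the pm_idle function pointer in the i386 process code.
    xen_idle_sketch(), cpu_idle_once() and the counters are
    hypothetical stand-ins; a real idle routine would halt the CPU or
    yield to the hypervisor instead of counting.

```c
#include <assert.h>
#include <stddef.h>

static int native_halts;    /* times the default idle routine ran */
static int hv_yields;       /* times the replacement idle routine ran */

/* Default behaviour: on real hardware this would execute hlt. */
static void default_idle(void)
{
    native_halts++;
}

/* Hypothetical replacement: would block in the hypervisor instead. */
static void xen_idle_sketch(void)
{
    hv_yields++;
}

/* The hook: NULL means "use the default". */
void (*pm_idle)(void);

/* One iteration of the idle loop: pick the hook if set, else the default. */
static void cpu_idle_once(void)
{
    void (*idle)(void) = pm_idle;

    if (idle == NULL)
        idle = default_idle;
    idle();
}
```

    Setting pm_idle = xen_idle_sketch at boot is then the only change
    the idle path needs; the loop itself stays shared.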

    > - different fixmap
    > - *_to_* macros differ (like page_to_phys)
    > - *_val macros (pte_val, pmd_val)

    Can also be handled like the 2/3-level page table split, I bet.

    > - inb/outb

    Why is it not trapped?

    > - switching cr3
    > - cr4 updates
    > - a few cpu flags need to be cleared
    > - msr access

    Why don't you just simply trap it? MSRs are normally
    not performance critical (except on x86-64 for the context switch
    but you didn't seem to have attacked that at all yet ;-)

    > - wbinvd call
    > - mtrr access

    Same.

    For all of these it would seem much cleaner to me to just
    provide instruction emulation in the Hypervisor.
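
    A minimal sketch of trap-and-emulate for a few of the privileged
    instructions above, as a hypervisor fault handler might do it.
    The opcodes (0F 09 WBINVD, 0F 30 WRMSR, 0F 32 RDMSR) are the
    architectural encodings; struct vcpu_sketch, emulate_privileged()
    and the single shadow MSR are assumptions made to keep the example
    small, not Xen's actual interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum emul_result { EMUL_OK, EMUL_UNHANDLED };

/* Hypothetical per-guest CPU state kept by the hypervisor. */
struct vcpu_sketch {
    uint32_t eax, ecx, edx;
    uint32_t eip;
    uint64_t shadow_msr;     /* one virtual MSR; ECX is ignored here */
    unsigned wbinvd_count;   /* how many cache flushes were requested */
};

/*
 * Called after the guest faulted on a privileged instruction.
 * insn points at the faulting instruction bytes, len is how many
 * bytes are available. On success the guest's eip is advanced past
 * the instruction so it resumes at the next one.
 */
enum emul_result emulate_privileged(struct vcpu_sketch *v,
                                    const uint8_t *insn, size_t len)
{
    assert(v != NULL && insn != NULL);

    if (len < 2 || insn[0] != 0x0F)
        return EMUL_UNHANDLED;

    switch (insn[1]) {
    case 0x09:                       /* WBINVD */
        v->wbinvd_count++;
        break;
    case 0x30:                       /* WRMSR: EDX:EAX -> MSR */
        v->shadow_msr = ((uint64_t)v->edx << 32) | v->eax;
        break;
    case 0x32:                       /* RDMSR: MSR -> EDX:EAX */
        v->eax = (uint32_t)v->shadow_msr;
        v->edx = (uint32_t)(v->shadow_msr >> 32);
        break;
    default:
        return EMUL_UNHANDLED;
    }

    v->eip += 2;                     /* all three are two-byte opcodes */
    return EMUL_OK;
}
```

    The guest kernel keeps issuing the ordinary instructions; each one
    costs a fault plus a decode, which is acceptable only for paths
    that are not performance critical.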

    > - io permission handling

    What is different?

    > - ioremap
    > - access to hardware memory
    > - interrupt enabling/disabling
    > - setup of trap gates
    > - tlb flush

    This is already well separated, shouldn't be a big issue
    to make it different.

    > - fpu stack switches
    > - user/kernel mode test

    That can be just changed everywhere in i386 without ill effects.

    > - different segment selectors since the kernel runs in ring1
    > - pagefault handler stack layout is different
    > - startup is in 32-bit mode

    That's a good cleanup anyways. The early boot interface
    between 16 and 32bit should be clearly defined and then
    a general 32bit booting interface be defined. There are various
    other people who would like to have this too. I had some plans
    to do the equivalent on x86-64 with a 64bit boot interface.

    > - start of day initialization is different
    > - start of day memory probing and pagetable setup

    Sounds similar to EFI.

    > - trap/fault handling
    > - timer is virtualized

    Details?

    My first impression is that there is much cleanup potential and that
    there should probably be some discussion on each of these items.

    -Andi

