Re: [Lhms-devel] [PATCH 0/7] Fragmentation Avoidance V19

From: Dave Hansen (haveblue_at_us.ibm.com)
Date: 11/01/05

    To: Ingo Molnar <mingo@elte.hu>
    Date:	Tue, 01 Nov 2005 16:22:59 +0100
    
    

    On Tue, 2005-11-01 at 16:01 +0100, Ingo Molnar wrote:
    > so it's all about expectations: _could_ you reasonably remove a piece of
    > RAM? Customer will say: "I have stopped all nonessential services, and
    > free RAM is at 90%, still I cannot remove that piece of faulty RAM, fix
    > the kernel!".

    That's an excellent example. Until we have some kind of kernel
    remapping, breaking the 1:1 kernel virtual mapping, these pages will
    always exist. The easiest example of this kind of memory is kernel
    text.

    Another example might be a somewhat errant device driver which has
    allocated some large buffers and is doing DMA to or from them. In this
    case, we need to have APIs to require devices to give up and reacquire
    any dynamically allocated structures. If the device driver does not
    implement these APIs it is not compatible with memory hotplug.
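
    As a rough illustration of what such a give-up/reacquire API could
    look like, here is a minimal user-space C sketch. Every name in it
    (hotplug_ops, hotplug_driver, hotplug_migrate_driver, demo_dev) is
    hypothetical and made up for this example; none of it is an existing
    kernel interface, and malloc()/free() merely stand in for DMA buffer
    allocation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical callbacks a hotplug-aware driver would provide. */
struct hotplug_ops {
    int (*relinquish)(void *priv); /* stop DMA, give up pinned buffers */
    int (*reacquire)(void *priv);  /* get fresh buffers, resume DMA */
};

struct hotplug_driver {
    const char *name;
    const struct hotplug_ops *ops; /* NULL: driver lacks the API */
    void *priv;
};

/*
 * Ask one driver to move off its current buffers.
 * Returns 0 on success, -1 if the driver cannot take part
 * (i.e. it is not compatible with memory hotplug).
 */
int hotplug_migrate_driver(struct hotplug_driver *d)
{
    if (!d->ops || !d->ops->relinquish || !d->ops->reacquire)
        return -1;
    if (d->ops->relinquish(d->priv) != 0)
        return -1;
    return d->ops->reacquire(d->priv);
}

/* Toy driver state: one large buffer the device does DMA to or from. */
struct demo_dev {
    void *dma_buf;
    size_t dma_len;
    int dma_active;
};

static int demo_relinquish(void *priv)
{
    struct demo_dev *dev = priv;

    dev->dma_active = 0;   /* stand-in for quiescing the device */
    free(dev->dma_buf);
    dev->dma_buf = NULL;
    return 0;
}

static int demo_reacquire(void *priv)
{
    struct demo_dev *dev = priv;

    assert(dev->dma_buf == NULL); /* must have relinquished first */
    dev->dma_buf = malloc(dev->dma_len);
    if (!dev->dma_buf)
        return -1;
    dev->dma_active = 1;   /* stand-in for restarting DMA */
    return 0;
}

static const struct hotplug_ops demo_ops = {
    demo_relinquish,
    demo_reacquire,
};
```

    In this sketch a driver registered with ops == NULL is simply
    reported as unable to migrate, which mirrors the "not compatible
    with memory hotplug" case described above.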

    > > There is also no precedent in existing UNIXes for a 100% solution.
    >
    > does this have any relevance to the point, other than to prove that it's
    > a hard problem that we should not pretend to be able to solve, without
    > seeing a clear path towards a solution?

    Agreed. It is a hard problem. One that some other UNIXes have not
    fully solved.

    Here are the steps that I think we need to take. Do you see any holes
    in their coverage? Anything that seems infeasible?

    1. Fragmentation avoidance
       * by itself, increases likelihood of having an area of memory
         which might be easily removed
       * very small (if any) performance overhead
       * other potential in-kernel users
       * creates infrastructure to enforce the "hotpluggability" of any
         particular area of memory.
    2. Driver APIs
       * Require that drivers specifically request areas which must
         retain constant physical addresses
       * Driver must relinquish control of such areas upon request
       * Can be worked around by hypervisors
    3. Break 1:1 Kernel Virtual/Physical Mapping
       * In any large area of physical memory we wish to remove, there will
         likely be very, very few straggler pages, which can not easily be
         freed.
       * Kernel will transparently move the contents of these physical pages
         to new pages, keeping constant virtual addresses.
       * Negative: more TLB overhead, as in-kernel large page mappings are
         broken down into smaller pages.
       * __{p,v}a() become more expensive, likely a table lookup
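
    To make step (1) a little more concrete, here is a toy user-space C
    sketch of the grouping idea: allocations are tagged by whether they
    can later be moved, and each block of memory serves only one tag, so
    a block that never received pinned pages remains a candidate for
    removal. The names (alloc_type, alloc_page_typed, block_is_removable)
    and the sizes are assumptions for illustration only and are not taken
    from the patch set. Unlike a real allocator, this sketch fails
    instead of falling back to mixing types when it runs out of blocks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical allocation tags; ALLOC_FREE marks an unclaimed block. */
enum alloc_type { ALLOC_FREE = 0, ALLOC_MOVABLE, ALLOC_PINNED };

#define NR_BLOCKS       4
#define PAGES_PER_BLOCK 4

struct block {
    enum alloc_type type; /* tag of the pages this block serves */
    int used;             /* pages handed out from this block */
};

/* Zero-initialised, so every block starts out as ALLOC_FREE. */
static struct block blocks[NR_BLOCKS];

/* Returns the index of the block that served the page, or -1. */
int alloc_page_typed(enum alloc_type type)
{
    int i;

    assert(type != ALLOC_FREE);

    /* Prefer a block that already holds pages of the same type. */
    for (i = 0; i < NR_BLOCKS; i++) {
        if (blocks[i].type == type && blocks[i].used < PAGES_PER_BLOCK) {
            blocks[i].used++;
            return i;
        }
    }

    /* Otherwise claim a block that nobody has used yet. */
    for (i = 0; i < NR_BLOCKS; i++) {
        if (blocks[i].type == ALLOC_FREE) {
            blocks[i].type = type;
            blocks[i].used = 1;
            return i;
        }
    }
    return -1;
}

/* A block is a removal candidate if it never received pinned pages. */
int block_is_removable(int i)
{
    assert(i >= 0 && i < NR_BLOCKS);
    return blocks[i].type != ALLOC_PINNED;
}
```

    Interleaving pinned and movable requests in this model still leaves
    the pinned pages confined to their own block, which is the property
    that makes the remaining blocks easier to remove.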

    I've already done (3) on a limited basis, in the early days of memory
    hotplug. Not the remapping, just breaking the 1:1 assumptions. It
    wasn't too horribly painful.
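
    The "table lookup" cost mentioned for __{p,v}a() in (3) can be
    sketched in user-space C as follows. This is only an illustration
    under stated assumptions: the section size, the KERNEL_BASE value,
    and the names linear_pa, table_pa, table_va, remap_init and
    remap_swap are all invented for the example, and the actual copying
    of page contents and TLB flushing are left out.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SHIFT 24u                         /* 16 MiB, arbitrary */
#define SECTION_SIZE  (UINT64_C(1) << SECTION_SHIFT)
#define NR_SECTIONS   8u
#define KERNEL_BASE   UINT64_C(0xC0000000)        /* illustrative only */

/* Classic 1:1 mapping: translation is a constant offset. */
static inline uint64_t linear_pa(uint64_t va)
{
    return va - KERNEL_BASE;
}

/* virt_to_phys_sec[v] is the physical section backing virtual section v. */
static uint32_t virt_to_phys_sec[NR_SECTIONS];
/* phys_to_virt_sec[p] is the virtual section mapped onto physical p. */
static uint32_t phys_to_virt_sec[NR_SECTIONS];

/* Start out with the identity mapping, equivalent to linear_pa(). */
void remap_init(void)
{
    uint32_t i;

    for (i = 0; i < NR_SECTIONS; i++) {
        virt_to_phys_sec[i] = i;
        phys_to_virt_sec[i] = i;
    }
}

/* Table-based replacement for __pa(): one lookup per translation. */
uint64_t table_pa(uint64_t va)
{
    uint64_t off = va - KERNEL_BASE;
    uint64_t vsec = off >> SECTION_SHIFT;

    assert(vsec < NR_SECTIONS);
    return ((uint64_t)virt_to_phys_sec[vsec] << SECTION_SHIFT) |
           (off & (SECTION_SIZE - 1));
}

/* Table-based replacement for __va(). */
uint64_t table_va(uint64_t pa)
{
    uint64_t psec = pa >> SECTION_SHIFT;

    assert(psec < NR_SECTIONS);
    return KERNEL_BASE +
           (((uint64_t)phys_to_virt_sec[psec] << SECTION_SHIFT) |
            (pa & (SECTION_SIZE - 1)));
}

/*
 * Exchange the physical backing of two virtual sections.
 * Virtual addresses stay constant; only the tables change.
 */
void remap_swap(uint32_t vsec_a, uint32_t vsec_b)
{
    uint32_t pa, pb;

    assert(vsec_a < NR_SECTIONS && vsec_b < NR_SECTIONS);
    pa = virt_to_phys_sec[vsec_a];
    pb = virt_to_phys_sec[vsec_b];
    virt_to_phys_sec[vsec_a] = pb;
    virt_to_phys_sec[vsec_b] = pa;
    phys_to_virt_sec[pb] = vsec_a;
    phys_to_virt_sec[pa] = vsec_b;
}
```

    Before any swap, table_pa() agrees with linear_pa(); after a swap,
    the same virtual address resolves to a different physical section.
    The extra array access on every translation is the added expense
    referred to in the list above.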

    We'll also need to make some decisions along the way about what to do
    about things like large pages. Is it better to just punt like AIX and
    refuse to remove their areas? Break them down into small pages and
    degrade performance?

    -- Dave


