Re: [PATCH] 2/2 Prezeroing large blocks of pages during allocation

From: Mel Gorman (mel_at_csn.ul.ie)
Date: 03/07/05

    Date:	Mon, 7 Mar 2005 00:35:56 +0000 (GMT)
    To: Christoph Lameter <christoph@lameter.com>
    
    

    On Mon, 28 Feb 2005, Christoph Lameter wrote:

    > On Sun, 27 Feb 2005, Mel Gorman wrote:
    >
    > > The patch also counts how many blocks of each order were zeroed. This gives
    > > a rough indicator if large blocks are frequently zeroed or not. I found
    > > that order-0 are the most frequent zeroed block because of the per-cpu
    > > caches. This means we rarely win with zeroing in the allocator but the
    > > accounting mechanisms are still handy for the scrubber daemon.
    >
    > Thanks for your efforts in integrating zeroing into your patches to reduce
    > fragmentation.

    No problem.

    > It is true that you do not win with zeroing pages in the
    > allocator. However, you may avoid additional zeroing by zeroing higher
    > order pages and then breaking them into lower order pages (but this will
    > then lead to additional fragmentation).
    >

    I got around the fragmentation problem by having a userzero and kernzero
    pool. I also taught rmqueue_bulk() to allocate memory in chunks that are
    as large as possible and break them up into appropriate sizes. This means
    that when the per-cpu caches are allocating 16 pages, we can now allocate
    this as one 2**4 allocation rather than 16 2**0 allocations (which is
    possibly a win in general, not just the prezeroing case, but I have not
    measured it). The zeroblock counts after a stress test now look something
    like this:

    Zeroblock count 96968 145994 125787 75329 32553 110 11 73 26 5 175

    That is a big improvement as we are not zeroing order-0 pages nearly as
    often as we were. It is no longer regressing in terms of fragmentation
    either, which is important. I need to test the patch more but hope to
    post a new version by tomorrow evening. It will also need careful
    reviewing to make sure I did not miss something with the per-cpu caches.
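
    To make the rmqueue_bulk() change above concrete, here is a minimal
    userspace sketch of the arithmetic only: breaking a bulk request into
    the fewest power-of-two chunks, largest first. It is not the patch
    itself. split_bulk_request() and TOY_MAX_ORDER are hypothetical names,
    and the value 11 is an assumption taken from the 11 counters in the
    zeroblock line.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: orders 0..10, matching the 11 zeroblock counters. */
#define TOY_MAX_ORDER 11

/*
 * Hypothetical helper, not kernel code: decompose a request for `count`
 * pages into power-of-two chunks, largest first. The order of each chunk
 * is written to `orders`. Returns the number of chunks, or -1 if `cap`
 * entries are not enough to hold the result.
 */
static int split_bulk_request(unsigned long count, unsigned int *orders,
                              size_t cap)
{
    size_t n = 0;
    int order = TOY_MAX_ORDER - 1;

    assert(orders != NULL);

    while (count > 0) {
        /* Find the largest order whose block still fits in `count`. */
        while ((1UL << order) > count)
            order--;
        if (n == cap)
            return -1;
        orders[n++] = (unsigned int)order;
        count -= 1UL << order;
    }
    return (int)n;
}
```

    With this helper a request for 16 pages comes back as a single order-4
    chunk, and a request for 13 pages as orders 3, 2 and 0. The real
    allocator also has to cope with the larger orders being unavailable,
    which this sketch does not model.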

    > > This patch seriously regresses how well fragmentation is handled making it
    > > perform almost as badly as the standard allocator. It is because the fallback
    > > ordering for USERZERO has a tendency to clobber the reserved lists because
    > > of the mix of allocation types that need to be zeroed.
    >
    > Having pages of multiple orders in zeroed and not zeroed state invariably
    > leads to more fragmentation. I have also observed that with my patches
    > under many configurations. Seems that the only solution is to
    > intentionally either zero all free pages (which means you can coalesce
    > them all but you are zeroing lots of pages that did not need zeroing
    > after all) or you disregard the zeroed state during coalescing, either
    > insure that both are zeroed or mark the results as unzeroed... both
    > solutions introduce additional overhead.
    >

    My approach is to ignore zero pages during free/coalescing and to treat
    kernel allocations for zero pages differently to userspace allocations for
    zero pages.

    > My favorite solution has been so far to try to zero all
    > pages from the highest order downward but only when the system is idle
    > (or there is some hardware that does zeroing for us). And maybe we better
    > drop the zeroed status if a zeroed and an unzeroed page can be coalesced
    > to a higher order page? However, there will still be lots of unnecessary
    > zeroing.
    >

    When splitting, I zero the largest possible block on the assumption it
    makes sense to zero larger blocks. During coalescing, I ignore the zero
    state altogether as I could not think of a fast way of determining if a
    page was zeroed or not.
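
    A toy userspace model of that policy, as a sketch only: zero the whole
    block once before splitting so both halves inherit the zeroed state,
    and drop the state unconditionally on merge. All names here
    (toy_block, zero_block, split_block, merge_buddies, TOY_PAGE_SIZE) are
    hypothetical, and treating "ignore the zero state" as "mark the merged
    block unzeroed" is my reading for illustration, not necessarily what
    the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096UL   /* assumption: 4 KiB pages */

/* Hypothetical stand-in for a free block of 2**order pages. */
struct toy_block {
    unsigned char *mem;
    unsigned int order;
    bool zeroed;
};

/* Zero the whole block in one pass, before any splitting happens. */
static void zero_block(struct toy_block *b)
{
    memset(b->mem, 0, TOY_PAGE_SIZE << b->order);
    b->zeroed = true;
}

/*
 * Halve `b` in place and return its upper buddy. Both halves inherit
 * the zeroed state, so one large memset covers every smaller block.
 */
static struct toy_block split_block(struct toy_block *b)
{
    struct toy_block upper;

    assert(b->order > 0);
    b->order--;
    upper.mem = b->mem + (TOY_PAGE_SIZE << b->order);
    upper.order = b->order;
    upper.zeroed = b->zeroed;
    return upper;
}

/*
 * Merge two buddies of equal order. The zero state of either half is
 * never inspected: the merged block is simply treated as unzeroed.
 */
static struct toy_block merge_buddies(const struct toy_block *lo,
                                      const struct toy_block *hi)
{
    struct toy_block merged;

    assert(lo->order == hi->order);
    merged.mem = lo->mem;
    merged.order = lo->order + 1;
    merged.zeroed = false;
    return merged;
}
```

    The merge path costs nothing extra because it never looks at the
    flags. The price is that a block made of two zeroed halves may be
    zeroed again later.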

    > Since most of the request for zeroed pages are order-0 requests, we could
    > do a similar thing to that M$ Windows does
    > (http://www.windowsitpro.com/Articles/Index.cfm?ArticleID=3774&pg=2): Keep
    > a list of zeroed order 0 pages around, only put things on that list if
    > the system is truly idle and pick pages up for order 0 zeroed accesses.
    >
    > These zero lists would needed to be managed more like cpu hotlists and
    > not like we do currently as buddy allocator freelists.
    >

    I went with a variation of this approach. In my latest tree, I have
    pageset and pageset_zero to represent a per-CPU cache of normal pages and
    of zeroed pages. I also have buddy free lists of zeroed pages that are
    only filled when splitting up a large block of pages for a zero page
    allocation.
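
    Roughly, the two per-CPU caches behave like the sketch below. This is
    a userspace illustration only. pageset and pageset_zero are the names
    from my tree, but the struct layout, the toy_* helpers and the fixed
    capacity are hypothetical. Sending every freed page to the normal
    list is an assumption that follows from not tracking zero state on
    free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_PCP_CAP 16   /* hypothetical per-CPU list capacity */

struct toy_pageset {
    size_t count;
    void *pages[TOY_PCP_CAP];
};

/* One instance per CPU: a normal list and a prezeroed list. */
struct toy_cpu_cache {
    struct toy_pageset pageset;
    struct toy_pageset pageset_zero;
};

/* Push a page onto a list; fails when the list is full. */
static bool toy_push(struct toy_pageset *ps, void *page)
{
    assert(ps != NULL);
    if (ps->count == TOY_PCP_CAP)
        return false;
    ps->pages[ps->count++] = page;
    return true;
}

/*
 * Pop from the list matching the request. NULL means the list is empty
 * and the caller would refill it from the buddy lists in bulk.
 */
static void *toy_cache_alloc(struct toy_cpu_cache *c, bool want_zero)
{
    struct toy_pageset *ps = want_zero ? &c->pageset_zero : &c->pageset;

    return ps->count ? ps->pages[--ps->count] : NULL;
}

/* Freed pages are assumed dirty, so they only go to the normal list. */
static bool toy_cache_free(struct toy_cpu_cache *c, void *page)
{
    return toy_push(&c->pageset, page);
}
```

    A zeroed request never falls back to the normal list here. It returns
    NULL instead, which is the point where the zeroed buddy lists would be
    refilled by splitting and zeroing a large block.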

    -- 
    Mel Gorman
    
