[RFC] buddy allocator without bitmap(2) [0/3]

From: Hiroyuki KAMEZAWA (kamezawa.hiroyu_at_jp.fujitsu.com)
Date: 08/31/04

  • Next message: Guillaume Thouvenin: "Re: [Lse-tech] Re: [PATCH] new CSA patchset for 2.6.8"
    Date:	Tue, 31 Aug 2004 19:36:01 +0900
    To: Linux Kernel ML <linux-kernel@vger.kernel.org>
    
    

    Hi,
    thanks for many advices on previous RFC.

    This new patch is against 2.6.9-rc1-mm1.
    A new algorithm is implemented.

    Considering advises on previous patch, I think the problems of that was
    (1) vagueness of zone->aligned_order
    (2) vagueness of zone->nr_mem_map
    (3) using pfn_valid() in the core of buddy allocator only for rare case.
    (ex) readability ;)

    I removed these (1)-(3) and this version is not so micro-optimized.
    I think inner-loop of free_pages_bulk() becomes simpler.

    I added one more PG_xxx flag, PG_buddyend and reserve one page as a stopper.
    PG_buddyend is set in boot-time and never cleared. This flags works as a
    lower-address stopper of buddy allocator.The reserved one page works as a
    higher-address stopper. "No accessing not-existing page" is guaranteed by these two.
    About a usage of PG_buddyend flag, please read patch.
    There is a sample illustration in attached patch.

    Advantage:
      - information about buddy system is fully stored in the mem_map.
      - no bitmap
      - can work well on discontiguous mem_map.
    Disadvantage:
      - using one more PG_xxx flag.
      - If mem_map is not aligned, reserve one page as a victim for buddy allocater.

    How about this approach ?

    Regards
    -- Kame

    This patch removes bitmap from buddy allocator,
    removes free_area_t's bitmap in include/linux/mmzone.h
    and adds some definition in include/linux/mm.h

    This is the 1st patch.

    Currently,Linux's page allocator uses buddy algorithm and codes for buddy
    allocator uses bitmaps. For what is bitmaps are used ?

    (*) for recording "a page is free" and page's order.

    here, page's order means size of contiguous free pages.
    if a free page[x] 's order is Y, there are contiguous free pages
    among page[X] to page[X + 2^(Y) - 1]

    If a page is free and is a head of contiguous free pages of order 'X',
    we can record it by
    set_bit(free_area[X]->map, index of page[X] in this order)

    For coalescing, when there is a chunk of free pages of order 'X',
    we can test whether we can coalesce or not by,
    test_bit(free_aera[X]->bitmap,index_of_buddy)
    index_of_buddy can be calculated by (index_of_page ^ (1 << order))

    This patch removes bitmap and recording a free page's order
    in its page structure's page->private field. If a page is free and
    it is a head of a free contiguous memory chunk, page->private indicates
    the order of the page.PG_private bit is used to show propriety of this field.

    For coalescing, when there is a page which is a chunk of contiguous free pages
    of order 'X', we can test whether the page is to be coalesced or not by
    (page_is_free(buddy) && PagePrivate(buddy) && page_order(buddy) == 'X')
    address of buddy can be calculated by the same way in bitmap case.

    If page is free and on the buddy system, PG_private bit is set
    and has its order in page->private. This scheme is safe because...
    (a) when page is being freed, PG_private is not set. (see free_pages_check())
    (b) when page is free and on the buddy system, PG_private is set.
    These facts are guaranteed by zone->lock.
    Only one thread can change a free page's PG_private bit and private field
    at anytime.

    [ Not MAX_ORDER aligned memory ]
    If all memory are aligned to system's MAX_ORDER, all buddy algorithm works
    fine and there is no trouble.But if memory is not aligned, we must check
    "whether buddy exists or not?".
    Checking this in main-loop of free_pages_bulk() is ugly but a page which
    has no buddy in some order can be calculated in boot time.

    New PG_xxx flag, PG_buddyend flag is introduced.
    This flag works as a stopper of buddy coalescing.

    A page is on the lower address end of mem_map and it cannot have
    its lower address buddy is marked as PG_buddyend. Please see below.

    If mem_map's start address is aligned, pages & buddy looks like this

    order start_pfn -> higher address
             -------------------------------------
    0 | | | | | | | | | | | | | ..
             -------------------------------------
    1 | | | | | | |
             -------------------------------------
    2 | | | |
             -------------------------------------
    3 | |
             -------------------------------------

    If mem_map start address is not aligned, some of "lower address buddy"
    disappears.

                ----------------------------------
    0 |X | | | | | | | | | | | ..
                ----------------------------------
    1 |X | | | | |
                   -------------------------------
    2 |X | |
                         -------------------------
    3 |X
                                     -------------

    A group of pages marked 'X' in each order never has its lower address buddy.
    'X' pages are marked as PG_buddyend.

    Now,this check can work well enough to avoid coalescing not aligned pages.
    -----------------------
    buddy_index = page_index ^ (1 << order)
    if ((buddy_index < page_index) &&
         PageBuddyend(base + page_index))
           stop coalescing.
    -----------------------

    Above is for lower address case, How about higher address ?
    If higher end of mem_map is not aligned to MAX_ORDER, we reserve
    the highest address page and don't put it into buddy system. (don't pass to
    free_pages())
    This one page is a victim for buddy allocator, and this is enough.

    -- Kame

    ---
      linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h         |   24 ++++++++++++++
      linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h     |    1
      linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h |   10 +++++
      3 files changed, 34 insertions(+), 1 deletion(-)
    diff -puN include/linux/mm.h~eliminate-bitmap-includes include/linux/mm.h
    --- linux-2.6.9-rc1-mm1-k/include/linux/mm.h~eliminate-bitmap-includes	2004-08-31 18:37:04.186101664 +0900
    +++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mm.h	2004-08-31 18:37:04.194100448 +0900
    @@ -209,6 +209,9 @@ struct page {
      					 * usually used for buffer_heads
      					 * if PagePrivate set; used for
      					 * swp_entry_t if PageSwapCache
    +					 * When page is free:
    +					 * this indicates order of page
    +					 * in buddy allocator.
      					 */
      	struct address_space *mapping;	/* If low bit clear, points to
      					 * inode address_space, or NULL.
    @@ -322,6 +325,27 @@ static inline void put_page(struct page
      #endif		/* CONFIG_HUGETLB_PAGE */
      /*
    + * These functions are used in alloc_pages()/free_pages(), buddy allocator.
    + * page_order(page) returns an order of a free page in buddy allocator.
    + *
    + * this is used with PG_private flag
    + *
    + * Note : all PG_private operations used in buddy system is done while
    + * zone->lock is acquired. So set and clear PG_private bit operation
    + * does not need to be atomic.
    + */
    +
    +static inline int page_order(struct page *page)
    +{
    +	return page->private;
    +}
    +
    +static inline void set_page_order(struct page *page, int order)
    +{
    +	page->private = order;
    +}
    +
    +/*
       * Multiple processes may "see" the same page. E.g. for untouched
       * mappings of /dev/null, all processes see the same page full of
       * zeroes, and text pages of executables and shared libraries have
    diff -puN include/linux/mmzone.h~eliminate-bitmap-includes include/linux/mmzone.h
    --- linux-2.6.9-rc1-mm1-k/include/linux/mmzone.h~eliminate-bitmap-includes	2004-08-31 18:37:04.188101360 +0900
    +++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/mmzone.h	2004-08-31 18:37:04.195100296 +0900
    @@ -22,7 +22,6 @@
      struct free_area {
      	struct list_head	free_list;
    -	unsigned long		*map;
      };
      struct pglist_data;
    diff -puN include/linux/page-flags.h~eliminate-bitmap-includes include/linux/page-flags.h
    --- linux-2.6.9-rc1-mm1-k/include/linux/page-flags.h~eliminate-bitmap-includes	2004-08-31 18:37:04.190101056 +0900
    +++ linux-2.6.9-rc1-mm1-k-kamezawa/include/linux/page-flags.h	2004-08-31 18:37:04.196100144 +0900
    @@ -44,6 +44,12 @@
       * space, they need to be kmapped separately for doing IO on the pages.  The
       * struct page (these bits with information) are always mapped into kernel
       * address space...
    + *
    + * PG_buddyend pages don't have its buddy in buddy allocator in some meaning.
    + * If a page is on the lower address end of mem_map and a buddy of it, in some
    + * order, is below the end, a page is marked as PG_buddyend. If a page is on
    + * the higher addres end of mem_map and a buddy of it,in some order, is above
    + * the end, a page is marked as PG_buddyend.
       */
      /*
    @@ -75,6 +81,7 @@
      #define PG_mappedtodisk		17	/* Has blocks allocated on-disk */
      #define PG_reclaim		18	/* To be reclaimed asap */
    +#define PG_buddyend             19      /* end of the buddy allocator */
      /*
       * Global page accounting.  One instance per CPU.  Only unsigned longs are
    @@ -290,6 +297,9 @@ extern unsigned long __read_page_state(u
      #define SetPageCompound(page)	set_bit(PG_compound, &(page)->flags)
      #define ClearPageCompound(page)	clear_bit(PG_compound, &(page)->flags)
    +#define PageBuddyend(page)      test_bit(PG_buddyend, &(page)->flags)
    +#define SetPageBuddyend(page)   set_bit(PG_buddyend, &(page)->flags)
    +
      #ifdef CONFIG_SWAP
      #define PageSwapCache(page)	test_bit(PG_swapcache, &(page)->flags)
      #define SetPageSwapCache(page)	set_bit(PG_swapcache, &(page)->flags)
    _
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    

  • Next message: Guillaume Thouvenin: "Re: [Lse-tech] Re: [PATCH] new CSA patchset for 2.6.8"

    Relevant Pages

    • Re: assert/crash in __rmqueue() when enabling CONFIG_NUMA
      ... first post of the patch. ... I don't remember seeing your .config in the thread (or blind and unable ... While in the buddy allocator before checking for a valid buddy the buddy page ... must reside in the parent's zone too. ...
      (Linux-Kernel)
    • [PATCH] stop using "base" argument in __free_pages_bulk()
      ... This patch is somewhat of a companion to the "no buddy bitmap" patches. ... it has a likely small positive performance impact. ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: Same old same old
      ... My old buddy was over here this morning to help me patch up my old shed ... in my back yard.He didn't believe I can pull up some naked wimmins via ...
      (rec.radio.shortwave)
    • Re: [SLE] Has anyone heard about this?
      ... Some say there is a patch available through YOU for that problem. ... If you want to help your buddy, locate the patch description for the ... Check the headers for your unsubscription address For additional commands send e-mail to suse-linux-e-help@suse.com Also check the archives at http://lists.suse.com Please read the FAQs: suse-linux-e-faq@suse.com ...
      (SuSE)
    • [RFC] buddy allocator without bitmap [0/4]
      ... I'm now working for memory-hotplug and my purpose of removing bitmaps ... Current buddy allocator, which uses bitmaps, records free page's order ... Because there is no access to bitmaps, the kernel's memory footprint looks ...
      (Linux-Kernel)