Re: [RFC] memory defragmentation to satisfy high order allocations

From: Marcelo Tosatti (marcelo.tosatti_at_cyclades.com)
Date: 10/02/04

  • Next message: Rafael J. Wysocki: "Re: swsusp: fix x86-64 suspend in -rc3"
    Date:	Sat, 2 Oct 2004 15:33:49 -0300
    To: Hirokazu Takahashi <taka@valinux.co.jp>
    
    

    On Sat, Oct 02, 2004 at 06:30:15PM +0900, Hirokazu Takahashi wrote:
    > Hello, Marcelo.
    >
    > Generic memory defragmentation will be very nice for me to implement
    > hugetlbpage migration, as allocating a new hugetlbpage is a hard job.
    >
    > > For the "defragmentation" operation we want to do an "easy" try - ie if we
    > > can't remap giveup.
    > >
    > > I feel we should try to "untie" the code which checks for remapping availability /
    > > does the remapping from the page migration - so to be able to share the most
    > > code between it and other users of the same functionality.
    >
    > I think it's possible to introduce non-wait mode to the migration code,
    > as you may expect. Shall I implement it?
    >
    > > Curiosity: How did you guys test the migration operation? Several threads on
    > > several processors operating on the memory, etc?
    >
    > I always test it with the zone hotplug emulation patch, which Mr.Iwamoto
    > has made. I usually run following jobs concurrently while zones are added
    > and removed repeatedly on a SMP machine.
    > - making linux kernel
    > - copying file trees.
    > - overwriting file trees.
    > - removing file trees
    > - some pages are swapped out automatically:)
    >
    > And Mr.Iwamoto has some small programs to check any kind of page
    > can be migrated. The programs repeat one of following actions:
    > - read/write files .
    > - use MAP_SHARED and MAP_PRIVATE mmap()'s and read/write there.
    > - use Direct I/O.
    > - use AIO.
    > - fork to have COW pages.
    > - use shmem.
    > - use sendfile.
    >
    > > Cool. I'll take a closer look at the relevant parts of memory hotplug patches
    > > this weekend, hopefully. See if I can help with testing of these patches too.
    >
    > Any comments are very welcome.

    I have a few comments about the code:

    1)
    I'm pretty sure you should transfer the radix tree tag at radix_tree_replace().
    If for example you transfer a dirty tagged page to another zone, an mpage_writepages()
    will miss it (because it uses pagevec_lookup_tag(PAGECACHE_DIRTY_TAG)).

    Should be quite trivial to do (save tags before deleting and set to new entry,
    all in radix_tree_replace).

    My implementation also contained the same bug.

    2)
    At migrate_onepage you add anonymous pages which aren't swap allocated
    to the swap cache
    + /*
    + * Put the page in a radix tree if it isn't in the tree yet.
    + */
    +#ifdef CONFIG_SWAP
    + if (PageAnon(page) && !PageSwapCache(page))
    + if (!add_to_swap(page, GFP_KERNEL)) {
    + unlock_page(page);
    + return ERR_PTR(-ENOSPC);
    + }
    +#endif /* CONFIG_SWAP */

    Why's that? You can copy anonymous pages without adding them to swap (thats
    what the patch I posted does).

    3) At migrate_page_common you assume additional page references
    (page_migratable returning -EAGAIN) means the code should try to writeout
    the page.

    Is that assumption always valid?

    In theory there is no need to writeout pages when migrating them to
    other zones - they will be copied and the dirty information retained (either
    in the PageDirty bit or radix tree tag).

    I just noticed you do that on further patches (migrate_page_buffer), but AFAICS
    the writeout remains. Why arent you using migrate_page_buffer yet?

    I think the final aim should be to remove the need for "pageout()"
    completly.

    4)
    About implementing a nonblocking version of it. The easier way, it
    seems to me, is to pass a "block" argument to generic_migrate_page() and
    use that.

    Questions: are there any documents on the memory hotplug userspace tools?
    Where can I find them?

    Are Iwamoto's test programs available?

    In general the code looks nice to me! I'll jump in and help with
    testing.

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Rafael J. Wysocki: "Re: swsusp: fix x86-64 suspend in -rc3"

    Relevant Pages

    • Migrate pages from a ccNUMA node to another
      ... We are left with Non Uniform Memory Architectures. ... You can make use of the forthcoming NUMA APIs to set up your NUMA environment: ... (e.g. it is a reference benchmark) ... Page migration tries to help you out in these situations. ...
      (Linux-Kernel)
    • Re: [RFC][mmotm] Documentation update
      ... Memory Resource ControllerImplementation Memo. ... -patterns tend to be racy. ... charge against oldpage or newpage will be committed. ... Page Migration ...
      (Linux-Kernel)
    • Re: [Xen-devel] Re: [dm-devel] Re: dm-ioband + bio-cgroup benchmarks
      ... provide functionality to associate a bio to a cgroup. ... wouldn't be simpler to just use the memory controller ... for bio-cgroup, ... It depends on the migration. ...
      (Linux-Kernel)
    • Re: [RFC 2.6.11-rc2-mm2 7/7] mm: manual page migration -- sys_page_migrate
      ... Not necessarily all nodes for the rotation, but if you have no free nodes ... system may only have the nodes from the old and new jobs to work with. ... and get any foolhardy migration possible. ... which has 16 processes which are doing work on 16 blocks of memory. ...
      (Linux-Kernel)