Re: PG_zero

From: Andrea Arcangeli (andrea_at_novell.com)
Date: 11/03/04

  • Next message: Martin J. Bligh: "Re: [PATCH] Use MPOL_INTERLEAVE for tmpfs files"
    Date:	Wed, 3 Nov 2004 02:26:07 +0100
    To: "Martin J. Bligh" <mbligh@aracnet.com>
    
    

    On Tue, Nov 02, 2004 at 02:41:12PM -0800, Martin J. Bligh wrote:
    > it's really one list. However, I'd like to stop the cold list stealing
    > from the hot, so I think it's easier to *implement* it as two lists,
    > because there's nothing in the standard list code to add a magic marker
    > on one element in the middle (though maybe you can think of a trick way

    sure if we want to stop cold allocations to get hot pages we need two
    lists.

    > Yes, it's certainly faster to do that one operation - no dispute from me.
    > However, the question is whether it's worth trading off against the
    > cache-warmth of pages ... that's why we wrote it that way originally, to
    > preserve that.

    my argument is very simple. A random content cold page is worthless.

    When somebody allocates a free page, it will either:

    1) write to it with the cpu to write some useful data into it, so the
    page has a value (not the case since it used __GFP_COLD and we assume it
    was not using __GFP_COLD by mistake)
    2) use DMA to write something useful into the page

    Now after 2) unless it's readahead, it will also _read_ or modify the
    contents of the page. And writing into a _cold_ page with the cpu, isn't
    very different from doing DMA into an hot page. At least this is my
    theory, I'm not an hardware designer but there should be some cache
    snooping during the invalidates that shouldn't be very different from
    the cache recycling. Note that we've no clue if the cache is hot in the
    other cpus as well, this is all per-cpu stuff.

    Probably the only way to know if retaining the separated hot/cold list
    is worthwhile is benchmarking.

    I'm pretty sure __GFP_COLD makes huge difference for the freeing of ram,
    but I doubt it does for the allocation of ram. Everybody who is
    allocating memory is going to write to it and touch it with the cpu
    eventually.

    While freed ram may remain cold forever, so to me __GFP_COLD looks more
    a freeing-ram allocation.

    There's not such thing as allocating ram and never going to touch it
    with the CPU anytime soon.

    > Probably easier to make it a per-arch option to define, but yes, if one
    > arch works better one way, and others the other, we can make it optional
    > fairly easily in lots of ways. <insert traditional disagreement with
    > "new" vs "old" connotations here ...>

    yep ;).

    > Well, remember that the lists only contain a very small amount of memory
    > relative to the machine size anyway, so I'm not really worried about us
    > triggering early OOMs or whatever.

    I'm a bit worried for the many cpu systems. Since I'm also not limiting
    the per-cpu size in function of the total ram of the machine, one could
    have a configuration that doesn't even boot if it has too many cpus,
    probably thousand of them (I doubt any system like that exists, it's
    only a theoretical issue). But the closer you get to the non-booting
    system, the more likely allocations will fail with oom due the
    reservation.

    especially order > 0 allocations will fail way very soon, but for those
    dropping the reservation won't help, so they're an unfixable thing,
    that asks to use the per-cpu resources as efficiently as possible.
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Martin J. Bligh: "Re: [PATCH] Use MPOL_INTERLEAVE for tmpfs files"

    Relevant Pages

    • Re: very strange pthread problem
      ... The worker threads need access to a class member function or two. ... STL vector so each index into the vector points to the list of pointers ... it's just storing the pointers to the STL lists and the vector ... Then I moved the whole thing to a single cpu Linux ...
      (comp.programming.threads)
    • SCSI bus reset with Adaptec 29320ALP and Eonstor RAID
      ... I am trying to use a 1.5TB Eonstor raid array with FreeBSD 7.0, but I don't understand whether it is the raid or the scsi card or something else that is causing the computer problems when accessing the raid. ... CPU: IntelXeonCPU 3.20GHz ... Kernel Free SCB lists: ... Sequencer Complete DMA-inprog list: ...
      (freebsd-stable)
    • Re: very strange pthread problem (solved)
      ... The worker threads need access to a class member function or two. ... STL vector so each index into the vector points to the list of pointers ... it's just storing the pointers to the STL lists and the vector ... Then I moved the whole thing to a single cpu Linux ...
      (comp.programming.threads)
    • Re: PG_zero
      ... Other freeing is assumed cache hot and LRU ordered on the hot ... but I think you want the cold list for page ... > reclaim, don't you? ... the current per-cpu lists and those are all fixed, ...
      (Linux-Kernel)
    • Re: Process arbitrary #lists (continued)
      ... see how lists was defined. ... deallocate for it. ... I dont think the standard says it, ... Likewise these allocations exist until the program terminates thats why ...
      (comp.lang.pl1)