Re: [patch 00/19] VM pageout scalability improvements



On Wed, 2008-01-02 at 17:41 -0500, linux-kernel@xxxxxxxxxxxxxxx wrote:
> On large memory systems, the VM can spend way too much time scanning
> through pages that it cannot (or should not) evict from memory. Not
> only does it use up CPU time, but it also provokes lock contention
> and can leave large systems under memory pressure in a catatonic state.
>
> Against 2.6.24-rc6-mm1
>
> This patch series improves VM scalability by:
>
> 1) making the locking a little more scalable
>
> 2) putting filesystem backed, swap backed and non-reclaimable pages
> onto their own LRUs, so the system only scans the pages that it
> can/should evict from memory
>
> 3) switching to SEQ replacement for the anonymous LRUs, so the
> number of pages that need to be scanned when the system
> starts swapping is bound to a reasonable number
>
> The noreclaim patches come verbatim from Lee Schermerhorn and
> Nick Piggin. I have made a few small fixes to them and left out
> the bits that are no longer needed with split file/anon lists.
>
> The exception is "Scan noreclaim list for reclaimable pages",
> which should not be needed but could be a useful debugging tool.

Note that patch 14/19 [SHM_LOCK/UNLOCK handling] depends on the
infrastructure introduced by the "Scan noreclaim list for reclaimable
pages" patch. When SHM_UNLOCKing a shm segment, we call a new
scan_mapping_noreclaim_page() function to check all of the pages in the
segment for reclaimability. There might be other reasons for the pages
to be non-reclaimable...
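
A rough userspace sketch of the SHM_UNLOCK rescan described above, not the
actual patch code. Every name here (struct shm_segment, page_reclaimable(),
shm_unlock_and_rescan(), LRU_NORMAL) is hypothetical, and "mlocked" merely
stands in for any other reason a page might have to stay non-reclaimable:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
enum lru_list { LRU_NORECLAIM, LRU_NORMAL };

struct page {
    bool mlocked;          /* some other reason to stay non-reclaimable */
    enum lru_list lru;     /* which list the page currently sits on */
};

struct shm_segment {
    bool locked;           /* SHM_LOCK state of the segment */
    struct page *pages;
    size_t nr_pages;
};

/* A page is reclaimable only if no remaining condition pins it. */
static bool page_reclaimable(const struct shm_segment *seg,
                             const struct page *page)
{
    return !seg->locked && !page->mlocked;
}

/*
 * Clear the lock, then check every page in the segment. Pages that are
 * now reclaimable leave the noreclaim list; the rest stay where they are.
 * Returns the number of pages moved.
 */
static size_t shm_unlock_and_rescan(struct shm_segment *seg)
{
    size_t moved = 0;

    assert(seg != NULL);
    seg->locked = false;

    for (size_t i = 0; i < seg->nr_pages; i++) {
        struct page *page = &seg->pages[i];

        if (page->lru == LRU_NORECLAIM && page_reclaimable(seg, page)) {
            page->lru = LRU_NORMAL;
            moved++;
        }
    }
    return moved;
}
```

The point of the per-page check is the last sentence above: unlocking the
segment does not by itself make every page reclaimable.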

So, we can't merge 14/19 as is w/o some of patch 12. We can probably
eliminate the sysctl and per node sysfs attributes to force a scan.
But, as Rik says, this has been useful for debugging--e.g., periodically
forcing a full rescan while running a stress load.

Also, I should point out that the full noreclaim series includes a
couple of other patches NOT posted here by Rik:

1) treat swap backed pages as nonreclaimable when no swap space is
available. This addresses a problem we've seen in real life, with
vmscan spending a lot of time trying to reclaim anon/shmem/tmpfs/...
pages only to find that there is no swap space--add_to_swap() fails.
Maybe not a problem with Rik's new anon page handling. We'll see. If
we did want to add this filter, we'd need a way to bring back pages
from the noreclaim list that are there only for lack of swap space when
space is added or becomes available.

2) treat anon pages with "excessively long" anon_vma lists as
nonreclaimable. "excessively long" here is a sysctl tunable parameter.
This also addresses problems we've seen with benchmarks and stress
tests--all cpus spinning on some anon_vma lock. In "real life", we've
seen this behavior with file backed pages--spinning on the
i_mmap_lock--running Oracle workloads with user counts in the few
thousands. Again, something we may not need with Rik's vmscan rework.
If we did want to do this, we'd probably want to address file backed
pages and add support to bring the pages back from the noreclaim list
when the number of "mappers" drops below the threshold. My current
patch leaves anon pages as non-reclaimable until they're freed, or
manually scanned via the mechanism introduced by patch 12.
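
The two filters above, together with the "bring the pages back" check that
both would need, could be sketched roughly as follows. This is a userspace
illustration under stated assumptions, not code from the noreclaim series:
all names (struct vm_state, classify_page(), page_may_return(), the
cull_reason values) are made up, mapper_limit stands in for the sysctl
tunable, and the mapper filter is applied to any page rather than to anon
pages only:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified state; none of these names come from the patches. */
struct vm_state {
    long free_swap_slots;        /* swap space currently available */
    unsigned long mapper_limit;  /* stand-in for the sysctl tunable */
};

struct page_info {
    bool swap_backed;            /* anon/shmem/tmpfs page */
    unsigned long nr_mappers;    /* length of the anon_vma (or i_mmap) list */
};

enum cull_reason { CULL_NONE, CULL_NO_SWAP, CULL_TOO_MANY_MAPPERS };

/* Decide whether a page should be parked on the noreclaim list, and why. */
static enum cull_reason classify_page(const struct vm_state *vm,
                                      const struct page_info *page)
{
    assert(vm != NULL && page != NULL);

    if (page->nr_mappers > vm->mapper_limit)
        return CULL_TOO_MANY_MAPPERS;
    if (page->swap_backed && vm->free_swap_slots <= 0)
        return CULL_NO_SWAP;
    return CULL_NONE;
}

/*
 * Re-evaluate a parked page, e.g. after swap space is added or after the
 * number of mappers has dropped. The page may return to a normal LRU only
 * if no filter applies any more.
 */
static bool page_may_return(const struct vm_state *vm,
                            const struct page_info *page)
{
    return classify_page(vm, page) == CULL_NONE;
}
```

Something has to trigger the re-evaluation (a swapon, an unmap, or a forced
rescan); the sketch covers only the decision, not the trigger.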

Lee




