Re: missing madvise functionality
- From: Ulrich Drepper <drepper@xxxxxxxxxx>
- Date: Tue, 03 Apr 2007 13:17:09 -0700
Andrew Morton wrote:
> Ulrich, could you suggest a little test app which would demonstrate this
> behaviour?
It's not really reliably possible to demonstrate this with a small
program using malloc. You'd need something like this mysql test case
which Rik said is not hard to run by yourself.
If somebody adds a kernel interface I can easily produce a glibc patch
so that the test can be run in the new environment.
But it's of course easy enough to simulate the specific problem in a
micro benchmark. If you want that let me know.
> Question:
>
> > - if an access to a page in the range happens in the future it must
> >   succeed. The old page content can be provided or a new, empty page
> >   can be provided
>
> How important is this "use the old page if it is available" feature? If we
> were to simply implement a fast unconditional-free-the-page, so that
> subsequent accesses always returned a new, zeroed page, do we expect that
> this will be a 90%-good-enough thing, or will it be significantly
> inefficient?
My guess is that the page fault you'd get for every single page is a
huge part of the problem. If you don't free the pages and just leave
them in the process, processes which quickly reuse the memory pool will
experience no noticeable slowdown. The only difference between not
freeing the memory and doing it is that one madvise() syscall.
If you unconditionally free the page you save the later mprotect() call
(one mmap_sem lock saved). But does every later page fault then
require the semaphore? Even if not, the additional kernel entry is a
killer.
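
To make the per-page fault cost concrete, here is a rough sketch of the kind of micro benchmark mentioned above. It is not the mysql test case and not glibc's actual behaviour: the helper names (minor_faults, reuse_pool), the use of MADV_DONTNEED, and counting minor faults via getrusage() are all assumptions made for illustration. It touches every page of an anonymous pool several times, optionally handing the pages back to the kernel between rounds.

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far (hypothetical helper). */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/*
 * Touch every page of an anonymous pool `rounds` times. If `release` is
 * nonzero, give the pages back with MADV_DONTNEED after each round, the
 * way an allocator trimming free memory might (an assumption, not a
 * description of any particular malloc). Returns the number of minor
 * faults taken while doing so, or -1 on error.
 */
static long reuse_pool(size_t npages, int rounds, int release)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = npages * page;
    char *pool = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (pool == MAP_FAILED)
        return -1;

    long before = minor_faults();
    for (int r = 0; r < rounds; r++) {
        for (size_t i = 0; i < npages; i++)
            pool[i * page] = 1;            /* write one byte per page */
        if (release && madvise(pool, len, MADV_DONTNEED) != 0) {
            munmap(pool, len);
            return -1;
        }
    }
    long faults = minor_faults() - before;
    munmap(pool, len);
    return faults;
}
```

Comparing reuse_pool(1024, 8, 0) with reuse_pool(1024, 8, 1) should show the pool being faulted in again on every round in the second case, while the first case only pays for the initial touch. The exact counts depend on the kernel and on whether transparent huge pages are in use, so treat the numbers as indicative only.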
> So perhaps we can do something like chop swapper_space in half: the lower
> 50% represent offsets which have a swap mapping and the upper 50% are fake
> swapcache pages which don't actually consume swapspace. These pages are
> unmapped from pagetables, marked clean, added to the fake part of
> swapper_space and are deactivated. Teach the low-level swap code to ignore
> the request to free physical swapspace when these pages are released.
Sounds good to me.
> This would all halve the maximum amount of swap which can be used. iirc
> i386 supports 27 bits of swapcache indexing, and 26 bits is 274GB, which
> is hopefully enough..
Boo hoo, poor 32-bit machines. People with demands of > 274G should get
a real machine instead.
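
For anyone checking the 274GB figure: it follows if each index value names one 4096-byte page, which is an assumption here (the standard i386 page size), not something stated in the thread. The helper name swap_bytes is made up for this sketch.

```c
#include <stdint.h>

/*
 * Bytes addressable with `index_bits` bits of page index when every
 * index names one page of `page_size` bytes. Hypothetical helper,
 * only meant to reproduce the arithmetic.
 */
static uint64_t swap_bytes(unsigned index_bits, uint64_t page_size)
{
    return (UINT64_C(1) << index_bits) * page_size;
}
```

With 26 bits and 4096-byte pages this gives 2^38 = 274877906944 bytes, i.e. roughly 274GB in decimal units (256 GiB). The full 27 bits would give twice that.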
--
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖