Re: [PATCH 00/28] Swap over NFS -v16



On Friday March 7, a.p.zijlstra@xxxxxxxxx wrote:
Hi Neil,

I'm so glad you are working with me on this and writing this in human
English. It seems to be my eternal shortcoming to communicate my ideas
clearly :-/. Thanks for your effort!

:-)
It always helps to have a second brain with a different perspective.



On Fri, 2008-03-07 at 14:33 +1100, Neil Brown wrote:

[I don't find the above wholly satisfying. There seems to be too much
hand-waving. If someone can provide better text explaining why
swapout is a special case, that would be great.]

Anonymous pages are dirty by definition (except the zero page, but I
think we recently ditched it). So shrinking of the anonymous pool will
require swapping.

Well, there is the swap cache. That's probably what I was thinking of
when I said "clean anonymous pages". I suspect they are the first to
go!


It is indeed the last refuge for those with GFP_NOFS. Along with the
strict limit on the amount of dirty file pages, it also ensures that
writing those out will never deadlock the machine, as there are always
clean file pages and/or anonymous pages to launder.

The difficulty I have is justifying exactly why page-cache writeout
will not deadlock. What if all the memory that is not dirty pagecache
is anonymous, and swap isn't enabled?
Maybe the number returned by "determine_dirtyable_memory" in
page-writeback.c excludes anonymous pages? I wonder if the meaning of
NR_FREE_PAGES, NR_INACTIVE, etc. is documented anywhere....

...

Right. I've had a long conversation on PG_emergency with Pekka. And I
think the conclusion was that PG_emergency will create more headaches
than it solves. I probably have the conversation in my IRC logs and
could email it if you're interested (and Pekka doesn't object).

Maybe that depends on the exact semantics of PG_emergency?
I remember you being concerned that PG_emergency never changes between
allocation and freeing, and that wouldn't work well with slub.
My envisioned semantics have it possibly changing quite often.
What it means is:
    The last allocation done from this page was in a low-memory
    condition.
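
To make that semantic concrete, here is a tiny user-space sketch. It is
not kernel code: struct mock_page and both helpers are hypothetical names
made up for illustration, and the "low memory" decision is simply passed
in by the caller. The only point is that the flag is rewritten on every
allocation from the page rather than fixed for the page's lifetime.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a slab page; not the kernel's struct page. */
struct mock_page {
	bool emergency;		/* last allocation was made under memory pressure */
	unsigned int in_use;	/* objects currently handed out from this page */
};

/* Record an allocation from this page and overwrite the flag each time. */
void mock_alloc_from(struct mock_page *page, bool low_memory)
{
	page->in_use++;
	page->emergency = low_memory;
}

/* Answers "should the most recent object from this page count as reserved?" */
bool mock_last_alloc_was_emergency(const struct mock_page *page)
{
	return page->emergency;
}
```

With this reading the flag can flip back and forth as memory pressure
comes and goes, so it describes the most recent allocation and not the
page as a whole.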

You really need some way to tell if the result of kmalloc/kmemalloc
should be treated as reserved.
I think you had code which first tried the allocation without
GFP_MEMALLOC and then if that failed, tried again *with*
GFP_MEMALLOC. If that then succeeded, it is assumed to be an
allocation from reserves. That seemed rather ugly, though I guess you
could wrap it in a function to hide the ugliness:

void *kmalloc_reserve(size_t size, int *reserve, gfp_t gfp_flags)
{
	/* First try without dipping into the emergency reserves. */
	void *result = kmalloc(size, gfp_flags & ~GFP_MEMALLOC);

	if (result) {
		*reserve = 0;
		return result;
	}

	/* That failed, so retry with access to the reserves. */
	result = kmalloc(size, gfp_flags | GFP_MEMALLOC);
	if (result) {
		*reserve = 1;
		return result;
	}

	return NULL;
}
???
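
For anyone who wants to play with the retry pattern outside the kernel,
here is a rough user-space sketch. Everything in it is an assumption made
for illustration: the two byte budgets stand in for "normal" memory and
the emergency reserve, and alloc_normal(), alloc_from_reserve() and
alloc_reserve() are hypothetical names, not kernel interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical budgets standing in for the allocator's two pools. */
static size_t normal_left = 64;		/* bytes available without reserves */
static size_t reserve_left = 32;	/* bytes held back for emergencies */

/* Mock of an allocation that is not allowed to touch the reserve. */
static void *alloc_normal(size_t size)
{
	if (size > normal_left)
		return NULL;
	normal_left -= size;
	return malloc(size);
}

/* Mock of an allocation that is allowed to dip into the reserve. */
static void *alloc_from_reserve(size_t size)
{
	if (size > reserve_left)
		return NULL;
	reserve_left -= size;
	return malloc(size);
}

/* Try the normal pool first; report through *reserve which pool was used. */
void *alloc_reserve(size_t size, int *reserve)
{
	void *p;

	assert(reserve != NULL);

	p = alloc_normal(size);
	if (p) {
		*reserve = 0;
		return p;
	}

	p = alloc_from_reserve(size);
	if (p) {
		*reserve = 1;
		return p;
	}

	return NULL;
}
```

The caller would keep the reserve value alongside the pointer so that the
object can later be treated (and accounted) as coming from the reserve.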


I've already heard interest from other people to use these hooks to
provide swap on other non-block filesystems such as jffs2, logfs and the
like.

I'm interested in the swap_in/swap_out interface for external
write-intent bitmaps for md/raid arrays.
You can have a write-intent bitmap which records which blocks might be
dirty if the host crashes, so that resync is much faster.
It can be stored in a file in a separate filesystem, but that is
currently implemented by using bmap to enumerate the blocks and then
reading/writing directly to the device (like swap). Your interface
would be much nicer for that (not that I think having a
write-intent-bitmap on an NFS filesystem would be a clever idea ;-)
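
For readers unfamiliar with the idea, a write-intent bitmap can be
sketched in user space roughly as follows. This is not the md
implementation: the chunk size, the bitmap size and all function names
are hypothetical, and the step of making the bitmap persistent before the
data write is issued (the part the swap_in/swap_out interface would help
with) is left out entirely.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_BLOCKS	8u	/* hypothetical: blocks covered by one bit */
#define NR_CHUNKS	64u	/* hypothetical: chunks tracked by the bitmap */

struct intent_bitmap {
	uint8_t bits[NR_CHUNKS / 8];
};

/* Set the bit for the chunk containing 'block' before writing to it. */
void intent_mark(struct intent_bitmap *bm, size_t block)
{
	size_t chunk = block / CHUNK_BLOCKS;

	if (chunk >= NR_CHUNKS)
		return;		/* out of range: ignored in this sketch */
	bm->bits[chunk / 8] |= (uint8_t)(1u << (chunk % 8));
}

/* Clear the bit once every copy of the chunk is known to be in sync. */
void intent_clear(struct intent_bitmap *bm, size_t block)
{
	size_t chunk = block / CHUNK_BLOCKS;

	if (chunk >= NR_CHUNKS)
		return;
	bm->bits[chunk / 8] &= (uint8_t)~(1u << (chunk % 8));
}

/* After a crash, only chunks whose bit is still set need resyncing. */
bool intent_needs_resync(const struct intent_bitmap *bm, size_t chunk)
{
	if (chunk >= NR_CHUNKS)
		return false;
	return (bm->bits[chunk / 8] >> (chunk % 8)) & 1u;
}

/* Number of chunks a resync would have to visit. */
size_t intent_count_dirty(const struct intent_bitmap *bm)
{
	size_t chunk, n = 0;

	for (chunk = 0; chunk < NR_CHUNKS; chunk++)
		if (intent_needs_resync(bm, chunk))
			n++;
	return n;
}
```

The saving comes from intent_count_dirty() usually being far smaller than
NR_CHUNKS, so a resync after a crash touches only a fraction of the array.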

I'll look forward to your next patch set....

One thing I had thought odd while reading the patches, but haven't
found an opportunity to mention before, is the "IS_SWAPFILE" test in
nfs-swapper.patch.
This seems like a layering violation. It would be better if the test
were based on whether ->swapfile had been called on the file. That way
my write-intent bitmaps would get the same benefit.

NeilBrown


