Re: [PATCH 00/33] Swap over NFS -v14



<apologies for being insanely late into this thread>

On Wed, Oct 31, 2007 at 01:56:53PM +0100, Peter Zijlstra wrote:
On Wed, 2007-10-31 at 08:16 -0400, Jeff Garzik wrote:
Thoughts:
1) I absolutely agree that NFS is far more prominent and useful than any
network block device, at the present time.

2) Nonetheless, swap over NFS is a pretty rare case. I view this work
as interesting, but I really don't see a huge need, for swapping over
NBD or swapping over NFS. I tend to think swapping to a remote resource
starts to approach "migration" rather than merely swapping. Yes, we can
do it... but given the lack of burning need one must examine the price.

There is a large corporate demand for this, which is why I'm doing this.

The typical usage scenarios are:
- cluster/blades, where having local disks is a cost issue (maintenance
of failures, heat, etc)

HPC clusters are increasingly diskless, especially at the high end,
for all the reasons you mention, but also because networks are faster
than disks.

But please, people who want this (I'm sure some of you are reading) do
speak up. I'm just the motivated corporate drone implementing the
feature :-)

swap to iSCSI has worked well in the past with your anti-deadlock
patches, and I'd definitely like to see that continue and to be merged
into mainline!! swap-to-network is a highly desirable feature for
modern clusters.

performance and scalability of NFS are poor, so it's not a good option.

actually swap to a file on Lustre(*) would be best, but iSER and iSCSI
would be my next choices. iSER is better than iSCSI as it's ~5x faster
in practice, and InfiniBand seems to be here to stay.

hmmm - any idea what the issues are with RDMA in low memory situations?
presumably if DMA regions are mapped early then there's not actually
much of a problem? I might try it with tgtd's iSER...

cheers,
robin

(*) obviously not your responsibility. although Lustre (Sun/CFS) could
presumably use your infrastructure once you have it in mainline.


3) You note
Swap over network has the problem that the network subsystem does not use fixed
sized allocations, but heavily relies on kmalloc(). This makes mempools
unusable.

True, but IMO there are mitigating factors that should be researched and
taken into account:

a) To give you some net driver background/history, most mainstream net
drivers were coded to allocate RX skbs of size 1538, under the theory
that they would all be allocating out of the same underlying slab cache.
It would not be difficult to update a great many of the [non-jumbo]
cases to create a fixed size allocation pattern.
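To illustrate why a fixed-size allocation pattern matters for a preallocated reserve, here is a toy user-space model in Python. It is not kernel code and does not reflect the real mempool or skb APIs; the class name `FixedSizePool` and its methods are made up for this sketch, and only the 1538-byte size comes from the text above.

```python
class FixedSizePool:
    """Toy model of a mempool-style reserve: every element has one size."""

    def __init__(self, elem_size, min_nr):
        self.elem_size = elem_size
        # Preallocate the reserve up front, while memory is still available.
        self.free = [bytearray(elem_size) for _ in range(min_nr)]

    def alloc(self, size):
        # A fixed-size reserve can only serve requests that fit one element.
        if size > self.elem_size:
            raise ValueError("request does not fit a pool element")
        if not self.free:
            return None  # reserve exhausted; caller must wait or drop
        return self.free.pop()

    def release(self, buf):
        # Returning an element makes it available to the next caller.
        self.free.append(buf)


RX_BUF_SIZE = 1538  # the RX skb size mentioned above

pool = FixedSizePool(RX_BUF_SIZE, min_nr=4)
bufs = [pool.alloc(RX_BUF_SIZE) for _ in range(4)]
exhausted = pool.alloc(RX_BUF_SIZE)  # None: the reserve is used up
pool.release(bufs.pop())
again = pool.alloc(1000)             # a smaller request still fits one element
```

With one element size, the number of guaranteed allocations is known in advance. A request for an arbitrary size (say a jumbo frame) cannot be guaranteed from such a reserve, which is the problem with kmalloc()-style variable sizes.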

One issue that comes to mind is how to ensure we'd still overflow the
IP-reassembly buffers. Currently those are managed on the number of
bytes present, not the number of fragments.

One of the goals of my approach was to not rewrite the network subsystem
to accommodate this feature (and I hope I succeeded).

b) Spare-time experiments and anecdotal evidence point to RX and TX skb
recycling as a potentially valuable area of research. If you are able
to do something like that, then memory suddenly becomes a lot more
bounded and predictable.
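A rough way to see why recycling bounds memory is the toy Python model below. It is only a sketch under stated assumptions: `RecyclingAllocator`, `run`, and the parameters `max_bufs` and `in_flight` are hypothetical names for this example and do not correspond to any driver or kernel interface.

```python
from collections import deque


class RecyclingAllocator:
    """Toy model: completed buffers are reused instead of being freed."""

    def __init__(self, buf_size, max_bufs):
        self.buf_size = buf_size
        self.max_bufs = max_bufs
        self.recycled = []
        self.fresh_allocs = 0  # how many buffers were ever newly allocated

    def get(self):
        if self.recycled:
            return self.recycled.pop()      # reuse, no new memory needed
        if self.fresh_allocs >= self.max_bufs:
            return None                     # hard upper bound reached
        self.fresh_allocs += 1
        return bytearray(self.buf_size)

    def put(self, buf):
        self.recycled.append(buf)


def run(alloc, n_packets, in_flight):
    """Handle n_packets with at most in_flight buffers outstanding.

    Returns the number of packets dropped for lack of a buffer.
    """
    pending = deque()
    dropped = 0
    for _ in range(n_packets):
        if len(pending) == in_flight:
            alloc.put(pending.popleft())    # oldest packet completes
        buf = alloc.get()
        if buf is None:
            dropped += 1
        else:
            pending.append(buf)
    return dropped


alloc = RecyclingAllocator(1538, max_bufs=8)
dropped = run(alloc, n_packets=10_000, in_flight=8)
```

In this model the number of fresh allocations never exceeds `max_bufs`, no matter how many packets pass through, so memory use depends on the number of buffers in flight rather than on traffic volume.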


So my gut feeling is that taking a hard look at how net drivers function
in the field should give you a lot of good ideas that approach the
shared goal of making network memory allocations more predictable and
bounded.

Note that being bounded only comes from dropping most packets before
tying them to a socket. That is the crucial part of the RX path: to
receive all packets from the NIC (regardless of their size) but to not
pass them on to the network stack - unless they belong to a 'special'
socket that promises undelayed processing.
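The receive-then-drop rule described above can be sketched as a toy Python model. This is an illustration only, not how the patch set implements it: the `critical` flag, the `receive` function, and the port numbers are assumptions made up for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Socket:
    port: int
    critical: bool = False  # hypothetical marker for a 'special' socket
    queue: list = field(default_factory=list)


def receive(sockets, packet, under_pressure):
    """Take a packet off the (simulated) NIC; return True if delivered."""
    sock = sockets.get(packet["port"])
    if sock is None:
        return False  # nobody is listening
    if under_pressure and not sock.critical:
        # Drop before queueing, so the buffer is free again immediately.
        return False
    sock.queue.append(packet["payload"])
    return True


# Assumed setup: one socket carrying swap traffic, one ordinary socket.
sockets = {2049: Socket(2049, critical=True), 80: Socket(80)}
packets = [
    {"port": 80, "payload": b"http"},
    {"port": 2049, "payload": b"swap-io"},
]
results = [receive(sockets, p, under_pressure=True) for p in packets]
```

Every packet is still received, so the one that matters is never stuck behind the others, but only the socket that promises prompt processing gets to hold memory while the system is under pressure.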

Thanks for these ideas, I'll look into them.


