Re: [RFC][PATCH 0/9] Network receive deadlock prevention for NBD
- From: Evgeniy Polyakov <johnpol@xxxxxxxxxxx>
- Date: Fri, 18 Aug 2006 11:16:06 +0400
On Thu, Aug 17, 2006 at 04:24:26PM -0700, Daniel Phillips (phillips@xxxxxxxxxx) wrote:
Feel free to implement any receiving policy inside a _separate_ allocator
to meet your needs, but if the allocator depends on the main system's
memory conditions it is always possible that it will fail to make forward
progress.
Wrong. Our main allocator has a special reserve that can be accessed
only by a task that has its PF_MEMALLOC flag set. This reserve exists
in order to guarantee forward progress in just such situations as the
network runs into when it is trying to receive responses from a remote
disk. Anything otherwise is a bug.
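The reserve policy described above can be pictured with a toy model. This is only an illustrative sketch in Python, not kernel code: the class `PagePool`, its `reserve` watermark, and the `memalloc` argument are hypothetical names standing in for the page allocator, its reserve, and the PF_MEMALLOC task flag.

```python
class PagePool:
    """Toy page pool with a reserve that only privileged callers may use."""

    def __init__(self, total_pages, reserve_pages):
        if reserve_pages > total_pages:
            raise ValueError("reserve cannot exceed total pages")
        self.free = total_pages
        self.reserve = reserve_pages  # pages held back for privileged callers

    def alloc(self, memalloc=False):
        """Take one page; return True on success, False on failure."""
        # Ordinary callers must leave the reserve untouched;
        # callers with the (hypothetical) memalloc flag may drain it.
        floor = 0 if memalloc else self.reserve
        if self.free <= floor:
            return False
        self.free -= 1
        return True

    def release(self):
        """Give one page back to the pool."""
        self.free += 1


pool = PagePool(total_pages=4, reserve_pages=2)
ordinary = [pool.alloc() for _ in range(3)]                  # third call hits the reserve
privileged = [pool.alloc(memalloc=True) for _ in range(3)]   # reserve holds two pages
```

In this model the ordinary caller is refused as soon as only the reserve is left, while the flagged caller can still make progress until the reserve itself is empty.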
Ok, I see your point.
You create a special fix for a special configuration.
In general it does not work.
I do not argue that your approach is bad or does not solve the problem;
I'm just trying to show that further evolution of that idea eventually
ends up in a separate allocator (since the most robust systems all
separate operations), which can improve things in a lot of other areas
too.
Not a separate allocator per se, but a separate socket group. These
sockets are serviced by the kernel, they will never refuse to process
data, and it is critical for the continued well-being of your kernel that
they get their data.
The memalloc reserve is indeed a separate reserve, however it is a
reserve that already exists, and you are busy creating another separate
reserve to further partition memory. Partitioning memory is not the
direction we should be going, we should be trying to unify our reserves
wherever possible, and find ways such as Andrew and others propose to
implement "reserve on demand", or in other words, take advantage of
"easily freeable" pages.
Such an approach does not fix the problem.
Why does no one complain that there is privilege separation when "we can
fix all existing applications"?
It is possible that there will not be any "easily freeable" pages, and
your special reserve will not be filled.
If your allocation code is so much more efficient than slab then why
don't you fix slab instead of replicating functionality that already
exists elsewhere in the system, and has since day one?
No one needs an excuse to rewrite something.
You do not know in advance which sockets must be separated (since only
in the simplest situation it is the same as in NBD and is
kernelspace-only),
Yes we do, they are exactly those sockets that lie in the block IO path.
The VM cannot deadlock on any others.
There are some pieces of the world beyond NBD and iSCSI.
you cannot solve the problem with ARP/ICMP/route changes and other control
messages, netfilter, IPsec and compression, which can still happen in your
setup,
If you bothered to read the patches you would know that ICMP is indeed
handled. ARP I don't think so, we may need to do that since ARP can
believably be required for remote disk interface failover. Anybody
who designs ssh into remote disk failover is an idiot. Ssh for
configuration, monitoring and administration is fine. Ssh for fencing
or whatever is just plain stupid, unless the nodes running both server
and client are not allowed to mount the remote disk.
I only remember that sockets with sk_memalloc are handled, not ICMP and
not ARP. What about other control messages in a bonding failover setup?
if something goes wrong and receiving requires an additional
allocation from the network datapath, the system is dead,
these strict conditions do not allow flexible control over possible
connections and do not allow additional connections to be created.
You know that how?
Also, I do not think people would like it if say 100M of their 1G system
just disappears, never to be used again for e.g. page cache in periods of
low network traffic.
Just for clarification: the network tree allocator gets 512 KB and then
increases its cache size when that is required. The default value can be
changed of course.
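A minimal sketch of the "start small, grow when required" behaviour, assuming a hypothetical `GrowingCache` class. The 512 KB starting size comes from the message above; the growth step and the upper limit are made-up parameters for illustration and say nothing about how the network tree allocator actually grows.

```python
KIB = 1024


class GrowingCache:
    """Toy cache that starts at a fixed size and grows in steps on demand."""

    def __init__(self, initial=512 * KIB, step=512 * KIB, limit=4096 * KIB):
        self.size = initial   # bytes currently owned by the cache
        self.used = 0         # bytes handed out so far
        self.step = step      # hypothetical growth increment
        self.limit = limit    # hypothetical upper bound on the cache size

    def alloc(self, nbytes):
        """Hand out nbytes, growing the cache if needed; False if over the limit."""
        needed = self.used + nbytes
        new_size = self.size
        # Grow only as far as the request requires.
        while new_size < needed:
            new_size += self.step
        if new_size > self.limit:
            return False          # refuse, and leave the cache size unchanged
        self.size = new_size
        self.used = needed
        return True


cache = GrowingCache()
small_ok = cache.alloc(256 * KIB)   # fits in the initial 512 KB
size_after_small = cache.size
big_ok = cache.alloc(700 * KIB)     # does not fit, so the cache grows by one step
size_after_big = cache.size
```

With these assumed parameters the cache holds no more than the initial 512 KB until a request actually needs more.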
Great. Now why does the network layer need its own, invented-in-netland
allocator? Why can't everybody use your allocator if it is better?
As far as I recall, I have already said several times that there is no
problem with using that allocator in other places; MMU-less systems will
benefit from it especially (since it was designed for them too).
Also, please don't get the idea that your allocator by itself solves the
block IO receive starvation problem. At the very least you need to do
something about network traffic that is unrelated to forward progress of
memory writeout, yet can starve the memory writeout. Oh wait, our patch
set already does that.
It is not my allocator that solves the problem, but the situation where
pools are separated!
And you are slowly going in that direction too (a global reserve instead
of a per-socket one is the first step).
The problem is not solved when critical allocations depend on a reserve
which depends on main system conditions.
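The argument above can be pictured with a toy comparison. This is a sketch of the claim being made here, not a measurement and not code from either patch set; `OnDemandReserve`, `DedicatedPool`, and the `freeable_pages` counter are hypothetical.

```python
class OnDemandReserve:
    """Reserve that is refilled from the main system's easily freeable pages."""

    def __init__(self):
        self.pages = 0

    def refill(self, system, want):
        """Move up to `want` freeable pages from the system into the reserve."""
        got = min(want, system["freeable_pages"])
        system["freeable_pages"] -= got
        self.pages += got
        return got

    def alloc(self):
        if self.pages == 0:
            return False
        self.pages -= 1
        return True


class DedicatedPool:
    """Pool populated once, up front, and never shared with the main system."""

    def __init__(self, pages):
        self.pages = pages

    def alloc(self):
        if self.pages == 0:
            return False
        self.pages -= 1
        return True


system = {"freeable_pages": 0}   # worst case: nothing is easily freeable
reserve = OnDemandReserve()
refilled = reserve.refill(system, want=8)
dedicated = DedicatedPool(pages=8)

reserve_ok = reserve.alloc()       # depends on what the refill could find
dedicated_ok = dedicated.alloc()   # independent of the main system's state
```

The dedicated pool still runs dry once its own pages are used up, and it pays for its independence with memory that is set aside permanently, which is the cost raised earlier in the thread.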
Regards,
Daniel
--
Evgeniy Polyakov