RE: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
- From: "Weathers, Norman R." <Norman.R.Weathers@xxxxxxxxxxxxxxxxxx>
- Date: Thu, 19 Jun 2008 10:53:28 -0500
-----Original Message-----
From: J. Bruce Fields [mailto:bfields@xxxxxxxxxxxx]
Sent: Monday, June 16, 2008 12:44 PM
To: Weathers, Norman R.
Cc: Jeff Layton; linux-kernel@xxxxxxxxxxxxxxx;
linux-nfs@xxxxxxxxxxxxxxx; Neil Brown
Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
On Fri, Jun 13, 2008 at 05:53:20PM -0500, Weathers, Norman R. wrote:
-----Original Message-----
From: J. Bruce Fields [mailto:bfields@xxxxxxxxxxxx]
Sent: Friday, June 13, 2008 5:04 PM
To: Weathers, Norman R.
Cc: Jeff Layton; linux-kernel@xxxxxxxxxxxxxxx;
linux-nfs@xxxxxxxxxxxxxxx; Neil Brown
Subject: Re: CONFIG_DEBUG_SLAB_LEAK omits size-4096 and larger?
On Fri, Jun 13, 2008 at 04:53:31PM -0500, Weathers, Norman R. wrote:
The big one seems to be the __alloc_skb. (This is with 16 threads, and
it says that we are using up somewhere between 12 and 14 GB of memory,
about 2 to 3 gig of that is disk cache). If I were to put anymore
threads out there, the server would become almost unresponsive (it was
bad enough as it was).
At the same time, I also noticed this:
skbuff_fclone_cache: 1842524 __alloc_skb+0x50/0x170
Don't know for sure if that is meaningful or not....

OK, so, starting at net/core/skbuff.c, this means that this memory was
allocated by __alloc_skb() calls with something nonzero in the third
("fclone") argument. The only such caller is alloc_skb_fclone().
Callers of alloc_skb_fclone() include:
sk_stream_alloc_skb:
    do_tcp_sendpages
    tcp_sendmsg
    tcp_fragment
    tso_fragment

Interesting you should mention the tso... We recently went through and
turned on TSO on all of our systems, trying it out to see if it helped
with performance... This could be something to do with that. I can try
disabling the tso on all of the servers and see if that helps with the
memory. Actually, I think I will, and I will monitor the situation. I
think it might help some, but I still think there may be something else
going on in a deep corner...

I'll plead total ignorance about TSO, and it sounds like a long
shot--but sure, it'd be worth trying, thanks.

Tried it, not for sure if I like the results yet or not... Didn't seem
to make a huge difference, but here is something that will really make
you want to drink, the 2.6.25.4 kernel does not go into the size-4096
hell.
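
The skbuff_fclone_cache line quoted above has the shape "cache: count caller". As a rough aid for anyone comparing such output between kernels, here is a minimal Python sketch that ranks (cache, caller) pairs by object count. It assumes every line of the CONFIG_DEBUG_SLAB_LEAK output (normally read from /proc/slab_allocators) follows the same shape as that one quoted line; the function name and the second sample line are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed line shape, taken from the single line quoted in the thread:
#   <cache name>: <object count> <caller symbol+offset/size>
LINE_RE = re.compile(r"^(?P<cache>[^:]+):\s+(?P<count>\d+)\s+(?P<caller>\S+)")


def top_allocators(lines, limit=5):
    """Return up to `limit` (cache, caller, count) tuples, largest count first."""
    totals = defaultdict(int)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip anything that does not fit the assumed shape
        key = (match.group("cache"), match.group("caller"))
        totals[key] += int(match.group("count"))
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(cache, caller, count) for (cache, caller), count in ranked[:limit]]


sample = [
    "skbuff_fclone_cache: 1842524 __alloc_skb+0x50/0x170",  # from the thread
    "example_cache: 12 example_caller+0x10/0x40",  # hypothetical filler line
]

for cache, caller, count in top_allocators(sample):
    print(f"{count:>10}  {cache}  {caller}")
```

On a live system the `lines` argument would come from `open("/proc/slab_allocators")`, which only exists when the kernel was built with CONFIG_DEBUG_SLAB_LEAK.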
Remind me what the most recent *bad* kernel was of those you tested?
(2.6.25?)
The kernel that we were really seeing the problem with was 2.6.25.4, but
I think we may have figured out the 4096 problem, and it was probably a
mistake on my part, but it is important for the NFS users to see it so
they don't make the same mistake. I had found some performance tuning
guides, and in trying some of the suggestions, found that the setting
changes did seem to help on some things, but of course I never got to
run a check under full load (800+ clients). A suggestion was to change
the tcp_reordering tunable under /proc/sys/net/ipv4 from the default 3
to 127. We think that this was actually causing the issue. I was able
to trace back through all of the changes, and I changed this setting
back to the default 3, and it immediately fixed the size-4096 hell. It
appears that the reordering just eats into the memory, especially in
high demand situations, and I guess that should make perfect sense if we
are actually buffering up packets for reorder, and we are slamming the
box with thousands of requests per minute.
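
For anyone who followed the same tuning guides, a quick way to see whether a box is still running a non-default value is sketched below in Python. The path and the default of 3 are the ones given in this message; the function names are illustrative only, and this is a sketch rather than a vetted tool.

```python
from pathlib import Path

# Path and default value as stated in the message above.
TCP_REORDERING_PATH = Path("/proc/sys/net/ipv4/tcp_reordering")
DEFAULT_TCP_REORDERING = 3


def read_tunable(path):
    """Read a single-integer sysctl file and return its value."""
    return int(Path(path).read_text().strip())


def describe_reordering(value, default=DEFAULT_TCP_REORDERING):
    """Return a one-line summary saying whether the value is the default."""
    if value == default:
        return f"tcp_reordering={value} (default)"
    return f"tcp_reordering={value} (differs from default {default})"


if __name__ == "__main__":
    if TCP_REORDERING_PATH.exists():
        print(describe_reordering(read_tunable(TCP_REORDERING_PATH)))
    else:
        print("tcp_reordering is not available on this system")
```

Putting the value back is a matter of writing 3 to the same file as root, which is the change described above.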
We still have other performance issues now, but they appear to be more of
a bottleneck: the nodes do not appear to back off when the servers
become congested.
Nothing jumped out at me in a quick skim through the commits from 2.6.25
to 2.6.25.4.
The largest users of slab there are the size-1024 and still the
skbuff_fclone_cache. On a box with 16 threads, it will cache up about 5
GB of disk data, and still use about 6 GB of slab to put the information
out there (without TSO on), but at least it is not causing the disk
cache to be evicted, and it appears to be a little more responsive. If
I up it to 32 or more threads, however, it gets very sluggish, but then
again, I am hitting it with a lot of nodes.

    tcp_mtu_probe
    tcp_send_fin
    tcp_connect
buf_acquire:
    lots of callers in tipc code (whatever that is).
So unless you're using tipc, or you have something in userspace going
haywire (perhaps netstat would help rule that out?), then I suppose
there's something wrong with knfsd's tcp code. Which makes sense, I
guess.

Not for sure what tipc is either....

I'd think this sort of allocation would be limited by the number of
sockets times the size of the send and receive buffers.
svc_xprt.c:svc_check_conn_limits() claims to be limiting the number of
sockets to (nrthreads+3)*20. (You aren't hitting the "too many open
connections" printk there, are you?) The total buffer size should be
bounded by something like 4 megs.
--b.

Yes, we are getting a continuous stream of the too many open connections
scrolling across our logs.

That's interesting! So we should probably look more closely at the
svc_check_conn_limits() behavior. I wonder whether some pathological
behavior is triggered in the case where you're constantly over the limit
it's trying to enforce.
(Remind me how many active clients you have?)

We currently are hitting with somewhere around 600 to 800 nodes, but it
can go up to over 1000 nodes. We are artificially starving with a
limited number of threads (2 to 3) right now on the older 2.6.22.14
kernel because of that memory issue (which may or may not be tso
related)...
So with that many clients all making requests to the server at once,
we'd start hitting that (serv->sv_nrthreads+3)*20 limit when the number
of threads was set to less than 30-50. That doesn't seem to be the
point where you're seeing a change in behavior, though.
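
The arithmetic behind that 30-50 estimate can be worked through in a few lines of Python. This is only a sketch of the (nrthreads+3)*20 formula quoted in the thread, under the assumption (mine, not stated above) that each client holds exactly one connection to the server; the function names are illustrative.

```python
import math


def conn_limit(nrthreads):
    """Connection limit as quoted in the thread: (nrthreads + 3) * 20."""
    return (nrthreads + 3) * 20


def min_threads_for(clients):
    """Smallest thread count whose limit covers `clients` connections.

    Assumes one connection per client, which is a simplification.
    """
    return max(0, math.ceil(clients / 20) - 3)


for clients in (600, 800, 1000):
    threads = min_threads_for(clients)
    print(f"{clients} clients -> at least {threads} threads "
          f"(limit {conn_limit(threads)})")
# 600 clients -> at least 27 threads (limit 600)
# 800 clients -> at least 37 threads (limit 800)
# 1000 clients -> at least 47 threads (limit 1000)
```

By the same formula, 2 or 3 threads give limits of 100 and 120, and 16 threads give 380, all well below 600 to 800 clients.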
We were estimating between 40 and 50 threads was the cut off for being
able to service all of the (current) requests at once. I haven't ramped
back up to that level yet. I wasn't comfortable yet with letting it all
hang back out just in case we get into that hellish mode again, it can
be a pain to try and get into those systems once they are overloaded
(even over serial, sometimes it can just time out the login). We had to
actually bring online a second option to help alleviate some of the back
congestion because the servers couldn't handle the workload.
I really want to move forward to the newer kernel, but we had an issue
where clients all of a sudden wouldn't connect, yet other clients
could, to the exact same server NFS export. I had booted the server
into the 2.6.25.4 kernel at the time, and the other admin set us back to
the 2.6.22.14 to see if that was it. The clients started working again,
and he left it there (he also took out my options in the exports file,
no_subtree_check and insecure). I know that we are running over the
number of privileged ports, and we probably need the insecure, but I am
having a hard time wrapping myself around all of the problems at
once....
The secure ports limitation should be a problem for a client that does a
lot of nfs mounts, not for a server with a lot of clients.
Ah, OK. That makes sense.
--b.