Re: 2.6.19-rc6 : Spontaneous reboots, stack overflows - seems to implicate xfs, scsi, networking, SMP



On 21/11/06, David Chatterton <chatz@xxxxxxxxxxxxxxxxx> wrote:
> Jesper,
>
> In the short term, the best workaround is to use 8K stacks.

Yeah, that's what I'm currently doing and the box seems more stable
(at least it has not crashed yet, but with 4K stacks it usually would
have by now).
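
For reference, a .config fragment matching this workaround might look like the one below. It is a sketch that assumes an i386 build of this kernel series, where the stack size is selected by CONFIG_4KSTACKS under "Kernel hacking"; the two debug options are optional extras for catching and measuring stack usage, not something the thread says were enabled.

```
# Assumed i386 .config fragment: 8K kernel stacks plus stack debugging
CONFIG_DEBUG_KERNEL=y
# CONFIG_4KSTACKS is not set
CONFIG_DEBUG_STACKOVERFLOW=y
CONFIG_DEBUG_STACK_USAGE=y
```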

> We do not see stack
> overflow problems with NFS + XFS + volume managers + disk devices.

Could the size of my devices be part of the cause? Some of the logical
volumes I have mounted are multiple TB in size.


> Audits have been done in the past and will again be done in the future to try to
> identify areas where XFS could use less stack space by reducing/avoiding large
> local variables. Reducing the code path is far more difficult.

I realize that fixing the problem may be difficult. I just wanted to
make sure that people were informed that there is an actual problem
and provide as much info as possible so that perhaps in the future it
can be fixed... :)
I'm reading through the XFS code myself at the moment and I'll be sure
to submit patches if I spot something that could help reduce stack
usage.
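
To illustrate the kind of change such an audit looks for, here is a minimal userspace sketch. It is not XFS code: the function names and SCRATCH_BYTES are hypothetical, and malloc()/free() stand in for whatever allocator kernel code would actually use. The first function keeps a large scratch buffer in its stack frame; the second does the same work with the buffer on the heap, so its frame holds only a pointer and a few scalars.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SCRATCH_BYTES 1024  /* hypothetical scratch-buffer size */

/* Before: the scratch buffer lives in this function's stack frame, so
 * every active call costs SCRATCH_BYTES of stack on top of its callers. */
unsigned long sum_with_stack_buffer(const unsigned char *src, size_t len)
{
    unsigned char scratch[SCRATCH_BYTES];
    unsigned long sum = 0;
    size_t i;

    assert(len <= SCRATCH_BYTES);
    memcpy(scratch, src, len);
    for (i = 0; i < len; i++)
        sum += scratch[i];
    return sum;
}

/* After: the same work with the buffer allocated on the heap. The frame
 * now holds only a pointer and scalars, at the cost of an allocation that
 * can fail. Returns 0 on success, -1 if the allocation fails. */
int sum_with_heap_buffer(const unsigned char *src, size_t len,
                         unsigned long *out)
{
    unsigned char *scratch;
    unsigned long sum = 0;
    size_t i;

    assert(len <= SCRATCH_BYTES);
    scratch = malloc(SCRATCH_BYTES);
    if (scratch == NULL)
        return -1;
    memcpy(scratch, src, len);
    for (i = 0; i < len; i++)
        sum += scratch[i];
    free(scratch);
    *out = sum;
    return 0;
}
```

The trade-off is that the heap version needs an error path and pays for an allocation on every call, which is part of why this sort of change is made case by case rather than mechanically.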


> There is active discussion about reducing inlining:
> http://bugzilla.kernel.org/show_bug.cgi?id=7364


Thanks, I'll check that out.


> I can't speak for the scsi stack usage.
>
> Thanks for traces, I've captured this information.

You are welcome. If you want/need more traces then I've got ~2.1G
worth of traces that you can have :)


Thank you for your reply.


--
Jesper Juhl <jesper.juhl@xxxxxxxxx>
Don't top-post http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please http://www.expita.com/nomime.html