XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)



On Tue, May 30, 2006 at 09:20:31PM -0400, Steven Rostedt wrote:
Added all those listed in the MAINTAINERS file for XFS.

Thanks Steve.

On Tue, 2006-05-30 at 15:03 -0400, Valdis.Kletnieks@xxxxxx wrote:
On Tue, 30 May 2006 12:22:01 +0200, Janos Haar said:
Half the processes on the box seem wedged at that same mutex_lock. I can't
seem to find an xfs_qm_shake in my source tree though.

Its in fs/xfs/quota/xfs_qm.c.

kswapd0 D ffff81011fe03c38 0 297 1 1287 19 (L-TLB)
ffff81011fe03c38 0000000000000004 000000000000000a ffff81011f92ba68
ffff81011f92b850 ffffffff805a23a0 0000149f99fa7d7c 000000000003bcde
000000002f2c46e0 ffff81008bc37180
Call Trace: <ffffffff804e5522>{schedule_timeout+34}
<ffffffff80269f87>{xfs_qm_dqunpin_wait+220} <ffffffff80140e74>{debug_mutex_free_waiter+141}

So, we're waiting here on a synchronisation variable that'll
be released once the dquot metadata buffer write completes.

So it is now waiting to be woken up by something that calls:

xfs_qm_dquot_logitem_unpin which seems to be the function to wake it
up.

Mhmm, that'd be called by the I/O completion handler on the buffer
containing that dquot.

And decyphering all the macro crap it seems that the function that wakes
it up is xfs_trans_chunk_committed, or xfs_trans_uncommit.

Right (the former, at this point in the code).

The above xfs_qm_dqunpin_wait still looks awfully racy, and the
xfs_log_force, which I'm assuming wakes up whoever is suppose to wake up
kswapd0, doesn't have a return code check. So if it failed to do

The logforce isn't race-critical here - its ensuring writeout
of previously logged buffers is started before we go to sleep
waiting for the driver to wake us up when its done.

An earlier I/O error on the journal is the only thing the log
force can return as an error there, which isnt useful at that
point anyway (we're in a kernel thread trying to free mem).

whatever the hell it's doing (that code gives me a headache), it looks

Heh, likewise. I have voodoo dolls of one or two of the early
XFS folks that I like to poke with needles occasionally.. :)

like this guy might sleep forever holding a lock that will prevent
others from freeing kernel memory.

It will sleep until the previously initiated buffer write is done.
AFAICT, we aren't seeing the I/O completion here for some reason...
which points more to a possible device driver or h/ware issue (that
is the usual root cause of this sort of hang, anyway).

cheers.

--
Nathan
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: XFS related hang (was Re: How to send a break? - dump from frozen 64bit linux)
    ... Subject: XFS related hang (was Re: How to send a break? ... that'd be called by the I/O completion handler on the buffer ... waiting for the driver to wake us up when its done. ... we aren't seeing the I/O completion here for some reason... ...
    (Linux-Kernel)
  • Re: serving a file to a client --- background fcopy read fileevent event puts socket cha
    ... invokes a callback when there's both data waiting and somewhere to ... probably not -size (although having -size pre-load the input buffer ... you can't even observe the progress of the [fcopy] from outside (ie. ... The division of functionality I mentioned should have negligible impact ...
    (comp.lang.tcl)
  • Completion ports and sockets - scaling my app
    ... First chunk 1 is put on his way by WSASend. ... Such an application must, however, take great care to ensure that it posts multiple overlapped sends, instead of waiting for one overlapped send to complete before posting another. ... If it had another buffer already posted, the transport would be able to use that buffer immediately and ... By implementing some kind of sequence numbering scheme? ...
    (microsoft.public.win32.programmer.networks)
  • Re: NIO - handling read events best practice?
    ... > the buffer is read. ... to do it the way you're doing it: what about having the Selector handler ... pull the data out of the channel into a buffer rather than having your ... System: "Wake up. ...
    (comp.lang.java.programmer)
  • Re: Piping between 2 threads
    ... I have the following pipe buffer shared between threads. ... 1 - R starts waiting on write event. ... Use a ringbuffer of buffer pointers. ...
    (comp.programming)