Re: [PATCH 0/12] Per-bdi writeback flusher threads v7



On Tue, May 26 2009, Damien Wyart wrote:
I have been playing with v7 since you sent it, and after a while
(short on the laptop, longer on the desktop, a few hours), writeback doesn't
seem to work anymore. A manual call to sync hangs (process in D state)
and the Dirty value in meminfo keeps growing. As previous versions had
been heavily tested, I guess there is some regression in v7.

Not good; the prime suspect is the sync notification stuff. I'll take
a look and get that fixed. You didn't happen to catch any sysrq-t
backtraces or anything like that? It would be interesting to see where
bdi-default and the bdi-* threads are stuck.

No, as I was doing many things at the same time and not exclusively
debugging, I just rebooted hard and went back to an unpatched kernel when
the problems occurred. But I noticed that only bdi-default was alive; the
other bdi-* threads had disappeared and the sync commands I had tried
were all in D state. I also tried to reinstall a kernel .deb (these
systems are Debian), and this got stuck during installation, when probing
the grub config (I do not know if there is some sync syscall in there).

I can try to go further tomorrow, but will not have a lot of time...

OK, I spotted the problem. If we fall back to the on-stack allocation in
bdi_writeback_all(), then we wait for the work completion with the
bdi_lock mutex held. This can deadlock with bdi_forker_task(): if we
require that to be invoked to make progress (which happens if a thread
needs to be restarted), then we have a deadlock on that mutex.
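To make the lock ordering concrete, here is a minimal userspace sketch with POSIX threads (build with -pthread). It assumes, as the mail implies, that bdi_forker_task() must take bdi_lock before it can restart a flusher thread. struct completion, forker_sketch() and writeback_all_sketch() are hypothetical stand-ins, not the kernel code, and the sketch shows one obvious way to break the cycle (queue the work under the lock, then drop the lock before waiting), not necessarily the fix that went into the series.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Userspace stand-in for the global mutex named in the mail. */
static pthread_mutex_t bdi_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical userspace analogue of a kernel completion. */
struct completion {
    pthread_mutex_t mu;
    pthread_cond_t cv;
    bool done;
};

static void init_completion(struct completion *c)
{
    pthread_mutex_init(&c->mu, NULL);
    pthread_cond_init(&c->cv, NULL);
    c->done = false;
}

static void destroy_completion(struct completion *c)
{
    pthread_cond_destroy(&c->cv);
    pthread_mutex_destroy(&c->mu);
}

static void complete(struct completion *c)
{
    pthread_mutex_lock(&c->mu);
    c->done = true;
    pthread_cond_signal(&c->cv);
    pthread_mutex_unlock(&c->mu);
}

static void wait_for_completion(struct completion *c)
{
    pthread_mutex_lock(&c->mu);
    while (!c->done)
        pthread_cond_wait(&c->cv, &c->mu);
    pthread_mutex_unlock(&c->mu);
}

/* Plays the role of bdi_forker_task(): it needs bdi_lock before it can
 * restart a flusher thread, and only that flusher completes the work. */
static void *forker_sketch(void *arg)
{
    struct completion *work = arg;

    pthread_mutex_lock(&bdi_lock);
    /* ... restart the missing flusher thread here ... */
    pthread_mutex_unlock(&bdi_lock);

    complete(work);            /* the restarted flusher finishes the work */
    return NULL;
}

/* Plays the role of bdi_writeback_all() on the on-stack fallback path.
 * Returns true once the queued work has completed. */
bool writeback_all_sketch(void)
{
    struct completion work;    /* on-stack work item */
    pthread_t forker;
    bool ok;

    init_completion(&work);

    pthread_mutex_lock(&bdi_lock);
    /* ... queue &work on each bdi; a flusher must be (re)started ... */
    if (pthread_create(&forker, NULL, forker_sketch, &work) != 0) {
        pthread_mutex_unlock(&bdi_lock);
        destroy_completion(&work);
        return false;
    }
    /*
     * Deadlock-prone order: calling wait_for_completion() here, before
     * the unlock, leaves forker_sketch() blocked on bdi_lock forever.
     * Dropping the lock first lets the forker make progress.
     */
    pthread_mutex_unlock(&bdi_lock);
    wait_for_completion(&work);

    pthread_join(forker, NULL);
    ok = work.done;
    assert(ok);                /* only reachable after complete() ran */
    destroy_completion(&work);
    return ok;
}
```

Moving the wait_for_completion() call above pthread_mutex_unlock() reproduces the hang: forker_sketch() blocks on bdi_lock, the work never completes, and the caller sleeps forever, much like the sync processes stuck in D state described earlier in the thread.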

I'll cook up a fix for this, but probably not before the morning.

--
Jens Axboe

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/


