Re: 2.6.19 file content corruption on ext3



On Thu, 28 Dec 2006 17:38:38 -0800 (PST)
Linus Torvalds <torvalds@xxxxxxxx> wrote:

> in the hope that somebody else is working on this corruption issue and is
> interested..

What corruption issue? ;)


I'm finding that the corruption happens trivially with your test app on
ext3, but apparently doesn't happen at all with ext2, or with ext3 mounted
data=writeback. Maybe it still happens there, only far more rarely, but
the difference is quite stark.
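
The test app itself is not reproduced in this message. As a rough,
hypothetical sketch of the kind of user-space check being discussed (not
Linus's program), the code below assumes the test dirties a file through a
shared mapping in scrambled page order, flushes it, and compares the file
with a known pattern. The names check_mmap_pattern() and pattern_byte(),
the pattern itself, and the use of msync() are all assumptions made for
illustration.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical pattern: expected value of byte `off` within page `page`. */
static unsigned char pattern_byte(size_t page, size_t off)
{
	return (unsigned char)(page * 31u + off * 7u + 1u);
}

/*
 * Hypothetical check: dirty `npages` pages of `path` through a shared
 * mapping in shuffled order, flush with msync(), then read the file back
 * with pread() and compare against the pattern.
 * Returns the number of mismatching pages, or -1 on error.
 */
long check_mmap_pattern(const char *path, size_t npages)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t pagesz, len, i, p, off;
	size_t *order = NULL;
	unsigned char *buf = NULL;
	unsigned char *map = MAP_FAILED;
	long bad = -1;
	int fd;

	if (psz <= 0 || npages == 0)
		return -1;
	pagesz = (size_t)psz;
	len = npages * pagesz;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0)
		return -1;
	if (ftruncate(fd, (off_t)len) != 0)
		goto out;
	map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (map == MAP_FAILED)
		goto out;
	order = malloc(npages * sizeof(*order));
	buf = malloc(pagesz);
	if (!order || !buf)
		goto out;

	/* Fisher-Yates shuffle so pages are dirtied out of file order. */
	for (i = 0; i < npages; i++)
		order[i] = i;
	srand(1);
	for (i = npages - 1; i > 0; i--) {
		size_t j = (size_t)rand() % (i + 1);
		size_t tmp = order[i];
		order[i] = order[j];
		order[j] = tmp;
	}

	/* Dirty every page through the mapping. */
	for (i = 0; i < npages; i++) {
		p = order[i];
		for (off = 0; off < pagesz; off++)
			map[p * pagesz + off] = pattern_byte(p, off);
	}
	if (msync(map, len, MS_SYNC) != 0)
		goto out;

	/* Read back through the file descriptor and count bad pages. */
	bad = 0;
	for (p = 0; p < npages; p++) {
		ssize_t n = pread(fd, buf, pagesz, (off_t)(p * pagesz));
		if (n != (ssize_t)pagesz) {
			bad = -1;
			goto out;
		}
		for (off = 0; off < pagesz; off++) {
			if (buf[off] != pattern_byte(p, off)) {
				bad++;
				break;
			}
		}
	}
out:
	free(buf);
	free(order);
	if (map != MAP_FAILED)
		munmap(map, len);
	close(fd);
	return bad;
}
```

A caller would invoke check_mmap_pattern() on a path inside the filesystem
under test and treat any nonzero return as a failure. Note that the
read-back here is normally served from the page cache, so a check of this
shape would probably only expose lost writes if the pages were evicted
between the write and the compare, for example with a file much larger
than RAM.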

Removing the

	err = walk_page_buffers(handle, page_bufs, 0, PAGE_CACHE_SIZE,
				NULL, journal_dirty_data_fn);

from ext3_ordered_writepage() fixes things up.

The things which journal_submit_data_buffers() does after dropping all the
locks are ... disturbing - I don't think we have sufficient tests in there
to ensure that the buffer is still where we think it is after we retake
locks (they're slippery little buggers). But that wouldn't explain it
anyway.

It's inefficient that journal_dirty_data() will put these locked, clean
buffers onto BJ_SyncData instead of BJ_Locked, but
journal_submit_data_buffers() seems to do the right thing with them.

So no theory yet. Maybe ext3 is just altering timing. But the difference
is really large..



Disabling all the WB_SYNC_NONE stuff and making everything go synchronous
everywhere has no effect. Disabling bdi_write_congested() has no effect.





