Re: [dm-devel] Re: Data corruption with raid5/dm-crypt/lvm/reiserfs on 2.6.19.2



On Mon, 2007-01-22 at 22:42 +0100, Christophe Saout wrote:
On Monday, 22.01.2007, at 11:56 -0800, Andrew Morton wrote:

There has been a long history of similar problems when raid and dm-crypt
are used together. I thought a couple of months ago that we were hot on
the trail of a fix, but I don't think we ever got there. Perhaps
Christophe can comment?

No, I think it's exactly this bug. Three months ago someone came up with
a very reliable test case, and I managed to nail down the bug.

Readaheads that were aborted by the raid5 code (or some layer below it)
were signalled with a cleared BIO_UPTODATE bit but no error code.
dm-crypt did not recognize them as aborted (all other layers apparently
set the error code in this case, so this only happened with raid5),
which could mess up the buffer cache.

Anyway, it then turned out this bug had already been "accidentally" fixed
in 2.6.19 by Red Hat in order to play nicely with the make_request changes
(the work to reduce stack usage with stacked block device layers); that's
probably why you missed that it got fixed. The fix for pre-2.6.19
kernels went into a 2.6.16.x release and into 2.6.18.6.

Hi Chris:

I've been trying Andrew's suggestion of fault injection, currently
just for kmalloc() and mempool_alloc(), and just got a hang on 2.6.12
that I'm investigating with kgdb. I'm using dm-crypt on a SCSI
RAID-1 (mirrored) root partition, and I'm running your patch that
fixes the raid5 problem just to play it safe.

So far it looks like processes are waiting to be woken up by
the buffer cache once reads have completed.

-piet



--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel
--
Piet Delaney Phone: (408) 200-5256
Blue Lane Technologies Fax: (408) 200-5299
10450 Bubb Rd.
Cupertino, Ca. 95014 Email: piet@xxxxxxxxxxxx
