Re: Questions about RAID 6



On 4/26/2010 9:29 AM, Tim Clewlow wrote:
Hi there,

I'm getting ready to build a RAID 6 with 4 x 2TB drives to start,
but the intention is to add more drives as storage requirements
increase.

My research/googling suggests ext3 supports 16TB volumes if block
size is 4096 bytes, but some sites suggest the 32-bit arch means it
is restricted to 4TB no matter what block size I use. So, does ext3
(and relevant utilities, particularly resize2fs and e2fsck) on 32-bit
i386 arch support 16TB volumes?

I intend to use mdadm to build / run the array. If an unrecoverable
read error (a bad block that the on-disk circuitry can't resolve) is
discovered on a disk, then how does mdadm handle this? It appears the
possibilities are:
1) the disk gets marked as failed in the array - ext3 does not get
notified of a bad block
2) mdadm uses free space to construct a new stripe (from remaining
raid data) to replace the bad one - ext3 does not get notified of a
bad block
3) mdadm passes the requested data (again reconstructed from
remaining good blocks) up to ext3 and then tells ext3 that all those
blocks (from the single stripe) are now bad, and you deal with it
(ext3 can mark and reallocate storage location if it is told of bad
blocks too).

I would really like to hear it is either 2 or 3 as I would prefer
not to have an entire disk immediately marked bad due to one
unrecoverable read error - I would prefer to be notified instead so
I can still have RAID 6 protecting "most" of the data until the disk
gets replaced.

Regards, Tim.



I'm afraid that opinions on RAID vary widely on this list (no surprise), but you may be interested to know that there is a rough consensus here that software RAID 6 is an unfortunate choice.

I believe that the answer to your question is none of the above, although (2) comes closest. As I'm sure you know, RAID 6 uses block-level striping. What happens is a matter of policy, but I believe that data that is believed lost is recovered from parity and rewritten to the array.[0] The error is logged, and the status of the drive is changed. If the drive doesn't fail outright, then depending on policy[1], the drive may be re-verified or dropped from the array. Either way, mdadm handles the error, because the failure occurs at a layer below ext3.
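
To illustrate what "recovered from parity" means, here is a minimal Python sketch. It is not md's code, and it covers only the simple XOR (P) parity case with one unreadable block. RAID 6 also keeps a second parity block (Q) so that it can survive two failures, and that is left out here. The function name and the sample data are made up for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# One stripe with two data blocks plus P (XOR parity).
# The Q parity block that RAID 6 adds is omitted from this sketch.
d0 = b"hello wo"
d1 = b"rld!1234"
p = xor_blocks([d0, d1])

# Suppose the disk holding d1 reports an unrecoverable read error.
# The missing block is the XOR of every block that is still readable.
recovered = xor_blocks([d0, p])
assert recovered == d1
```

The recovered block can be handed to the file system and written back to the array, which is why the layer above never needs to see the error.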

The problem arises when the drive is 100% in use (no spare capacity). In that case, no new stripe is created, because there is no room to put one. The data is moved to an unused area[1], and the status of the drive is changed (your scenario 1). ext3 is still unaware.
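
Since ext3 never hears about any of this, the place to see a changed drive status is the md layer itself, for example /proc/mdstat. The Python sketch below parses text in that layout and reports arrays with failed or missing members. The sample text is invented for illustration, the function name is hypothetical, and the layout of /proc/mdstat can differ between kernel versions, so treat this as a rough sketch rather than a robust monitor.

```python
import re

# Invented sample in the style of /proc/mdstat, with one member marked (F).
SAMPLE = """\
Personalities : [raid6] [raid5] [raid4]
md0 : active raid6 sdd1[3] sdc1[2] sdb1[1](F) sda1[0]
      3906763776 blocks level 6, 64k chunk, algorithm 2 [4/3] [U_UU]

unused devices: <none>
"""

def degraded_arrays(mdstat_text):
    """Return {array_name: [failed members]} for arrays that are degraded."""
    result = {}
    current = None
    failed = []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\S+) : (.*)$", line)
        if header:
            current = header.group(1)
            # Members marked faulty carry an (F) suffix after their slot number.
            failed = re.findall(r"(\S+?)\[\d+\]\(F\)", header.group(2))
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if current and status:
            # An underscore in the [UUUU] map marks a missing or failed slot.
            if "_" in status.group(3) or failed:
                result[current] = failed
            current = None
    return result

print(degraded_arrays(SAMPLE))
```

On a real system the text would come from reading /proc/mdstat instead of the sample string.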

The file system is a logical layer on top of RAID, and will only become aware of changes to the disk structure when it is unavoidable. RAID guarantees a certain capacity. If you create a volume with 1 TB capacity, the volume will always have that capacity.
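
For your planning, the capacity arithmetic can be sketched in a few lines of Python. The assumptions are: RAID 6 gives up two disks' worth of space to parity, ext3 addresses blocks with 32-bit numbers (which is where the 16TB figure for 4096-byte blocks comes from), and drives are sold in decimal terabytes. The function names are made up for the example, and the sketch says nothing about whether the 32-bit tools impose a lower limit.

```python
TIB = 2**40

def raid6_usable_bytes(num_disks, disk_bytes):
    """Usable capacity of a RAID 6 array: two disks' worth goes to parity."""
    if num_disks < 4:
        raise ValueError("this sketch assumes at least 4 disks for RAID 6")
    return (num_disks - 2) * disk_bytes

def max_fs_bytes(block_size, block_number_bits=32):
    """Largest volume addressable with fixed-width block numbers."""
    return 2**block_number_bits * block_size

disk = 2 * 10**12           # one 2 TB drive, decimal bytes
limit = max_fs_bytes(4096)  # 2**32 blocks of 4096 bytes each

for n in range(4, 12):
    usable = raid6_usable_bytes(n, disk)
    verdict = "fits" if usable <= limit else "exceeds limit"
    print(f"{n} disks: {usable / TIB:.2f} TiB usable, {verdict}")
```

Real arrays lose a little more space to the md superblock and to file system metadata, so the figures are upper bounds.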

If you set this up, be sure to also combine it with LVM2. Then you have much greater flexibility about what to do when recovering from failures.


[0] This depends on the implementation, and I don't know what mdadm does. Some implementations might do this automatically, but I think most would require a rebuild.

[1] Again, I forget what mdadm does in this case. Anybody?



I'm sorry, I seem to have avoided answering a crucial part of your question. I think that the md device documentation is what you want.


MAA








