Re: RAID question

From: Paul Colquhoun (postmaster_at_andor.dropbear.id.au)
Date: 11/26/04


Date: Fri, 26 Nov 2004 07:04:02 GMT

On Fri, 26 Nov 2004 00:22:22 -0300, Juhan Leemet <juhan@logicognosis.com> wrote:
| On Fri, 26 Nov 2004 01:03:59 +0000, Paul Colquhoun wrote:
|> On Thu, 25 Nov 2004 21:03:18 -0300, Juhan Leemet <juhan@logicognosis.com> wrote:
| [snippage]
|> | I think the term "parity" used for the additional drive for RAID5 is
|> | actually incorrect. It is much more like the ECC syndrome...
|>
|> Sorry to burst your bubble, but it *is* parity. The data that is stored
|> on the parity block of a stripe in a RAID5 setup is generated by XORing
|> all the corresponding data blocks.
|
| OK, I stand corrected. I'll have to read up on RAID5 implementation. Is it
| XOR by definition? or is XOR chosen as implementation in some/all designs?
|
|
|> When a drive fails, its data can be reconstructed by XORing all the
|> remaining data blocks and the parity block.
|>
|> No fancy ECC algorithms involved at all.
|
| Even better. BTW, I did not say what algorithm was to be used to
| reconstruct. I was addressing the OP argument that there HAD to be
| inefficiencies in the formula used for calculating capacity of RAID5, i.e.
| his claim about "...the formula in question being bullshit."
|
| I would hesitate to assume that all RAID implementations are the same,
| i.e. you can just swap a set of drives from one into another, assuming XOR.
|
| What you say could explain something, tho. I have been mystified by
| reports from people running RAID who have complained that the RAID system
| has synced the data "the wrong way", and trashed the data. I think this is
| especially true if there is a flakey drive, as opposed to a completely
| non-functioning drive. If the algorithm is simply an XOR, then it has to
| depend on detecting the failure some other way. You cannot detect and
| correct using the same bit (considering all bits in parallel). You have to
| somehow know which drive is "bad". If you are getting data from all drives
| and the parity is wrong, which one do you fix? can you fix anything? I
| also don't think (watching my S/W RAID5) that it always reads all disks.

Correct. The parity is not used to detect errors. Each block on disk also has
an internal checksum, kept by the drive itself, that is used to detect errors.
Or the failure can be as simple as the drive no longer responding to read
commands over the interface.
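
To make the XOR point concrete, here is a rough sketch in Python. It is only
an illustration of the idea, not code from any actual RAID implementation; the
helper name xor_blocks and the four-byte block contents are made up for the
example. It shows that a block can be rebuilt once you know which drive failed,
and that a parity mismatch on its own does not tell you which drive is wrong.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte position by byte position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three data blocks of one stripe (hypothetical contents) and their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: drive 1 is KNOWN to have failed (the drive or the interface said so).
# XOR the surviving data blocks with the parity block to get its data back.
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]

# Case 2: a flaky drive silently returns bad data. XORing every block with
# the parity is then nonzero, so the stripe is known to be inconsistent...
bad_on_drive1 = [data[0], b"BBXB", data[2]]
mismatch = xor_blocks(bad_on_drive1 + [parity])
assert any(mismatch)

# ...but the same error pattern on drive 0 gives exactly the same mismatch,
# so the parity alone cannot say which block to "fix".
bad_on_drive0 = [xor_blocks([data[0], mismatch]), data[1], data[2]]
assert xor_blocks(bad_on_drive0 + [parity]) == mismatch
```

That is why the array relies on the drive (or the interface) to report which
block is bad before it reconstructs anything.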

| This is making me uneasy about all that RAID5 stuff. Is it more reliable?
| I suppose better than losing a disk in a metadevice, but maybe less than
| guaranteed integrity with one drive malfunction (failure? flakey?).

I've never seen a single drive failure result in lost data.

-- 
Reverend Paul Colquhoun, ULC.    http://andor.dropbear.id.au/~paulcol
     Asking for technical help in newsgroups?  Read this first:
        http://catb.org/~esr/faqs/smart-questions.html#intro