Re: RAID question
From: Juhan Leemet (juhan_at_logicognosis.com)
Date: Thu, 25 Nov 2004 21:03:18 -0300
On Wed, 24 Nov 2004 13:06:47 -0600, Ivan Marsh wrote:
> On Wed, 24 Nov 2004 18:44:35 +0000, Nick Landsberg wrote:
>> Mark wrote:
>>> On Wed, 24 Nov 2004 11:16:17 -0600, Ivan Marsh wrote:
>>>>On Tue, 23 Nov 2004 11:25:53 -0700, Steve Wolfe wrote:
>>>>>>>>How much usable space will I have?
>>>>>>>N=3, S=146, so you'll have 292 GB.
>>>>>>>>If I add a forth 146gb drive, how much will I have.
>>>>>>>N=4, S=146, so you'll have 438 GB.
>>>>>>I don't think that formula is very accurate.
>>>>>>With RAID 5 you end up with ~60-70% of the total drive space
>>>>>>(depending on the RAID controller). RAID 5 requires overhead to
>>>>>>With three drives (the minimum) you lose an entire drive worth of
>>>>>>space. Adding another 3 drives will cost you less than another entire
>>>>>>drive worth of overhead.
>>>>> Yes, it's accurate. with RAID 5, you always lose the capacity of one
>>>>>drive, so with 4 drives, he'd end up with the capacity of 3*146=438
>>>>>gigs. The formula is (N-1)*S.
>>>>Okay... help me out with this. Am I just thinking of the striping wrong
>>>>3 x 10g drives = 30g total = 20g raid 5 = 10g of overhead
>>>>100 x 10g drives = 1000g total = 990g raid 5 = 10g of overhead
>>>>1000 x 10g drives = 10000g total = 9990g raid 5 = 10g of overhead
>>>>So, no matter how many drives are in the array there is always the same
>>>>amount of overhead? Why does it take just as much space to stripe 20g of
>>>>data as it does to stripe 1t of data?
>>>>Is there something magical about striping with parity that I'm missing?
I think there are common misconceptions and assumptions.
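The (N-1)*S rule in the quoted exchange is easy to sanity-check; here is a minimal Python sketch using the drive counts and sizes from the examples above:

```python
# RAID 5 usable capacity: parity always costs exactly one drive's worth
# of space, no matter how many drives are in the array.
def raid5_usable(n_drives, size_gb):
    """Usable capacity (GB) of RAID 5 over n_drives identical drives."""
    assert n_drives >= 3, "RAID 5 needs at least three drives"
    return (n_drives - 1) * size_gb

for n, s in [(3, 146), (4, 146), (3, 10), (100, 10), (1000, 10)]:
    total = n * s
    usable = raid5_usable(n, s)
    print(f"{n:4d} x {s}GB: total={total}GB usable={usable}GB "
          f"overhead={total - usable}GB")
```

The overhead column comes out to one drive's size (146GB or 10GB) on every row, which is the whole point of the formula.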
>>> It's hardly magical... it's simply parity, nothing fancy... :)
Actually, I think it is more like ECC, loosely called "parity information"
as opposed to (actual) "data" (bits and bytes). See below...
>>> You need only one bit to store the parity of an arbitrary number of
>>> other bits... Parity is simply recording if the number of 1's in the X
>>> bits is even or odd.
Yeah, but he is right in saying just one parity bit won't give you enough
to correct anything. In fact, there are more than enough parity (actually
ECC) bits available to correct for single drive failures in RAID5.
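To make the single-failure recovery concrete, here is a toy sketch (bytewise XOR, not any particular controller's implementation) showing why one parity block per stripe is enough to survive one lost drive: XOR-ing the surviving blocks with the parity regenerates whichever single block went missing.

```python
# Parity is the bytewise XOR of the data blocks in a stripe.
def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"AAAA", b"BBBB", b"CCCC"]   # blocks on three data drives
parity = xor_blocks(data)            # stored on the parity drive

# Drive 1 dies; rebuild its block from the survivors plus parity:
# (A ^ C) ^ (A ^ B ^ C) == B
rebuilt = xor_blocks([data[0], data[2], parity])
assert rebuilt == data[1]
print("rebuilt block:", rebuilt)
```

Lose two blocks from the same stripe, though, and no amount of XOR-ing gets them back, which is the "second failure loses everything" point below.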
>> There is always the consideration of reliability.
>> If you have 5 drives in a raid 5, there is a certain (small) probability
>> that one of them would fail and one of the other 4 would fail while you
>> were repairing/ replacing the failed drive. When you have 100, the
>> probability is probably 20 times as high that this would happen.
> The more drives in a RAID 5 array the more reliable it is because the data
> is spread across more drives.
No, review your probability theory. If the probability of a single drive
failing is X, then the probability of at least one of M drives failing is
roughly M*X (more precisely 1 - (1-X)**M), assuming the failures are
independent (which is not strictly true; the real probability is HIGHER
because of common causes like heat, but close enough). With more drives you
have a greater chance of a drive
failing. RAID5 design is immune to ONE drive failing. However, you better
get a spare in there synced up real fast, because if another drive fails,
then you lose EVERYTHING! That is why RAID5 arrays typically have an
unused drive as a spare. Big arrays might have 2. Software starts syncing
it up as soon as it detects failure, so that you don't have to monitor the
disks like a hawk, checking every 20 minutes (pick a small number) or so.
The objective is to maintain data integrity, even if that might happen to
increase (?) the probability of hardware failures. (feel lucky today?)
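A back-of-envelope sketch of the reliability argument above; the per-drive failure probability x over the rebuild window is a made-up illustrative number, not a real drive spec:

```python
# P(at least one of m independent drives fails) = 1 - (1 - x)**m,
# which is approximately m*x when x is small.
def p_any_failure(x, m):
    return 1.0 - (1.0 - x) ** m

x = 0.001  # ASSUMED per-drive failure probability during the window
for m in (5, 100):
    print(f"{m:3d} drives: P(a failure) ~ {p_any_failure(x, m):.4f} "
          f"(M*X approx: {m * x:.4f})")

# After one drive dies, losing the array takes only ONE more failure
# among the m-1 survivors before the rebuild finishes.
print(f"P(second failure mid-rebuild, 100 drives) ~ "
      f"{p_any_failure(x, 99):.4f}")
```

The numbers scale almost linearly with the drive count, which is why big arrays carry hot spares and start resyncing automatically.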
> Which still has nothing to do with the formula in question being bullshit.
Lighten up, man, why the aggro? Everyone makes some wrong assumptions here.
I think the term "parity" used for the additional drive for RAID5 is
actually incorrect. It is much more like the ECC syndrome that is being
stored. If you research the number of bits required for ECC, you find that
the overhead gets proportionally smaller as your unit/bundle gets bigger:
every bit you add to the syndrome roughly doubles the number of data bits
you can protect while still correcting a single-bit error.
In other words (I think this is correct, it goes something like this) for a
single-error-correcting (Hamming) code:

data bits   ECC bits required
    4              3
   11              4
   26              5
   57              6
  120              7
You read most memory descriptions talking about using 8 ECC bits for 64
data bits, but I think that is partly just a convenience: the chips
typically come in widths of 8 bits, so 8*8=64 data bits plus one more 8-bit
chip for ECC. (Strictly, 64 data bits need only 7 syndrome bits for
single-error correction; the 8th bit buys double-error detection, i.e.
SECDED.) You can always use more bits than required, never fewer. Keep in
mind that memory ECC typically corrects only 1 bit in an entire word.
For RAID5 the data units are blocks of 32KB or so. Therefore an additional
32KB block for the ECC syndrome is WAY more than the minimum you would need
to correct 1 bit, or even a whole block, for virtually any number of
drives, even the 1000 in your example. And the ratio only improves as the
sizes increase.
Unfortunately, one does not lose only 1 bit, but one loses an entire drive.
I think the constraint (the Hamming bound, generalized from 1 bit) that
must be respected is:
2**(# of data bits) * (total # bits + 1) <= 2**(total # bits)
That gets easier to satisfy as the total # of bits grows, because for big
values
2**N * (N+M+1) grows slower than 2**(N+M)
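That bound is easy to check numerically; a short sketch of the minimal single-error-correcting syndrome width, using the standard Hamming condition 2**r >= m + r + 1 for m data bits:

```python
# Smallest number of syndrome bits r that can single-error-correct
# m data bits: 2**r must cover every bit position plus "no error".
def min_ecc_bits(m):
    r = 1
    while 2 ** r < m + r + 1:
        r += 1
    return r

for m in (4, 11, 26, 57, 64, 120):
    print(f"{m:4d} data bits -> {min_ecc_bits(m)} ECC bits")
```

Note how slowly the syndrome grows: going from 64 to 120 data bits costs no extra ECC bits at all, which is the "gets better as size increases" effect.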
In practice no one uses that many drives, mostly for power or cable fan-out
considerations. Also, I think the time required to reconstruct (correct)
increases (radically?) with the number of drives. If the time required to
sync becomes very high, you have a significant chance of a second drive
failure, and the whole thing was a pointless effort.
Incidentally, I remember reading up on the math behind ECC. Really quite
elegant and fascinating. What ECC actually does is find the closest valid
data bit combination in a multidimensional space (where the number of
dimensions is the total number of bits = data bits + ECC syndrome bits). I
get a headache trying to imagine 1000s of dimensions, but the
mathematicians that figured that stuff out gave us a really powerful tool!
IMO the formula works! In fact, you could devise a scheme that uses one
smaller ECC syndrome disk for a bunch of data disks, but it is not worth
hassling with the asymmetry (stocking different drive sizes, putting them
in the right places, etc.). Easier to use a full data size drive, which
allows the ECC syndrome aka parity blocks to be staggered and striped
across the drives. Another benefit might even be (dunno if this is true?)
faster ECC syndrome computation, i.e. faster correction. There are many
google hits available that describe ECC and RAID. Enjoy!
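The staggered/striped parity placement mentioned above can be sketched like this; the simple modulo rule is illustrative (real controllers pick one of several standard rotation layouts, e.g. left-symmetric), but it shows how parity duty cycles across all the drives instead of burdening one dedicated disk:

```python
# Which drive holds the parity block for a given stripe, rotating the
# parity position across all n_drives (illustrative modulo layout).
def parity_drive(stripe, n_drives):
    return stripe % n_drives

n = 4
for stripe in range(8):
    row = ["P" if d == parity_drive(stripe, n) else "D" for d in range(n)]
    print(f"stripe {stripe}: {' '.join(row)}")
```

Over any n consecutive stripes every drive takes one turn holding parity, so parity reads and writes are spread evenly, unlike RAID 4's single parity disk.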
-- Juhan Leemet Logicognosis, Inc.