Re: i thought RAID 5 had a stake through its heart
From: Kasper Dupont (kasperd_at_daimi.au.dk)
Date: 10/04/04
- Next message: Kasper Dupont: "PDC202XX crash"
- Previous message: Kasper Dupont: "Re: Forking Java GUI Apps"
- Next in thread: Ronald Cole: "Re: i thought RAID 5 had a stake through its heart"
- Reply: Ronald Cole: "Re: i thought RAID 5 had a stake through its heart"
Date: Mon, 04 Oct 2004 11:29:32 +0200
Ronald Cole wrote:
>
> Kasper Dupont <kasperd@daimi.au.dk> writes:
> > Ronald Cole wrote:
> >>
> >> My understanding is that corruption can creep into a RAID5 system and
> >> you would be unaware of it.
> >
> > Could happen if you have lost a disk and the raid
> > was not shut down cleanly. But other than that,
> > how could it be corrupted?
>
> Partial media failure. Raid5 reads never verifies the checksum.
> Here's how it can (and does) happen:
What the author forgot to mention is that this
problem is in no way specific to RAID5. It would
happen with all other kinds of RAID as well.
And even worse, silent data corruption can
happen even when there is not a single problem
with any of your hard drives. The CPU or the RAM
can introduce corruption. I have seen it.
(OTOH I have never seen a drive return wrong
data.)
The performance issues mentioned are real. The
question is how much you are willing to pay for
a bit of performance.
There will always be a compromise between the
four factors: price, capacity, performance,
security.
Compared to RAID10, RAID5 will give you a better
price or more capacity at a small cost in
performance and security. The only reason RAID10
gives you better security than RAID5 is that
there is a good probability of surviving a double
disk failure.
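
To put a number on "a good probability", here is a
rough sketch. It assumes RAID10 is built from
two-way mirrored pairs and that the second failure
hits one of the remaining disks uniformly at
random. The function name is made up for this
example.

```python
from fractions import Fraction

def raid10_survives_second_failure(num_disks: int) -> Fraction:
    """Chance that a second failed disk is not the mirror partner of the first.

    Assumes two-way mirrored pairs and a uniformly random second failure.
    """
    if num_disks < 4 or num_disks % 2:
        raise ValueError("expected an even number of disks, at least 4")
    # After the first failure num_disks - 1 disks remain, and exactly one
    # of them is the partner whose loss would destroy the array.
    return Fraction(num_disks - 2, num_disks - 1)

for n in (4, 6, 8):
    print(n, raid10_survives_second_failure(n))
# prints:
# 4 2/3
# 6 4/5
# 8 6/7
```

A RAID5 array without spare redundancy never
survives the second failure, so the corresponding
figure there is simply 0.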
If you don't think RAID5 is secure enough, consider
RAID6 with two parity disks and a hot spare. It will
give you better security than RAID10 and will also
give you a better price or more capacity. Of course
this has a price in performance.
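
The capacity side of that trade-off is simple
arithmetic. A small sketch, assuming equally sized
disks and two-way mirrors for RAID10. The function
and its parameters are names I picked for
illustration, not anything from a real tool.

```python
def usable_disks(level: str, total_disks: int, hot_spares: int = 0) -> int:
    """Number of disks' worth of usable space, assuming equal-size disks."""
    active = total_disks - hot_spares  # hot spares hold no data until needed
    if level == "raid5":
        if active < 3:
            raise ValueError("RAID5 needs at least 3 active disks")
        return active - 1              # one disk's worth of parity
    if level == "raid6":
        if active < 4:
            raise ValueError("RAID6 needs at least 4 active disks")
        return active - 2              # two disks' worth of parity
    if level == "raid10":
        if active < 4 or active % 2:
            raise ValueError("RAID10 needs an even number of active disks, at least 4")
        return active // 2             # every block is stored twice
    raise ValueError(f"unknown level: {level}")

# With 8 disks in the box:
print(usable_disks("raid10", 8))               # 4
print(usable_disks("raid6", 8, hot_spares=1))  # 5
print(usable_disks("raid5", 8))                # 7
```

So even after giving up one disk as a hot spare,
the RAID6 setup in this example still has more
usable space than RAID10 on the same hardware.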
And BTW, I recently read a very plausible
explanation for the failure during recovery,
which some people seem to have experienced.
If one of your disks has a few bad sectors,
they might not be noticed for a long time.
Eventually another disk fails, and during
recovery the bad sectors are noticed.
The drive did not fail during recovery. It was
bad already, it was just not noticed.
I can suggest two ways to avoid this problem.
One is to periodically check all your sectors,
such that you will notice bad sectors in time.
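
A periodic check could be as simple as reading the
whole device from start to end and noting where
reads fail. The following is only a sketch of that
idea: the function name and chunk size are my own
choices, it needs read access to the device node
(or any file), and reads that are served from the
OS cache will not actually exercise the disk.

```python
import os

def scan_for_unreadable_chunks(path, chunk_size=1 << 20):
    """Read `path` sequentially and return the offsets of chunks that fail to read."""
    bad_offsets = []
    # buffering=0 avoids Python-level buffering; the OS cache may still apply.
    with open(path, "rb", buffering=0) as dev:
        size = dev.seek(0, os.SEEK_END)
        offset = 0
        while offset < size:
            try:
                dev.seek(offset)
                dev.read(min(chunk_size, size - offset))
            except OSError:
                # A read error here is what a bad sector typically looks like
                # from user space; remember where it happened and move on.
                bad_offsets.append(offset)
            offset += chunk_size
    return bad_offsets
```

Run from cron against each member disk, something
like this would report trouble while the array
still has its redundancy intact.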
Another way to avoid it is RAID6, in which you
can still recover from a completely failed
disk, and a few bad sectors on another.
An improved recovery algorithm would reduce the
probability of this problem. It seems that the
current implementation supports only two states
of a drive, working or faulty. But it would be a
lot more secure to introduce something between
the two. A failing drive, where bad sectors have
been seen, but from which you can still read if
needed.
This condition should of course cause a recovery
to start. As soon as the recovery has completed,
the failing drive can be marked faulty and
replaced with the new one. Even with multiple
failing drives you can recover as long as they
don't have bad sectors in the same place.
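
To illustrate the last point, here is a toy model
with single XOR parity as in RAID5. Each disk is a
list of blocks and None marks an unreadable sector.
This is not how any existing implementation works,
just a sketch of the proposed behaviour, and all
names are invented for the example.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def rebuild_stripes(disks):
    """Return (repaired copy of disks, list of stripes that cannot be recovered)."""
    repaired = [list(disk) for disk in disks]
    lost = []
    for stripe in range(len(disks[0])):
        missing = [i for i, disk in enumerate(disks) if disk[stripe] is None]
        if len(missing) == 1:
            # One unreadable block: XOR of all the others gives it back.
            others = [disk[stripe] for disk in disks if disk[stripe] is not None]
            repaired[missing[0]][stripe] = xor_blocks(others)
        elif len(missing) > 1:
            # Two bad sectors in the same place: single parity is not enough.
            lost.append(stripe)
    return repaired, lost

d0 = [b"\x01\x02", b"\x03\x04", b"\x05\x06"]
d1 = [b"\x10\x20", b"\x30\x40", b"\x50\x60"]
parity = [xor_blocks([a, b]) for a, b in zip(d0, d1)]

# Two failing drives, bad sectors in different stripes: everything comes back.
repaired, lost = rebuild_stripes([[None, d0[1], d0[2]], [d1[0], None, d1[2]], parity])
print(repaired[0] == d0, repaired[1] == d1, lost)  # True True []

# Bad sectors in the same stripe on both drives: that stripe is lost.
_, lost = rebuild_stripes([[d0[0], d0[1], None], [d1[0], d1[1], None], parity])
print(lost)  # [2]
```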
And BTW that problem is not specific to RAID5
either. It could happen with RAID10 as well,
unless you have at least three mirrors.
-- Kasper Dupont