Re: Wondering about raid5
From: P.T. Breuer (ptb_at_oboe.it.uc3m.es)
Date: 04/13/04
- Next message: Jens.Toerring_at_physik.fu-berlin.de: "Re: Questions about file-locking"
- Previous message: Dragan Cvetkovic: "Re: Signal information in /proc/<pid>/status?"
- In reply to: Kasper Dupont: "Re: Wondering about raid5"
- Next in thread: Kasper Dupont: "Re: Wondering about raid5"
- Reply: Kasper Dupont: "Re: Wondering about raid5"
- Reply: Kasper Dupont: "Re: Wondering about raid5"
Date: Tue, 13 Apr 2004 21:33:57 +0200
Kasper Dupont <kasperd@daimi.au.dk> wrote:
> "P.T. Breuer" wrote:
> >
> > Kasper Dupont <kasperd@daimi.au.dk> wrote:
> > > I'm wondering how raid5 handles a power failure in
> > > the middle of a write.
> >
> > It doesn't do anything - it's dead.
> >
> > > To update a data block both
> > > the data block and the corresponding parity block
> > > need to be written.
> >
> > Don't worry about it - the parity block will be rewritten next time the
> > raid is started and the whole thing is checked over.
>
> Uhm, is it going to compute every single parity block
> on the entire raid?
Under normal circumstances, yes.
> That would probably take about 45
> minutes on my system.
It does it in background.
> Does that mean the system will
> use so much time to boot? Or is it happening in the
> background?
The latter.
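In the usual textbook formulation, RAID5 parity is the bytewise XOR of the data blocks in a stripe. The check described here therefore amounts to recomputing that XOR for every stripe and rewriting any parity block that disagrees. The following is a minimal in-memory sketch under an assumed layout: each stripe is a list of equal-length `bytes` blocks with parity stored last. This is not how the Linux md code is organised internally. The sketch is written as a generator so the caller can interleave ordinary I/O between stripes, which is roughly what doing the work "in background" means.

```python
from functools import reduce
from operator import xor
from typing import Iterator, List, Tuple

def xor_blocks(blocks: List[bytes]) -> bytes:
    """Bytewise XOR of equal-length blocks: RAID5-style parity."""
    return bytes(reduce(xor, column) for column in zip(*blocks))

def resync(stripes: List[List[bytes]]) -> Iterator[Tuple[int, bool]]:
    """Recompute the parity of every stripe, one stripe per step.

    Assumed layout: stripe[:-1] are data blocks, stripe[-1] is parity.
    Yields (stripe index, whether parity had to be rewritten), so the
    caller can service ordinary I/O between steps.
    """
    for i, stripe in enumerate(stripes):
        expected = xor_blocks(stripe[:-1])
        rewritten = stripe[-1] != expected
        if rewritten:
            stripe[-1] = expected
        yield i, rewritten

# Stripe 0 is consistent; stripe 1 has stale parity after an interrupted write.
stripes = [
    [b"\x01\x02", b"\x04\x08", b"\x05\x0a"],
    [b"\xff\x00", b"\x0f\x0f", b"\x00\x00"],
]
fixed = [i for i, rewritten in resync(stripes) if rewritten]
```

Only stripe 1 is rewritten here. A full pass still has to read every stripe, which is why it takes as long as Kasper estimates.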
> > If, however, at
> > that time you are a disk short, then the parity will be believed and
> > your data will be recreated from it.
>
> If the lost disk was not one of the two being written
> it would result in incorrect data being recreated.
Yes - I didn't bother with that detail, you get the idea!
> That actually means corruption of a sector where no
> write was going on.
Well, the data will be recreated wrongly for the missing disk, at the
corresponding sector, yes. That will transform itself into an actual
corruption when a replacement disk is brought in.
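The effect can be stated exactly for XOR parity. Take a simplified stripe of three data blocks $D_0, D_1, D_2$ plus parity $P$. Suppose $D_0$ is rewritten to $D_0'$, power fails before $P$ is updated, and $D_2$ is the disk missing at restart:

```latex
\begin{aligned}
P         &= D_0 \oplus D_1 \oplus D_2 && \text{(parity before the interrupted write)}\\
\hat{D}_2 &= P \oplus D_0' \oplus D_1  && \text{(degraded reconstruction)}\\
          &= D_2 \oplus (D_0 \oplus D_0') && \text{(substituting } P\text{)}
\end{aligned}
```

The rebuilt block is wrong in exactly the bits that the interrupted write changed, even though nothing was ever written to $D_2$.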
> > So if data is written before parity, then you win provided you come
> > up with all disks, and you lose if you come up a disk short (hey, you
> > lose the last write, so what ...).
>
> If I just lost the last write I wouldn't worry. What
> I'm worried about is incorrect data being seen in other
> sectors in the same stripe.
It will happen. Silent corruption, silent data creep - those are all
terms known in the data storage business. To fix it you will need to
have extra redundancy :-).
> > If parity is written before data, then you lose if you come up with
> > all disks again (the parity will be overwritten using old data),
> > and you win if you come up a disk short.
>
> The order of the writes cannot make a difference. I don't
> care what is in the last sector written (or the last few
> sectors even). A journaling filesystem could take care of
Journalling does not do any magic either - it cannot guarantee
that something is written after it leaves the journal cache; it
can only send it out.
> that. I worry about the rest of the stripe.
There are raid systems that reread the written data after writing it,
just to be sure, but generally you will also be liable to data
corruption at the disk level. For journalling too - if you tell the
disk to write 45 and it really writes 44, nobody will know until next
time ...
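Kasper's claim that write order cannot protect the rest of the stripe can be checked by brute force in the same XOR model. The sketch below is hypothetical: the names are made up, and small integers stand in for blocks on a stripe with three data disks. It rewrites D0 and stops after whichever of the data or parity writes goes first. It then loses uninvolved disk 2 and checks whether that disk's reconstruction is correct.

```python
from functools import reduce
from operator import xor

def reconstruct(blocks, parity, missing):
    """Rebuild the missing data block from the surviving blocks and parity."""
    survivors = [b for i, b in enumerate(blocks) if i != missing]
    return reduce(xor, survivors, parity)

def bystander_ok(order: str) -> bool:
    """Crash after the first write of `order`, then lose disk 2.

    order is "data-first" or "parity-first"; disk 0 is being written,
    disk 2 is an uninvolved bystander.
    """
    old = [0b1010, 0b0110, 0b0011]          # D0, D1, D2 (illustrative values)
    parity = reduce(xor, old)
    new_d0 = 0b1111
    blocks = list(old)
    if order == "data-first":
        blocks[0] = new_d0                  # data landed, parity did not
    else:
        parity = parity ^ old[0] ^ new_d0   # parity landed, data did not
    return reconstruct(blocks, parity, missing=2) == old[2]

results = {order: bystander_ok(order) for order in ("data-first", "parity-first")}
```

Both orderings give `False`. If neither write or both writes land, the reconstruction is correct. Reordering the writes only changes which half-finished state is on disk, not whether the bystander is exposed.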
> > > Assuming a power failure
> > > happening, it is possible that only one of the two
> > > writes is completed, but the other isn't.
> >
> > Yep - but I wouldn't worry. Much worse things are possible. Disk
> > manufacturers claim that writes sent are always committed if power is
> > lost, because there is enough power in the capacitors to write the
> > pending requests before spindown. But if you believe that, you also
> > believe many other good things about disks ;-).
>
> Even if the disk would have enough power to complete a
> write, you still wouldn't prevent one disk from completing
> a write, where the corresponding write on another disk had
> not started yet.
Possibly. There's much worse - I don't believe write order is preserved
in any sense through the kernel raid layers, so journalling file
systems would corrupt on their own, even working perfectly (I may be
wrong, but I have examined the code and not seen anything that maintains
ordering - remember that requests to the raid device are marshalled and
copied to slave devices before being acked, but this is at an
individual level, and there is nothing to say that the requests cannot
arrive at the slaves out of order).
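A toy model shows why lost ordering would matter to a journalling filesystem. The journal issues a data record and then a commit record, and recovery replays any transaction whose commit record is present. The sketch below uses hypothetical names, models no specific filesystem, and says nothing about what the md code actually does. It enumerates every order in which the two writes could reach the disk and every crash point. It then collects the states in which recovery would trust a commit whose data never landed.

```python
from itertools import permutations

WRITES = ("data", "commit")  # issued in this order by the journal

def unsafe_states(preserve_order: bool):
    """Return on-disk states where the commit is durable but its data is not."""
    orders = [WRITES] if preserve_order else list(permutations(WRITES))
    bad = set()
    for order in orders:
        for crash_after in range(len(order) + 1):
            durable = frozenset(order[:crash_after])
            if "commit" in durable and "data" not in durable:
                bad.add(durable)
    return bad

ordered = unsafe_states(preserve_order=True)     # set()
unordered = unsafe_states(preserve_order=False)  # {frozenset({'commit'})}
```

With ordering preserved there is no unsafe state. Without it, a crash can leave a commit record pointing at data that is not on disk.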
> > > When power is restored there will be an inconsistency.
> >
> > Yep. Or not.
>
> If power was lost at the wrong time, there will be an
> inconsistency. But of course if the raid is marked dirty
> and an unclean shutdown results in a recalculation of all
> parity sectors, the inconsistency would be fixed. So what
> does the system do? Doesn't trust any parity sectors until
> recalculation has completed? That means any read/write
> would have to be done the hard way. Or does it keep track
> of how far it has recalculated and just avoid parities
> above that?
If you break a raid system before parity recalculation is complete, it
is very nicely broken indeed. At that point what to do is largely up to
you, the admin.
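Kasper's second suggestion, tracking how far the recalculation has got, can be sketched as follows. This is a hypothetical model, not a description of any real RAID implementation. Stripes below a resync checkpoint have verified parity, so a small write may use read-modify-write (new parity = old parity XOR old data XOR new data). Stripes at or above the checkpoint may hold stale parity, so the write recomputes parity from the whole stripe. That is "the hard way", but a read-modify-write there would just carry the old inconsistency forward.

```python
from functools import reduce
from operator import xor
from typing import List

class ResyncingArray:
    """Toy RAID5 model: each stripe is [d0, d1, ..., parity] of ints."""

    def __init__(self, stripes: List[List[int]]):
        self.stripes = stripes
        self.checkpoint = 0  # stripes below this index have verified parity

    def resync_step(self) -> bool:
        """Fix parity of the next stripe; return False once resync is done."""
        if self.checkpoint >= len(self.stripes):
            return False
        s = self.stripes[self.checkpoint]
        s[-1] = reduce(xor, s[:-1])
        self.checkpoint += 1
        return True

    def write(self, stripe: int, disk: int, value: int) -> None:
        s = self.stripes[stripe]
        if stripe < self.checkpoint:
            # Parity is trusted: read-modify-write touches only two blocks.
            s[-1] ^= s[disk] ^ value
            s[disk] = value
        else:
            # Parity may be stale: rebuild it from every data block.
            s[disk] = value
            s[-1] = reduce(xor, s[:-1])

# Both stripes start with stale parity (it should be 1 ^ 2 = 3).
array = ResyncingArray([[1, 2, 0], [1, 2, 0]])
array.resync_step()   # stripe 0 fixed, checkpoint is now 1
array.write(0, 0, 4)  # below checkpoint: read-modify-write
array.write(1, 0, 4)  # above checkpoint: full recompute
```

Both stripes end up consistent as `[4, 2, 6]`. A read-modify-write on stripe 1 before the resync reached it would have produced parity 0 ^ 1 ^ 4 = 5 instead of 6.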
> > > Does anyone here know how raid5 (and raid1) prevents
> > > this from causing data loss?
> >
> > Yes - everyone does. It doesn't. RAID isn't magic. It only prevents
> > certain sorts of data loss, not all sorts.
>
> Of course I know that. What I'm talking about is only
> those errors that could potentially happen with a raid
> system, that wouldn't have happened without.
Well, you mean raid creating a confusion, because it has two sources of
data? Yes, you can now corrupt the corroborative data as well as the
data. But that's only 50% more danger than the danger of data corruption
alone (3 disks), and you can now expect to lose a whole disk without data
loss if you don't hit that one unlucky moment. The unlucky moment is
50% more likely, but it was only a 1000:1 chance anyway! And the risk
of losing one of two disks is something like 50% per year!
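Peter's figures are rough estimates, but the arithmetic behind them can be made explicit. Take his numbers at face value: a roughly 1-in-1000 chance of crashing in the critical window, made 50% more likely, and a 50% yearly chance of losing one of two disks. Assuming independent disk failures, the implied per-disk yearly failure probability follows from P(at least one of n fails) = 1 - (1 - p)^n. These are his estimates, not measured rates.

```python
def p_any_failure(p_disk: float, n: int) -> float:
    """Probability that at least one of n independent disks fails."""
    return 1 - (1 - p_disk) ** n

def implied_per_disk(p_any: float, n: int) -> float:
    """Per-disk failure probability that yields p_any across n disks."""
    return 1 - (1 - p_any) ** (1 / n)

unlucky_window = 1 / 1000 * 1.5    # Peter's 1000:1, made "50% more likely"
p_disk = implied_per_disk(0.5, 2)  # about 0.293 per disk per year
```

On these numbers the extra write-hole exposure (about 0.15% per unclean shutdown) is small compared with the yearly disk-loss risk that the redundancy guards against. This holds provided unclean shutdowns are not frequent.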
Peter