Re: Is reiserfs slower to mount / on bootup?
From: Enrique Perez-Terron (enrio_at_online.no)
Date: 11/02/05
- In reply to: Damian Menscher: "Re: Is reiserfs slower to mount / on bootup?"
- Next in thread: Nico Kadel-Garcia: "Re: Is reiserfs slower to mount / on bootup?"
- Reply: Nico Kadel-Garcia: "Re: Is reiserfs slower to mount / on bootup?"
Date: Wed, 02 Nov 2005 02:00:00 +0100
On Tue, 01 Nov 2005 21:02:48 +0100, Damian Menscher <menscher+linux@uiuc.edu> wrote:
> Enrique Perez-Terron <enrio@online.no> wrote:
>>
>> I also feel a bit confused about the layering here, does not raid5 present
>> a correct contents of the volume in presence of a single failing disk?
>>
>> Is raid5 prone to mistakes if the underlying hw/sw fails to discover there
>> are errors in the data coming from a disk? I mean, does it happen with
>> any appreciable frequency that the disk delivers corrupted data but fails
>> to notice and notify upper layers? If so, it seems to me that the level
>> of security afforded by raid technology has been overstated, or the residual
>> vulnerabilities understated.
>
> You're right... there's a dirty little secret the raid vendors
> aren't telling you about:
>
> Raid protects you against disk failures, yes. The problem is, disks
> don't always fail in the expected way.
>
> Consider a raid5 array. Let's say everything is normal. You'll get
> the fastest read speeds by reading straight from the disks, and
> ignoring parity. Since performance is important, that's what we do.
>
> Great, but what happens if you take an axe and smash a disk? Well,
> that disk fails, and the array becomes degraded. You now have to
> read all the disks and calculate parity to pull a block off. So far,
> everything is working as planned.
>
> But what if, instead of smashing the disk, you just put some noise on
> the line? Maybe a disk is just a little bit sick, and it's writing
> corrupted data to the platters. Well, the disk is spinning, and it's
> responding to the controller, so the controller thinks everything is
> fine. And because the controller thinks it's fine, and the array is
> redundant, it speeds things up by reading off the single disks and
> ignoring parity. Except the stuff it reads off that one sick disk is
> all wrong. So the data passed to the OS is wrong. Your filesystem
> gets corrupted, and all that redundancy didn't protect you.
>
> Supposedly you can run a regular "verify" on the array to detect
> these problems before they take out your filesystem. In my
> experience, that doesn't give enough warning to save you.
>
> Disclaimer: the above is an educated guess about what's happening
> based on my own experiences. I don't make raid firmware for a living.
> Hopefully someone here can point out where I've said something wrong
> in the above, or suggest a way of being safer.

Well, it was something along these lines I was thinking. Now there
is a certain amount of redundancy in the single-disk, non-raid case too,
and the disk controller should discover if the data read back from the
disk is corrupted. But there are more failure modes! What if an error
injects noise into the data after the point where the redundant bits
of each sector have been checked and found OK? This would have to be
an error in a component that is not tied to particular sectors, heads,
or cylinders, so it would corrupt your data at random points, perhaps
at long intervals.
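To make the scenario Damian describes concrete, here is a minimal sketch of XOR parity on a toy raid5 stripe. It is an illustration only, not how any real controller or the Linux MD driver is written: the three-data-block layout and the names `read_fast`, `reconstruct`, and `verify` are made up for the example. It shows that parity rebuilds a block the array knows is missing, while a block that is silently wrong goes straight through the fast read path.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte strings together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Toy stripe: three data blocks plus one parity block (hypothetical layout).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

def read_fast(disks, index):
    """Normal-mode read: trust the data disk and never look at parity."""
    return disks[index]

def reconstruct(disks, parity_block, missing):
    """Degraded-mode read: rebuild one known-missing block from the others."""
    survivors = [blk for i, blk in enumerate(disks) if i != missing]
    return xor_blocks(survivors + [parity_block])

def verify(disks, parity_block):
    """Scrub: tells you the stripe is inconsistent, not which block is wrong."""
    return xor_blocks(disks) == parity_block

# Case 1: disk 1 is known to be dead, so parity recovers its contents.
recovered = reconstruct(data, parity, missing=1)

# Case 2: disk 1 is "a little bit sick" and returns garbage without an error.
sick = list(data)
sick[1] = b"BxBB"
silent_read = read_fast(sick, 1)   # the corrupted block reaches the OS
stripe_ok = verify(sick, parity)   # only an explicit verify notices
```

Note that `verify` can only say the stripe does not add up. With single parity and no per-block checksum, it cannot tell which of the four blocks is the bad one.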

Under such circumstances, a file system like reiserfs or reiser4,
which has a tree structure, and where the metadata do not have any
fixed locations on the disk, must be much more vulnerable. In a file
system where the location of metadata is fixed, it is essentially
as if part of the metadata or pointers had been moved off the fs and
into the fs code.

But Nico tells of a situation where the system does report disk
problems but the file system does not handle them, or not properly
anyway. I cannot remember having seen it discussed plainly and
seriously what the possible, and what the reasonable, strategies are
to implement in the kernel. But I won't be surprised if there have
been more private discussions and coordination between the ext2/3
programmers and the MD/block device hackers than between the reiser
guys and the MD device hackers.

Does anybody know anything specific in this area?

In networking, we have TCP that handles error discovery and recovery.
Applications don't replicate the efforts of TCP very much, do they?
Is there a way for the block devices to say to the fs code "here you
have the data you requested, I think it is correct, but I had problems,
so you should switch to a mode with increased redundancy and
journalling and whatever you have"?

I guess the fs asks for a block, and the block device says OK or
ESOMETHING. In the latter case there are no valid data for that block.
For a tree-based fs, if this happens to a node near the root,
a large part of the fs has been lost. Should the fs try to recover
immediately? Or should it panic? Nico's observation seems to
suggest that reiserfs just ignored the error conditions and continued.
Ouch.
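The choices in the paragraph above can be sketched as a toy policy switch. The three policy names are borrowed from the `errors=continue|remount-ro|panic` mount option of ext2/3. Everything else here (`ToyBlockDevice`, `ToyFs`, `FsPanic`) is hypothetical and says nothing about how reiserfs or any kernel code actually behaves.

```python
import errno

class FsPanic(RuntimeError):
    """Stands in for a kernel panic in this toy model."""

class ToyBlockDevice:
    """Hypothetical device: returns data, or raises OSError(EIO) for bad blocks."""
    def __init__(self, blocks, bad=()):
        self.blocks = dict(blocks)
        self.bad = set(bad)

    def read(self, n):
        if n in self.bad:
            raise OSError(errno.EIO, "I/O error on block %d" % n)
        return self.blocks[n]

class ToyFs:
    """Hypothetical fs layer that reacts to read errors according to a policy."""
    def __init__(self, dev, on_error="remount-ro"):
        self.dev = dev
        self.on_error = on_error
        self.read_only = False
        self.errors_seen = 0

    def read_block(self, n):
        try:
            return self.dev.read(n)
        except OSError as exc:
            if exc.errno != errno.EIO:
                raise
            self.errors_seen += 1
            if self.on_error == "panic":
                raise FsPanic("read error on block %d" % n) from exc
            if self.on_error == "remount-ro":
                self.read_only = True   # stop further damage, keep reads going
            return None                 # "continue": note the error and carry on

    def write_block(self, n, payload):
        if self.read_only:
            raise OSError(errno.EROFS, "file system is read-only")
        self.dev.blocks[n] = payload

# Block 0 plays the role of a node near the root and cannot be read.
fs = ToyFs(ToyBlockDevice({0: b"root", 1: b"leaf"}, bad=[0]))
leaf = fs.read_block(1)
root = fs.read_block(0)
```

With the default `remount-ro` policy the failed read flips the toy fs to read-only, so nothing is written on top of a tree whose root could not be read. The `continue` policy is the one that corresponds to carrying on regardless.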

To recover, the fs must scan all disconnected blocks to discover which
seem to contain internal nodes of the tree. This is error-prone, but
can be hardened with proper redundancy, such as the height of the node,
a parent pointer, a generation number, and a magic number in each
internal node. But if the underlying block device keeps returning ETHIS
and ETHAT during this recovery, all bets are off.
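The scan described above can be sketched roughly as follows. The on-disk layout is invented for the example (a 32-byte block with a header of magic, height, generation, and parent block number) and has nothing to do with the real reiserfs format. The names `MAGIC`, `scan_for_nodes`, and `consistent` are likewise hypothetical.

```python
import struct

MAGIC = b"TNOD"                      # hypothetical magic number for tree nodes
HEADER = struct.Struct("<4sHII")     # magic, height, generation, parent block
BLOCK_SIZE = 32                      # toy block size
NO_PARENT = 0xFFFFFFFF               # marks the root

def make_node(height, generation, parent):
    """Build one toy node block: header followed by zero padding."""
    return HEADER.pack(MAGIC, height, generation, parent).ljust(BLOCK_SIZE, b"\0")

def scan_for_nodes(image, current_generation):
    """Return {block_no: (height, parent)} for blocks that look like live nodes."""
    found = {}
    for block_no in range(len(image) // BLOCK_SIZE):
        block = image[block_no * BLOCK_SIZE:(block_no + 1) * BLOCK_SIZE]
        magic, height, generation, parent = HEADER.unpack_from(block)
        if magic != MAGIC:
            continue                 # ordinary data
        if generation != current_generation:
            continue                 # stale node left over from an older tree
        found[block_no] = (height, parent)
    return found

def consistent(found):
    """Keep nodes that are roots, or whose parent was found one level higher."""
    kept = {}
    for block_no, (height, parent) in found.items():
        if parent == NO_PARENT:
            kept[block_no] = (height, parent)
        elif parent in found and found[parent][0] == height + 1:
            kept[block_no] = (height, parent)
    return kept

image = b"".join([
    make_node(2, 7, NO_PARENT),      # block 0: root
    make_node(1, 7, 0),              # block 1: child of the root
    b"x" * BLOCK_SIZE,               # block 2: file data
    make_node(1, 6, 0),              # block 3: stale generation
    make_node(1, 7, 2),              # block 4: looks like a node, bogus parent
])
candidates = scan_for_nodes(image, current_generation=7)
trusted = consistent(candidates)
```

Each extra field removes one class of false positive: the magic number filters plain data, the generation number filters leftovers from an earlier tree, and the height and parent pointer filter blocks that merely look like nodes. None of this helps if the reads themselves keep failing during the scan.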

It must be a difficult decision for the fs coders, when to
give up and when to try hard to salvage whatever is possible.
Assuming that people either have or don't have a backup, there seems
to be little to lose in trying to recover. The reiser guys
perhaps did not realize that trying to salvage the data could destroy
the "backup", because it is not on a tape but in the active raid
system.

But on the other hand, the decision to try fsck in the presence of
disk errors in the raid was probably a questionable one, not
just in hindsight, but in principle. (Decisions made under stress
often are.)

I also have the impression that the Reiser guys failed to persuade
the distributions to issue updated packages with the latest reiser
tools. It was a constant refrain on the mailing list: people
with failed disks, wanting to recover whatever possible, but not
having the latest toolset, and damaging the data further with old
buggy tools. How could people know that in case of fire, you should
not use the fire hose, but go out and get a new one first?

-Enrique