Re: bad blocks on raid5 cause filesystem failure

From: kermit (cku192_at_yahoo.com)
Date: 09/21/05

  • Next message: Alex Bazan: "System freezing"
    Date: Wed, 21 Sep 2005 22:08:15 +0400
    
    

    alazarev wrote:

    > We use a popular consumer RAID enclosure device. It's 16 SATA drives,
    > with a built in RAID controller, hot swap everything, attaches to the
    > host via SCSI. We've been pretty happy with it up until a few weeks
    > ago. It is setup in in RAID 5. Nothing unusal about the setup. Host is
    > RHEL4-AS 64bit, filesystem is ext3.
    >
    > About a month ago, we saw some bad blocks on a drive, 5 of them in a
    > row. We ignored, we've seen it before and it's never been a problem. A
    > few weeks later, we got 4-5 more bad blocks. Did nothing. A few weeks
    > later, disaster, we got about 10 bad blocks in a row, and the last one
    > took out the filesystem. The host unmounted the filesystem
    > automatically.
    >

    Well, you have been warned and did nothing. Nobody to blame ...

    > The host logs showed that the disc containing the filesystem had a
    > filaure:
    >
    > Sep 7 01:29:52 zeus kernel: attempt to access beyond end of device
    > Sep 7 01:29:52 zeus kernel: sdb1: rw=1, want=8072683984,
    > limit=2927171457
    >

    Yes I have seen similar after hardware error; ext3 sometimes reacts very
    funny (I won't claim it is limited to ext3 only :) It was RHEL3 BTW.
    [...]
    > What angered us most was, there was our RAID system, sitting there
    > running, not even recongnizing that it caused a massive host filesystem
    > failure. The RAID did see bad blocks, but it never marked a drive as
    > bad, never sensed any failure whatsoever.
    >
    > Now, my question is, isn't this problem exactly what RAID is supposed
    > to protect against? How could a RAID controller botch this up?
    >
    [...]
    > Before I start talking to other vendors, is this bad block issue a
    > problem for all RAID controllers? Shouldn't any decent RAID controller
    > be able to rebuild from parity whenever it senses a bad drive, or a bad
    > block? As soon as the RAID sensed a bad block, it should have known
    > there was data on it, and then either failed the drive, moved to
    > degraded, and then start reading from parity, so that the host would
    > never know about a drive failure? How could an enterprise level RAID
    > controller fail to do this?
    >

    You did not name manufacturer; not all of them are really enterprise
    level :)

    But about general theory.

    When RAID5 controller senses a bad block, it first should try to "remake"
    it. Sometimes writing over bad block fixes it; sometimes recomputing
    checksum and rewriting it fixes it; real enterprise level controller should
    log it as correctable error, ideally with extra qualifier to inform, that
    block was recovered without relocation.

    If this was not possible, controller should use spare blocks to relocate bad
    one. Usually in this case contents (or parity) is lost and has to be
    rebuilt. The same (but on more massive scale) happens when the whole drive
    goes bad. In this case all blocks on drives are lost and their contents
    should be rebuilt on spare drive if available from remaining drives.

    All is fine unless you have double fault. If during rebuild controller finds
    *another* failed block on the same stripe, data is irrecoverably lost.
    Apparently here is what happened, but it is hard to tell without more
    information.

    Because during normal operation not all blocks in a stripe are necessarily
    touched it is quite possible that another drive has a bad block that you
    have no idea about _until_ controller must do rebuild. Enterprise level
    storage systems are using background verification that continuously runs
    and checks all blocks on all disks. It lessens double fault probability by
    discovering bad blocks early; still because it runs slowly (to lessen
    impact on performance) it takes much time so blocks may go bad after they
    had been verified as good :)

    Another technique used by enterprise level systems recently is double parity
    (I am surprised that one quite known vendor still does not use it;
    apparently it believes that if you need that level of protection you use
    RAID10 and not RAID5 :). The idea is that you use two parity blocks per
    stripe instead of one - that gives you extra protection against double
    fault during rebuild. Most vendors support it as far as I know.

    Recently I have read report about how probable double fault is. The values
    were quite amazing. For a SATA drives array of 5 disks has about 30%
    probability to encounter double fault over time. And you have 16 drives
    you said ...
     
    > Any help or advice is appreicated, please feel free to email me too.
    >

    To summarize:

    - RAID5 is setup to protect against such types of fault
    - RAID5 is able to protect against single fault only; it can't protect
    against double fault in a stripe
    - double faults are (unfortunately) more and more likely now-a-days,
    especially with SATA disks

    So when selecting vendor choose

    - one with good logging. All errors (not fatal ones) must be logged; large
    number of correctable errors still indicate failing drive that should be
    considered for replacement

    - one that provides for background verification. It helps to prevent such
    problems.

    - one that supports double parity. That gives more protection against such
    faults (yes, it is more expensive because you spend two instead of 1 drive
    per RAID array)

    - finally consider using RAID10 instead of RAID5.

    - in any case storage controller must fail host IO *immediately*; what
    storage controller must *never* do is to fake success when it already has
    bad blocks. I suspect it may also be the reason for ext3 confusion.

    =arvi=


  • Next message: Alex Bazan: "System freezing"

    Relevant Pages

    • Re: Dell PERC 2/SC Single Channel LVD SCSI RAID Controller
      ... 2/SC) PCI RAID controller is a high performance intelligent PCI-to-SCSI ... PERC 2/SC is an ideal RAID solution for the internal ... Multiple logical drives/arrays per controller Up to 8 logical drives ...
      (uk.adverts.computer)
    • Re: SATA vs. SCSI, RAID?
      ... Also this would depend on which controller is ... Well MS's intrinsic SW RAID 1 drivers using typical Intel mobo EIDE ... controllers for ATA or Adaptec SCSI non-RAID cards do very well here. ... But do note that I only mentioned "3x drives ...
      (microsoft.public.windows.server.sbs)
    • Re: Do I have the correct SATA controller installed ?
      ... My intention is to use the 2 SATA hard drives as stand alone drives, ... I was just concerned I might have choosen the wrong ICH7 controller ... > Youre going about it wrong.First,if not running RAID then only 1 hd must ... >> I selected the 82801GB AHCI controller and Windows continued to install ...
      (microsoft.public.windowsxp.hardware)
    • Re: Upgrading to Hardware RAID SBS2000
      ... depends on the RAID controller but as a general rule I'd suggest putting the ... Prug'n'pray should detect the RAID and allow you to ... other drives, be migrated non-destructively between RAID levels. ... > for installing a HW RAID card into an existing server? ...
      (microsoft.public.backoffice.smallbiz2000)
    • Re: Expanding raid volume or any better options folks can recommen
      ... RAID 5 should be faster than non-raid ... and faster than RAID 1+0 (same speed as JBOD with fault tolerance). ... Personally I run two RAID 0 arrays, one with two 250 G drives for OS, ... One one drive is used for parity and the whole array is ...
      (microsoft.public.windows.mediacenter)