Re: bad blocks on raid5 cause filesystem failure

From: alazarev (alazarev_at_itg.uiuc.edu)
Date: 09/21/05

    Date: 21 Sep 2005 12:01:53 -0700
    
    

    Thanks for the informative post. I've got a few questions though.

    1) Do you have a link to the report you read that describes the
    probability of a double fault? Sounds like an interesting read for me.

    2) Correct me if I'm wrong, but if two blocks on a drive happen to
    fail at the same time, before rebuild can finish parity on the first,
    then you will have a problem, unless you have double parity? Fine, but
    then what about 3 bad blocks in a row? At some point, the RAID
    controller should, like you say, stop all host IO and report the drive
    failed, and then rebuild the drive from parity. How many bad blocks in
    a row should cause this drive failure, three or more, right? Since we
    saw about 10 bad block failures all with the same time stamp, double
    parity would not have helped us at all. The only thing that would have
    helped us is a RAID controller that would stop IO to the host. Instead,
    our RAID still provided "fake access" for the host and thus the fs
    failure. Sound legit to you? Any idea what this functionality is
    called, so I know to avoid it when shopping around for a new RAID? I
    suppose SCSI provides much better reliability in this respect. Too bad,
    we are already in the SATA hole. Too much data to afford moving it to
    SCSI.

    3) Double parity is also called RAID 6, right? Does RAID 6 provide
    double parity at the block level? Or only at the drive level?
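
    To make question 2 concrete, here is a minimal sketch of textbook
    single-parity (XOR) reconstruction, one stripe at a time. It is only
    an illustration under my own assumptions, not how any particular
    controller or the Linux md driver is implemented; the function names
    xor_blocks and rebuild_stripe are made up for the example. It shows
    that one unreadable block per stripe can be rebuilt from the
    survivors, while two unreadable blocks in the same stripe cannot,
    which is the case double parity is meant to cover.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_stripe(stripe):
    """Repair one stripe protected by a single XOR parity block.

    stripe is a list of blocks (data blocks plus one parity block, one per
    drive); None marks a block that could not be read. Hypothetical helper
    for illustration only.
    """
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        # Single parity gives one equation per stripe, so only one unknown
        # block can be solved for.
        raise ValueError(
            "single parity cannot rebuild %d missing blocks" % len(missing)
        )
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # The XOR of all surviving blocks (data and parity) is the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired


if __name__ == "__main__":
    data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
    stripe = data + [xor_blocks(data)]  # last block is the parity

    damaged = list(stripe)
    damaged[1] = None  # one bad block in this stripe
    print(rebuild_stripe(damaged) == stripe)
```

    In this model, several bad blocks on the same drive each land in a
    different stripe, so each stripe is still missing only one block.
    Whether a real controller rebuilds them, fails the drive, or keeps
    serving the host is a firmware policy question that the sketch does
    not answer.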

    Thanks,

    Alex

