Re: [fwd] md raid1 chokes when one disk is removed

From: Rick Stevens (rstevens_at_vitalstream.com)
Date: 11/11/05

    To: For users of Fedora Core releases <fedora-list@redhat.com>
    Date: Fri, 11 Nov 2005 14:25:35 -0800
    
    

    On Fri, 2005-11-11 at 14:16 -0800, Danny Howard wrote:
    > Hello, Fedorians!
    >
    > I asked this question over on redhat-list. (Was hoping to ask RedHat
    > support.) But that list is something of a ghost town. Perhaps someone
    > on fedora-list can comment on md behaviour when a drive gets pulled?
    > I'd appreciate figuring this out.
    >
    > Thanks in advance.
    >
    > Sincerely,
    > -danny
    >
    > ----- Forwarded message from Danny Howard <dannyman@toldme.com> -----
    >
    > Hello,
    >
    > I am evaluating RHEL, prior to purchase for a new production network.
    > Our boxes are SuperMicro 6018HT with dual SATA drives.
    >
    > I like to give my system a bit of added resiliency with RAID1. These
    > systems have pairs of SATA disks, but no hardware RAID. With FreeBSD, I
    > can set up a gmirror and have a RAID1 system. (I have documentation on
    > that at
    > http://dannyman.toldme.com/2005/01/24/freebsd-howto-gmirror-system/ )
    > So, for Red Hat, I checked the manual, and thought I'd give the Red Hat
    > method a shot.
    >
    > Here's a capture of my Disk Druid:
    > http://www.flickr.com/photos/dannyman/61643870/
    >
    > And, here's some info from the running system:
    > [root@linux ~]# cat /etc/fstab
    > # This file is edited by fstab-sync - see 'man fstab-sync' for details
    > /dev/md2 / ext3 defaults 1 1
    > /dev/md0 /boot ext3 defaults 1 2
    > none /dev/pts devpts gid=5,mode=620 0 0
    > none /dev/shm tmpfs defaults 0 0
    > none /proc proc defaults 0 0
    > none /sys sysfs defaults 0 0
    > /dev/md1 swap swap defaults 0 0
    > /dev/hdc /media/cdrom auto pamconsole,fscontext=system_u:object_r:removable_t,exec,noauto,managed 0 0
    > /dev/fd0 /media/floppy auto pamconsole,fscontext=system_u:object_r:removable_t,exec,noauto,managed 0 0
    > [root@linux ~]# mount
    > /dev/md2 on / type ext3 (rw)
    > none on /proc type proc (rw)
    > none on /sys type sysfs (rw)
    > none on /dev/pts type devpts (rw,gid=5,mode=620)
    > usbfs on /proc/bus/usb type usbfs (rw)
    > /dev/md0 on /boot type ext3 (rw)
    > none on /dev/shm type tmpfs (rw)
    > none on /proc/sys/fs/binfmt_misc type binfmt_misc (rw)
    > sunrpc on /var/lib/nfs/rpc_pipefs type rpc_pipefs (rw)
    >
    > [root@linux ~]# cat /etc/mdadm.conf
    >
    > # mdadm.conf written out by anaconda
    > DEVICE partitions
    > MAILADDR root
    > ARRAY /dev/md2 super-minor=2
    > ARRAY /dev/md0 super-minor=0
    > ARRAY /dev/md1 super-minor=1
    > [root@linux ~]# cat /proc/mdstat
    > Personalities : [raid1]
    > md1 : active raid1 sdb2[1] sda2[0]
    > 2032128 blocks [2/2] [UU]
    >
    > md2 : active raid1 sdb3[1] sda3[0]
    > 76011456 blocks [2/2] [UU]
    >
    > md0 : active raid1 sdb1[1] sda1[0]
    > 104320 blocks [2/2] [UU]
    >
    > unused devices: <none>
    >
    > Sweet! I can "fail" a disk and remove it thus:
    > mdadm --fail /dev/md0 /dev/sdb1
    > mdadm --fail /dev/md1 /dev/sdb2
    > mdadm --fail /dev/md2 /dev/sdb3
    > [ ... physically remove disk, system is fine ... ]
    > [ ... put the disk back in, system is fine ... ]
    > mdadm --remove /dev/md0 /dev/sdb1
    > mdadm --add /dev/md0 /dev/sdb1
    > mdadm --remove /dev/md1 /dev/sdb2
    > mdadm --add /dev/md1 /dev/sdb2
    > mdadm --remove /dev/md2 /dev/sdb3
    > mdadm --add /dev/md2 /dev/sdb3
    > [ ... md2 does a rebuild, but /boot and <swap> are fine -- nice! ... ]
    >
    > Okay, but what if a disk fails on its own?
    >
    > [root@linux ~]# cat /proc/mdstat
    > Personalities : [raid1]
    > md1 : active raid1 sdb2[1] sda2[0]
    > 2032128 blocks [2/2] [UU]
    >
    > md2 : active raid1 sdb3[1] sda3[0]
    > 76011456 blocks [2/2] [UU]
    >
    > md0 : active raid1 sdb1[1] sda1[0]
    > 104320 blocks [2/2] [UU]
    >
    > unused devices: <none>
    > [ ... pull sdb ... ]
    > [root@linux ~]# cat /proc/mdstat
    > ata1: command 0x35 timeout, stat 0xd0 host_stat 0x61
    > ata1: status=0xd0 { Busy }
    > SCSI error : <0 0 1 0> return code = 0x8000002
    > Current sdb: sense key Aborted Command
    > Additional sense: Scsi parity error
    > end_request: I/O error, dev sdb, sector 156296202
    > md: write_disk_sb failed for device sdb3
    > ATA: abnormal status 0xD0 on port 0x1F7
    > md: errors occurred during superblock update, repeating
    > ATA: abnormal status 0xD0 on port 0x1F7
    > ATA: abnormal status 0xD0 on port 0x1F7
    > ata1: command 0x35 timeout, stat 0x50 host_stat 0x61
    > [ ... reinsert sdb ... ]
    > Personalities : [raid1]
    > md1 : active raid1 sdb2[1] sda2[0]
    > 2032128 blocks [2/2] [UU]
    >
    > md2 : active raid1 sdb3[1] sda3[0]
    > 76011456 blocks [2/2] [UU]
    >
    > md0 : active raid1 sdb1[1] sda1[0]
    > 104320 blocks [2/2] [UU]
    >
    > unused devices: <none>
    >
    > I don't like that the system seems to choke when the disk is removed
    > unexpectedly. Is this intended operation? Do I need to massage my SCSI
    > subsystem a bit? What's up? :)

    The RAID should go into degraded mode and continue to run. If you
    pulled the disk out while the system was powered up AND the system isn't
    hot-swap-compatible (and most built-in SATA stuff isn't), then you've
    confused the SCSI bus badly and I'd be amazed if it worked at all after
    that. Your error messages indicate that's the case here.

    If, however, the SATA drives were in a hot-swap-compatible enclosure and
    you see the same problem, then something else is wrong and we'd need to
    look at that a bit more closely.
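
One way to check whether an array really dropped to degraded mode is to read the status field in /proc/mdstat: a healthy two-disk mirror shows [UU], and a missing or failed member shows up as an underscore, e.g. [U_]. The following is only a rough sketch, not a tested monitoring tool. The function name degraded_arrays is made up for illustration, and it assumes the mdstat layout shown in the original post (an "mdN :" line followed by a "blocks" line ending in the status field).

```shell
#!/bin/sh
# Hypothetical helper: print the name of every md array whose status field
# contains "_" (a missing or failed member).
# Reads /proc/mdstat-style text from the file named in $1,
# defaulting to /proc/mdstat itself.
degraded_arrays() {
    awk '
        # An array stanza starts with a line such as "md2 : active raid1 ..."
        /^md[0-9]+ :/ { name = $1 }

        # The status field looks like [UU] or [U_] and is the last field
        # on the "blocks" line that follows the stanza header.
        /\[[U_]+\]/ && name != "" {
            if ($NF ~ /_/) print name
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

# Example use (prints nothing when every array is healthy):
#   degraded_arrays
#   degraded_arrays /tmp/saved-mdstat.txt
```

If this prints nothing after a disk has been pulled, md has not marked the member as failed, which would be consistent with the bus being wedged rather than the RAID layer misbehaving.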

    At least it all works for us. Your mileage may vary.
    ----------------------------------------------------------------------
    - Rick Stevens, Senior Systems Engineer rstevens@vitalstream.com -
    - VitalStream, Inc. http://www.vitalstream.com -
    - -
    - Hard work has a future payoff. Laziness pays off now. -
    ----------------------------------------------------------------------

    
