Re: Problems with software RAID on SATA

From: Stephen Tait (tait_at_digitallaw.co.uk)
Date: 08/19/05

    Date: Fri, 19 Aug 2005 15:34:39 +0100
    To: debian-user@lists.debian.org
    
    

    At 16:37 18/08/2005, you wrote:
    >Quoting Stephen Tait <tait@digitallaw.co.uk>:
    >
    >>I'm just in the process of setting up a Sarge server to be used as a sort
    >>of backup server. The main PATA discs are used to boot the OS off of
    >>software RAID1, with the rest of the disc space used in JBOD for
    >>not-so-important backups. However, I'm having problems getting the new
    >>disc array up and running.
    >>
    >>We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec
    >>1210SA which, according to lspci, uses the Silicon Image SI3112 chipset
    >>to provide two SATA channels. Connected to this are two 320GB drives
    >>which I want to turn into a RAID1 array. When the system booted first, I
    >>used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2
    >>--level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), checked /proc/mdstat to
    >>wait for the array to finish syncing, and then formatted it ext3 and
    >>mounted it. Everything seemed to work fine until I rebooted, whereupon
    >>the mount failed with the report that it wasn't a valid ext[2|3]
    >>superblock; fsck confirmed this and on further inspection it seemed that
    >>it wasn't a RAID device any more either.
    >>
    >>...and booted with that instead after editing GRUB's menu.lst. The exact
    >>same error occurred, and I'm now at a bit of a loss to explain what's
    >>happening. If I try and mount the discs on their own (i.e. mount /dev/sdX
    >>/mnt/somedir) then they work just fine, so the hardware works fine - so
    >>I'm almost certain it's a problem with initting the RAID arrays at boot.
    >>At the moment I'm just rebuilding the array to see what happens when I
    >>don't try and mount it at boot, but only after the OS has finished
    >>booting, but of course that'll only be a temporary workaround. If it's
    >>any help, here are my fstab and mdadm.conf's:
    >>
    >>pika@zaphod2:~$ cat /etc/fstab
    >># /etc/fstab: static file system information.
    >>#
    >># <file system> <mount point> <type> <options> <dump> <pass>
    >>proc /proc proc defaults 0 0
    >>/dev/md1 / ext3 defaults,errors=remount-ro 0 1
    >>/dev/md0 /boot ext2 defaults 0 2
    >>/dev/hdb9 /home ext3 defaults 0 2
    >>/dev/hdb4 /mnt/avj-backup ext3 defaults 0 2
    >>/dev/hda7 /mnt/dcj-backup ext3 defaults 0 2
    >>/dev/hdb8 /tmp ext3 defaults 0 2
    >>/dev/md4 /usr ext3 defaults 0 2
    >>/dev/md3 /var ext3 defaults 0 2
    >>/dev/hdb7 none swap sw 0 0
    >>/dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
    >>#/dev/md2 /mnt/dcj-archive ext3 defaults 0 2
    >>
    >>>===============================================
    >>
    >>pika@zaphod2:~$ cat /etc/mdadm/mdadm.conf
    >>DEVICE partitions
    >>ARRAY /dev/md4 level=raid1 num-devices=2
    >> UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
    >> devices=/dev/hda6,/dev/hdb6
    >>ARRAY /dev/md3 level=raid1 num-devices=2
    >> UUID=1973b0c3:e38869d2:ffef0cde:92048042
    >> devices=/dev/hda5,/dev/hdb5
    >>ARRAY /dev/md2 level=raid1 num-devices=2
    >> UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
    >> devices=/dev/sda1,/dev/sdb1
    >>ARRAY /dev/md1 level=raid1 num-devices=2
    >> UUID=51d55d28:3e653dce:631dd682:8dd52a37
    >> devices=/dev/hda2,/dev/hdb2
    >>ARRAY /dev/md0 level=raid1 num-devices=2
    >> UUID=56e09876:a751356e:b86535d0:95091b5b
    >> devices=/dev/hda1,/dev/hdb1
    >>
    >>As you can see, most of the important directories are mounted in software
    >>RAID1 on the two PATA discs with unimportant stuff on JBOD, although of
    >>course this shouldn't make any difference. All the usual dmesg etc. stuff
    >>doesn't seem to tell me anything I don't already know. If anyone has
    >>experienced this before or has any pointers as to how I can troubleshoot
    >>it, I'd be much obliged!
    >
    >I have had some trouble getting a RAID array to initialize on boot in the past.
    >My fix was to remove its entry from the mdadm.conf file and re-cfdisk
    >the disks with the auto-detect-raid setting. I then created the RAID array
    >and rebooted, and it came up just fine.
    >Other than that, I'm not sure what else could be wrong.
    >Hopefully someone else on the list has some better ideas.
    >
    >Cheers,
    >Mike

    Thanks for the tip, Mike. I have just tried this and a number of other
    configurations, and the RAID array just "dies" (or doesn't initialise) on
    every single reboot, meaning I have to rebuild the array, reformat it, etc.
    every time - obviously not what I want for a backup server without a
    UPS! I simply don't get it; AFAICT all the modules I need to init a SATA
    RAID1 array at boot exist within the initrd, and they all seem to get
    loaded at the right time (since when modprobe does its thing later on in
    the boot process I see lots of "loading sata_sil... module already loaded"
    type messages). I'll post the relevant section of dmesg in case anyone can
    spot anything I've missed. Other than that, I'm going to try building
    another custom kernel with everything relevant compiled in (I already
    tried one, but I must have missed something, as it panicked at boot).
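Editorial sketch, not part of the original message: because the array is re-created after every failed boot, one thing that may be worth ruling out is a stale UUID in mdadm.conf, since `mdadm --create` gives a newly created array a fresh UUID. The post does not confirm that this is the cause. The helper below (`conf_uuid` is a hypothetical name) only extracts the UUID recorded for one array from mdadm.conf-style text on stdin, handling the indented continuation lines used in the file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID that mdadm.conf records for one array.
# Usage: conf_uuid /dev/md2 < /etc/mdadm/mdadm.conf
conf_uuid() {
    awk -v want="$1" '
        # Remember which ARRAY entry we are currently inside.
        $1 == "ARRAY" { cur = $2 }
        # Within the wanted entry (including continuation lines),
        # look for a UUID=... word and print its value.
        cur == want {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^UUID=/) {
                    sub(/^UUID=/, "", $i)
                    print $i
                    exit
                }
        }
    '
}
```

The printed value can then be compared by eye with the UUID line that `mdadm --examine /dev/sda1` reports for the member partition; if the two differ, the ARRAY entry in mdadm.conf no longer describes the array that is actually on disc.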

    Snipped dmesg follows:

    RAMDISK: cramfs filesystem found at block 0
    RAMDISK: Loading 4716 blocks [1 disk] into ram disk... done.
    VFS: Mounted root (cramfs filesystem) readonly.
    Freeing unused kernel memory: 168k freed
    Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
    ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
    hda: WDC WD2500JB-00EVA0, ATA DISK drive
    hdb: WDC WD2000JB-00GVA0, ATA DISK drive
    hdc: Compaq CRD-8484B, ATAPI CD/DVD-ROM drive
    Using anticipatory io scheduler
    ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
    ide1 at 0x170-0x177,0x376 on irq 15
    AMD7441: IDE controller at PCI slot 0000:00:07.1
    AMD7441: chipset revision 4
    AMD7441: not 100% native mode: will probe irqs later
    AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
    AMD7441: port 0x01f0 already claimed by ide0
    AMD7441: port 0x0170 already claimed by ide1
    AMD7441: neither IDE port enabled (BIOS)
    SCSI subsystem initialized
    libata version 1.02 loaded.
    device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm@uk.sistina.com
    sata_sil version 0.54
    ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 17 (level, low) -> IRQ 169
    ata1: SATA max UDMA/100 cmd 0xE0823080 ctl 0xE082308A bmdma 0xE0823000 irq 169
    ata2: SATA max UDMA/100 cmd 0xE08230C0 ctl 0xE08230CA bmdma 0xE0823008 irq 169
    ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
    ata1: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
    ata1: dev 0 configured for UDMA/100
    scsi0 : sata_sil
    ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
    ata2: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
    ata2: dev 0 configured for UDMA/100
    scsi1 : sata_sil
       Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
       Type: Direct-Access ANSI SCSI revision: 05
    SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
    SCSI device sda: drive cache: write back
      /dev/scsi/host0/bus0/target0/lun0: p1
    Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
       Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
       Type: Direct-Access ANSI SCSI revision: 05
    SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
    SCSI device sdb: drive cache: write back
      /dev/scsi/host1/bus0/target0/lun0: p1
    Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
    md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
    md: raid1 personality registered as nr 3
    cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
    pci_hotplug: PCI Hot Plug PCI Core version: 0.5
    shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
    shpchp: shpc_init: cannot reserve MMIO region
    shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
    shpchp: shpc_init: cannot reserve MMIO region
    shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
    pciehp: PCI Express Hot Plug Controller Driver version: 0.4
    vesafb: probe of vesafb0 failed with error -6
    NET: Registered protocol family 1
    hda: max request size: 1024KiB
    hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63
      /dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 >
    hdb: max request size: 1024KiB
    hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63
      /dev/ide/host0/bus0/target1/lun0: p1 p2 p3 < p5 p6 p7 p8 p9 > p4
    md: md1 stopped.
    md: bind<hdb2>
    md: bind<hda2>
    raid1: raid set md1 active with 2 out of 2 mirrors
    kjournald starting. Commit interval 5 seconds
    EXT3-fs: mounted filesystem with ordered data mode.
    Adding 1951856k swap on /dev/hdb7. Priority:-1 extents:1
    EXT3 FS on md1, internal journal
    hdc: ATAPI 48X CD-ROM drive, 128kB Cache
    Uniform CD-ROM driver Revision: 3.20
    ieee1394: Initialized config rom entry `ip1394'
    sbp2: $Rev: 1219 $ Ben Collins <bcollins@debian.org>
    ACPI: PCI interrupt 0000:02:06.0[A] -> GSI 18 (level, low) -> IRQ 185
    3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
    0000:02:06.0: 3Com PCI 3c905C Tornado at 0xa400. Vers LK1.1.19
    Capability LSM initialized
    md: md4 stopped.
    md: bind<hdb6>
    md: bind<hda6>
    raid1: raid set md4 active with 2 out of 2 mirrors
    md: md3 stopped.
    md: bind<hdb5>
    md: bind<hda5>
    raid1: raid set md3 active with 2 out of 2 mirrors
    md: md2 stopped.
    md: md0 stopped.
    md: bind<hdb1>
    md: bind<hda1>
    raid1: raid set md0 active with 2 out of 2 mirrors

    As you can see, the only mention of md2 is the "md: md2 stopped" line,
    whereas of course I'd be expecting a "raid1: raid set md2 active with 2 out
    of 2 mirrors" message. Does anyone more au fait with kernel software RAID
    know why the kernel won't even attempt to start md2?
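Editorial sketch, not part of the original message: the by-eye check described above (an array that logs "stopped" but never "active") can be automated. The function name `find_inactive_md` is hypothetical, and the patterns assume the message wording shown in the dmesg excerpt above; other kernel versions may word these lines differently.

```shell
#!/bin/sh
# Hypothetical helper: read a dmesg-style log on stdin and list md arrays
# that logged "mdN stopped" but never "raid set mdN active".
find_inactive_md() {
    awk '
        /md: md[0-9]+ stopped/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^md[0-9]+$/) stopped[$i] = 1
        }
        /raid set md[0-9]+ active/ {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^md[0-9]+$/) active[$i] = 1
        }
        END {
            for (n in stopped)
                if (!(n in active)) print n
        }
    ' | sort
}
```

Run as `dmesg | find_inactive_md`. Fed the excerpt above, it prints only `md2`, matching the observation that md0, md1, md3 and md4 come up while md2 never binds any member devices.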

    Should I try a newer kernel? Were there problems with SATA and software
    RAID in 2.6.8? So many questions, and an angry boss!

    P.S. I don't know if it's remotely significant, but after setting up
    software RAID on Gentoo I was led to believe that RAID configuration was
    done with the help of /etc/raidtab, which the Sarge installer didn't put
    on my machine, so I assumed it wasn't needed and that everything was done
    via mdadm.conf. I doubt it'd help my current situation, but would it do
    any harm to put one in there? Gentoo, by default, has an empty mdadm.conf,
    so I'm assuming that the two serve a similar function.

    Yours, one very confused Debian user!

    Stephen Tait

