Re: Problems with software RAID on SATA
From: Stephen Tait (tait_at_digitallaw.co.uk)
Date: 08/19/05
Date: Fri, 19 Aug 2005 15:34:39 +0100
To: debian-user@lists.debian.org
At 16:37 18/08/2005, you wrote:
>Quoting Stephen Tait <tait@digitallaw.co.uk>:
>
>>I'm just in the process of setting up a Sarge server to be used as a sort
>>of backup server. The main PATA discs are used to boot the OS off
>>software RAID1, with the rest of the disc space used in JBOD for
>>not-so-important backups. However, I'm having problems getting the new
>>disc array up and running.
>>
>>We've put a SATA controller in the box, a cheap-as-chips PCI Adaptec
>>1210SA which, according to lspci, uses the Silicon Image SI3112 chipset
>>to provide two SATA channels. Connected to this are two 320GB drives
>>which I want to turn into a RAID1 array. When the system booted first, I
>>used mdadm to create the RAID1 array md2 (mdadm --create /dev/md2
>>--level=1 --raid-disks=2 /dev/sda1 /dev/sdb1), checked /proc/mdstat to
>>wait for the array to finish syncing, and then formatted it ext3 and
>>mounted it. Everything seemed to work fine until I rebooted, whereupon
>>the mount failed with the report that it wasn't a valid ext[2|3]
>>superblock; fsck confirmed this and on further inspection it seemed that
>>it wasn't a RAID device any more either.
>>
>>...and booted with that instead after editing GRUB's menu.lst. The exact
>>same error occurred, and I'm now at a bit of a loss to explain what's
>>happening. If I try and mount the discs on their own (i.e. mount /dev/sdX
>>/mnt/somedir) then they work just fine, so the hardware works fine - so
>>I'm almost certain it's a problem with initting the RAID arrays at boot.
>>At the moment I'm just rebuilding the array to see what happens when I
>>don't try and mount it at boot, but only after the OS has finished
>>booting, but of course that'll only be a temporary workaround. If it's
>>any help, here are my fstab and mdadm.conf's:
>>
>>pika@zaphod2:~$ cat /etc/fstab
>># /etc/fstab: static file system information.
>>#
>># <file system> <mount point> <type> <options> <dump> <pass>
>>proc /proc proc defaults 0 0
>>/dev/md1 / ext3 defaults,errors=remount-ro 0 1
>>/dev/md0 /boot ext2 defaults 0 2
>>/dev/hdb9 /home ext3 defaults 0 2
>>/dev/hdb4 /mnt/avj-backup ext3 defaults 0 2
>>/dev/hda7 /mnt/dcj-backup ext3 defaults 0 2
>>/dev/hdb8 /tmp ext3 defaults 0 2
>>/dev/md4 /usr ext3 defaults 0 2
>>/dev/md3 /var ext3 defaults 0 2
>>/dev/hdb7 none swap sw 0 0
>>/dev/hdc /media/cdrom0 iso9660 ro,user,noauto 0 0
>>#/dev/md2 /mnt/dcj-archive ext3 defaults 0 2
>>
>>===============================================
>>
>>pika@zaphod2:~$ cat /etc/mdadm/mdadm.conf
>>DEVICE partitions
>>ARRAY /dev/md4 level=raid1 num-devices=2
>>UUID=b8093124:a6d6f876:a29eecb7:e1b332f3
>> devices=/dev/hda6,/dev/hdb6
>>ARRAY /dev/md3 level=raid1 num-devices=2
>>UUID=1973b0c3:e38869d2:ffef0cde:92048042
>> devices=/dev/hda5,/dev/hdb5
>>ARRAY /dev/md2 level=raid1 num-devices=2
>>UUID=78a3be5a:f0838fe2:4d4ce7ed:3a969954
>> devices=/dev/sda1,/dev/sdb1
>>ARRAY /dev/md1 level=raid1 num-devices=2
>>UUID=51d55d28:3e653dce:631dd682:8dd52a37
>> devices=/dev/hda2,/dev/hdb2
>>ARRAY /dev/md0 level=raid1 num-devices=2
>>UUID=56e09876:a751356e:b86535d0:95091b5b
>> devices=/dev/hda1,/dev/hdb1
>>
>>As you can see, most of the important directories are mounted in software
>>RAID1 on the two PATA discs with unimportant stuff on JBOD, although of
>>course this shouldn't make any difference. All the usual dmesg etc. stuff
>>doesn't seem to tell me anything I don't already know. If anyone has
>>experienced this before or has any pointers as to how I can troubleshoot
>>it, I'd be much obliged!
>
>I have had some trouble getting a RAID array to initialize on boot in the past.
>My fix was to remove its entry from the mdadm.conf file and re-cfdisk
>the disks with the auto-detect-raid setting. I then created the RAID array
>and rebooted, and it came up just fine.
>Other than that, I'm not sure what else could be wrong.
>Hopefully someone else on the list has some better ideas.
>
>Cheers,
>Mike
Thanks for the tip, Mike. I have just tried this and a number of other
configurations, and the RAID array just "dies" (or doesn't initialise) on
every single reboot, meaning I have to rebuild the array, reformat it, etc.
every time - obviously not what I want for a backup server without a
UPS! I simply don't get it; AFAICT all the modules I need to init a SATA
RAID1 array at boot exist within the initrd, and they all seem to get
loaded at the right time (since when modprobe does its thing later on in
the boot process I see lots of "loading sata_sil... module already loaded"
type messages). I'll post the relevant section of dmesg in case anyone can
spot anything I've missed; other than that, I'm going to try building
another custom kernel with everything relevant compiled in (already tried
one, but I must've missed something as it panicked at boot).
Snipped dmesg follows:
RAMDISK: cramfs filesystem found at block 0
RAMDISK: Loading 4716 blocks [1 disk] into ram disk... done.
VFS: Mounted root (cramfs filesystem) readonly.
Freeing unused kernel memory: 168k freed
Uniform Multi-Platform E-IDE driver Revision: 7.00alpha2
ide: Assuming 33MHz system bus speed for PIO modes; override with idebus=xx
hda: WDC WD2500JB-00EVA0, ATA DISK drive
hdb: WDC WD2000JB-00GVA0, ATA DISK drive
hdc: Compaq CRD-8484B, ATAPI CD/DVD-ROM drive
Using anticipatory io scheduler
ide0 at 0x1f0-0x1f7,0x3f6 on irq 14
ide1 at 0x170-0x177,0x376 on irq 15
AMD7441: IDE controller at PCI slot 0000:00:07.1
AMD7441: chipset revision 4
AMD7441: not 100% native mode: will probe irqs later
AMD7441: 0000:00:07.1 (rev 04) UDMA100 controller
AMD7441: port 0x01f0 already claimed by ide0
AMD7441: port 0x0170 already claimed by ide1
AMD7441: neither IDE port enabled (BIOS)
SCSI subsystem initialized
libata version 1.02 loaded.
device-mapper: 4.1.0-ioctl (2003-12-10) initialised: dm@uk.sistina.com
sata_sil version 0.54
ACPI: PCI interrupt 0000:02:05.0[A] -> GSI 17 (level, low) -> IRQ 169
ata1: SATA max UDMA/100 cmd 0xE0823080 ctl 0xE082308A bmdma 0xE0823000 irq 169
ata2: SATA max UDMA/100 cmd 0xE08230C0 ctl 0xE08230CA bmdma 0xE0823008 irq 169
ata1: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata1: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata1: dev 0 configured for UDMA/100
scsi0 : sata_sil
ata2: dev 0 cfg 49:2f00 82:346b 83:7f01 84:4003 85:3469 86:3c01 87:4003 88:203f
ata2: dev 0 ATA, max UDMA/100, 625142448 sectors: lba48
ata2: dev 0 configured for UDMA/100
scsi1 : sata_sil
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sda: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sda: drive cache: write back
/dev/scsi/host0/bus0/target0/lun0: p1
Attached scsi disk sda at scsi0, channel 0, id 0, lun 0
Vendor: ATA Model: WDC WD3200JD-00K Rev: 08.0
Type: Direct-Access ANSI SCSI revision: 05
SCSI device sdb: 625142448 512-byte hdwr sectors (320073 MB)
SCSI device sdb: drive cache: write back
/dev/scsi/host1/bus0/target0/lun0: p1
Attached scsi disk sdb at scsi1, channel 0, id 0, lun 0
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: raid1 personality registered as nr 3
cpci_hotplug: CompactPCI Hot Plug Core version: 0.2
pci_hotplug: PCI Hot Plug PCI Core version: 0.5
shpchp: HPC vendor_id 1022 device_id 700d ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: HPC vendor_id 1022 device_id 7448 ss_vid 0 ss_did 0
shpchp: shpc_init: cannot reserve MMIO region
shpchp: Standard Hot Plug PCI Controller Driver version: 0.4
pciehp: PCI Express Hot Plug Controller Driver version: 0.4
vesafb: probe of vesafb0 failed with error -6
NET: Registered protocol family 1
hda: max request size: 1024KiB
hda: 488397168 sectors (250059 MB) w/8192KiB Cache, CHS=30401/255/63
/dev/ide/host0/bus0/target0/lun0: p1 p2 p3 < p5 p6 p7 >
hdb: max request size: 1024KiB
hdb: 390721968 sectors (200049 MB) w/8192KiB Cache, CHS=24321/255/63
/dev/ide/host0/bus0/target1/lun0: p1 p2 p3 < p5 p6 p7 p8 p9 > p4
md: md1 stopped.
md: bind<hdb2>
md: bind<hda2>
raid1: raid set md1 active with 2 out of 2 mirrors
kjournald starting. Commit interval 5 seconds
EXT3-fs: mounted filesystem with ordered data mode.
Adding 1951856k swap on /dev/hdb7. Priority:-1 extents:1
EXT3 FS on md1, internal journal
hdc: ATAPI 48X CD-ROM drive, 128kB Cache
Uniform CD-ROM driver Revision: 3.20
ieee1394: Initialized config rom entry `ip1394'
sbp2: $Rev: 1219 $ Ben Collins <bcollins@debian.org>
ACPI: PCI interrupt 0000:02:06.0[A] -> GSI 18 (level, low) -> IRQ 185
3c59x: Donald Becker and others. www.scyld.com/network/vortex.html
0000:02:06.0: 3Com PCI 3c905C Tornado at 0xa400. Vers LK1.1.19
Capability LSM initialized
md: md4 stopped.
md: bind<hdb6>
md: bind<hda6>
raid1: raid set md4 active with 2 out of 2 mirrors
md: md3 stopped.
md: bind<hdb5>
md: bind<hda5>
raid1: raid set md3 active with 2 out of 2 mirrors
md: md2 stopped.
md: md0 stopped.
md: bind<hdb1>
md: bind<hda1>
raid1: raid set md0 active with 2 out of 2 mirrors
As you can see, the only mention of md2 is the "md: md2 stopped" line,
whereas of course I'd be expecting a "raid1: raid set md2 active with 2 out
of 2 mirrors" message. Does anyone more au fait with kernel software RAID
know why the kernel won't even attempt to start md2?
Should I try a newer kernel? Were there problems with SATA and software
RAID in 2.6.8? So many questions, and an angry boss!
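The check described above (an array that logs "stopped" but never "raid set ... active") can be automated when the dmesg output is long. The following is a minimal Python sketch, not part of the original message. The function name arrays_never_started is hypothetical, and the two patterns are taken only from the 2.6.8 messages in the excerpt above, so other kernels may word them differently.

```python
import re

# Patterns assumed from the dmesg excerpt above (kernel 2.6.8 wording).
STOPPED = re.compile(r"md: (md\d+) stopped")
ACTIVE = re.compile(r"raid set (md\d+) active")


def arrays_never_started(dmesg_text):
    """Return md devices that log 'stopped' but never 'raid set ... active'."""
    stopped, active = set(), set()
    for line in dmesg_text.splitlines():
        match = STOPPED.search(line)
        if match:
            stopped.add(match.group(1))
        match = ACTIVE.search(line)
        if match:
            active.add(match.group(1))
    # Arrays the kernel touched during assembly but never brought up.
    return sorted(stopped - active)
```

Passing the full text of dmesg to arrays_never_started would, for the log above, be expected to single out md2, since it is the only array with a "stopped" line and no matching "active" line.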
P.S. I don't know if it's anything remotely significant, but after setting
up software RAID on Gentoo I was led to believe that RAID configuration was
done via /etc/raidtab, which the Sarge installer didn't put on
my machine, so I assumed it wasn't needed and everything was done via
mdadm.conf. I doubt it'd help my current situation, but would it do any
harm to put one in there? Gentoo, by default, has an empty mdadm.conf, so
I'm assuming that the two serve a similar function.
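For context, /etc/raidtab is the configuration file of the older raidtools package and mdadm does not read it; mdadm takes its array definitions from mdadm.conf. One way to see which configured arrays failed to come up is to compare the ARRAY entries in mdadm.conf against /proc/mdstat. The following is a minimal Python sketch, not part of the original message. The function names are hypothetical, and the parser handles only a subset of the mdadm.conf syntax: "#" comments, ARRAY lines, and continuation lines that begin with whitespace.

```python
def parse_mdadm_conf(text):
    """Map each ARRAY device name to a dict of its key=value options."""
    logical = []
    for raw in text.splitlines():
        if not raw.strip() or raw.lstrip().startswith("#"):
            continue  # skip blank lines and comments
        if raw[0] in " \t" and logical:
            # Leading whitespace marks a continuation of the previous line.
            logical[-1] += " " + raw.strip()
        else:
            logical.append(raw.strip())
    arrays = {}
    for line in logical:
        words = line.split()
        if words[0].upper() != "ARRAY" or len(words) < 2:
            continue
        arrays[words[1]] = dict(w.split("=", 1) for w in words[2:] if "=" in w)
    return arrays


def active_arrays(mdstat_text):
    """Return device paths (e.g. /dev/md1) that mdstat reports as active."""
    found = set()
    for line in mdstat_text.splitlines():
        parts = line.split()
        if (len(parts) >= 3 and parts[0].startswith("md")
                and parts[1] == ":" and parts[2] == "active"):
            found.add("/dev/" + parts[0])
    return found


def configured_but_inactive(conf_text, mdstat_text):
    """List arrays named in mdadm.conf that are not active in mdstat."""
    return sorted(set(parse_mdadm_conf(conf_text)) - active_arrays(mdstat_text))
```

Feeding it the contents of /etc/mdadm/mdadm.conf and /proc/mdstat gives a quick list of arrays that are configured but not running. Note that a continuation line only counts as one if it starts with a space or tab, so it is worth confirming that the UUID= lines in the real file are indented.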
Yours, one very confused Debian user!
Stephen Tait