Determining where the fault lies when hardware fails?



I'm running Debian Lenny and had a failure on my RAID5 array (see my thread
"hdd: Drive not ready for command"). I've got the machine back up and
operating normally. I suspect it may have been a software glitch, but I'm not
going to let it go that easily.

I've looked through the various files found in /var/log and don't see any
reference to hardware issues.
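
To be concrete about what such a search looks like, here is a minimal sketch.
The helper name scan_hw_errors is made up for illustration, and the log file
names in the usage note are assumptions about a typical Debian syslog setup.
The patterns are just the kernel messages quoted in this post.

```shell
# scan_hw_errors: hypothetical helper that prints every line in the given
# log files matching one of the kernel messages quoted in this post.
# -E  extended regular expressions
# -i  ignore case
# -s  stay quiet about files that do not exist or cannot be read
scan_hw_errors() {
    grep -E -i -s \
        'Disabling IRQ|Drive not ready for command|kicking non-fresh|not enough operational devices' \
        "$@"
}
```

It could be run as: scan_hw_errors /var/log/syslog /var/log/kern.log /var/log/messages
(add the rotated copies such as /var/log/syslog.1 if the failure is older
than the current log file).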

When the machine failed I was left with the following:

Message from syslog@debian at Wed Feb 6 06:22:3 2008 ...
debian kernel: Disabling IRQ #20
hdd: Drive not ready for command
hdd: Drive not ready for command
hdd: Drive not ready for command
hdd: Drive not rea... etc. ad infinitum

At this point I rebooted, and two of my drives came up "non-fresh" and were
kicked from the array:

kicking non-fresh sdd3 from array
unbind <sdd3>
export_rdev(sdd3)
kicking non-fresh sdc3 from array
unbind <sdc3>
export_rdev(sdc3)
...
raid5: device hda3 operational as raid disk0
raid5: device sdb3 operational as raid disk2
raid5: device sda3 operational as raid disk1
raid5: not enough operational devices for md0 (2/5 failed)

... which I managed to work around by re-adding them to the array and
rebooting:

mdadm --add /dev/md0 /dev/sdc3 /dev/sdd3 -R
mdadm -w /dev/md0
reboot

... and all is well. The data is fine and the machine is working normally.
Now I want to know what failed.

Looking at the above, it seems like both sdc and sdd had issues. IIRC, both
of these drives are connected to the Promise SATA controller of my Asus
P4C800E-Dlx mainboard.

As I said before, I don't see any failure messages anywhere among the
various log files at /var/log...

Can someone shed a bit of light on this issue? Where should I expect to see
messages saved if the hardware was misbehaving?

Thanks!

