dying hdd causing MCE and panic (libata)



Hello all!

A SATA drive in one of my servers has taken its final steps towards
the grave. It put the obvious signs on the console first (ATA
transactions failing), but then the machine also threw an MCE (CPU
context corrupt) and the kernel panicked. This server is otherwise
rock stable and normally runs for months between planned restarts.

The machine was disconnected from power completely and restarted
multiple times, but it always crashed during boot with an MCE, a
panic, or both.

Sorry, but I cannot provide exact debug information right now: I
wasn't physically there at the time and I'm still 250 km away from
that server. In fact I guided two people without a clue over the
phone; they read things off the console for me, restarted the
machine, etc.

In the end I told them to open up the server and pull the SATA
cable from that particular drive. All the MCEs and panics went away
at once, and the machine has been running fine since then.

Hardware:

- nForce4-based motherboard (chipset-integrated SATA ports)
- Athlon 64 single-core CPU
- DiamondMax 9 SATA hard drive

Kernel:

2.6.23-gentoo-r3 (no preempt, no smp)

My questions:

- Is it normal for a simple hard disk failure (on a disk that is not
even the system disk) to cause MCEs and kernel panics?

- Is this a problem induced entirely at the hardware level (e.g. the
southbridge going crazy and making the whole hardware platform
unstable), or one that could be fixed or handled properly at the
software (kernel) level?

Thanks!

Best regards,
Sab


