Re: weird mdadm crash
- From: Andrew Sackville-West <andrew@xxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 8 Mar 2007 10:04:29 -0800
On Thu, Mar 08, 2007 at 08:30:57AM -0800, michael wrote:
Hello,
Have an etch box that does nothing but rsync data with another.
About every other day or so, the box will completely freeze.
Everything, screen blank, no keyboard, and the hard drive light
is on solid.
I can hard reboot it and it comes up, and there is nothing in the logs that
suggests anything.
The root system is an mdadm raid 5 array, and every time I reboot
it from a crash, the array is always degraded. It auto rebuilds itself,
and away it goes again. A few days later, it will lock up.
I have no idea where to start looking for problems. I'm pretty sure it's gotta
be hardware, but not sure where to look first.
Any suggestions would be great!
AIUI, the order of most-likely to least-likely failure is:
power-supply, hard-drives, memory, other stuff.
Power-supplies are hard to test without equipment, unless you know
you've got sensors set up properly. But it's still worth a shot -- set
up lm-sensors and look at your voltages. If they're more than +/- 5%
from spec then start with a new power-supply. Hard-drives should
generally leave some kind of logs right before they go down, and with
raid, you shouldn't see a lock-up, unless you're sharing controllers,
maybe. If the drives are SMART-enabled, then check their SMART data too. I think
memory errors are pretty much impossible to diagnose through any
method other than swapping sticks in a systematic way.
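
To make the +/- 5% rule of thumb concrete, here is a rough sketch of the
arithmetic. It assumes the usual ATX nominal rails (+3.3V, +5V, +12V) and
that you copy the readings in by hand from whatever your sensors report;
the function name and the sample numbers are made up for illustration and
are not taken from the box in question.

```python
# Nominal rail voltages for a standard ATX supply (assumption: the box
# uses an ordinary ATX power supply).
NOMINAL_RAILS = {"+3.3V": 3.3, "+5V": 5.0, "+12V": 12.0}


def out_of_spec(readings, tolerance=0.05):
    """Return {rail: fractional deviation} for rails outside the tolerance.

    readings  -- dict mapping rail name to measured voltage
    tolerance -- allowed deviation as a fraction of nominal (0.05 = 5%)
    """
    bad = {}
    for rail, measured in readings.items():
        nominal = NOMINAL_RAILS.get(rail)
        if nominal is None:
            # Unknown rail name: skip it rather than guess a nominal value.
            continue
        deviation = (measured - nominal) / nominal
        if abs(deviation) > tolerance:
            bad[rail] = deviation
    return bad


if __name__ == "__main__":
    # Hypothetical readings, typed in by hand from the sensors output.
    sample = {"+3.3V": 3.31, "+5V": 4.70, "+12V": 12.20}
    for rail, dev in out_of_spec(sample).items():
        print(f"{rail}: {dev:+.1%} from nominal")
```

Sensor chips are often mislabelled or badly scaled, so a rail that looks
wildly off may just be a misconfigured sensor; treat the result as a hint,
not a verdict.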
good luck
A