Repeatable FC3 meltdown after a week... Disk? Heat?
Date: 13 Feb 2005 02:50:37 -0800
Hello all, I've got something I've never seen before. All signs point
to bad hard drives and/or a heat issue, but two of the three drives I've
tried are brand new. All drives are from different manufacturers, bought
at different times from different vendors. One of the machines is a
laptop, the other a 4U rack-mount case with lots of ventilation. The
machines are in separate environments, both fairly hospitable.
Here's the deal - I've installed FC3, done lots of updates, etc. and
have fully working systems... for about a week and then the disks
appear to crash. FWIW, I'm running 2.6.9-1.724_FC3. The systems stay
running, to a certain extent. For instance, you can ssh in, ls and cd
around, and top and kill appear to work, but ps doesn't. Doing anything
more gets you an "Input/output error" and that's it. If you try to
shutdown or reboot (using those commands), you get "Bus error" and
nothing more. When you physically reset the box, it fails to boot,
saying it can't find a bootable hard disk. If you leave it off and
flip it back on hours later (sometimes a few days later), it'll
actually boot and you wind up fsck'ing, but it will usually return to
One of the disks, a three-year-old one, did actually crash, I think. I
replaced it, and the new one makes a "coughing" or scratching sound
after running for about a week. It's not a normal sound; I think I'll
return that disk. The odd sound happens for several minutes at about 20
second intervals, then stops, sometimes for hours, and then returns
intermittently. The coughing sound started last night after running
about a week. This morning I noticed the "Input/output error" on the
I thought it was an overheating issue, so I've been watching smartctl
output and checking its temperature readings. It's been as high as 75C,
usually when it's been idle a while, but as soon as I disturb it, it
drops back to about 45-50C. When I noticed the error this morning, it
was at 45C, so I'm not so sure about the heat problem. The only thing
that points me to a heat issue is the "turn it off for a few
A friend's machine, the 4u unit, did the same thing last night. A
physical reset and fsck took care of it.
Any idea what could cause the "Input/output error"? Why can't I reboot
the machine from the command prompt? I can't even run /bin/sync before
flipping the power, which contributes to my disk corruption...
I've never seen a Linux platform so unstable, and I've been using Linux
for 11-12 years! I'm used to installing Linux and forgetting about it,
not having to reboot it for months. These installations seem to be
Have I just had an incredible case of coincidental drive failures? What
are the odds of three drives from different manufacturers failing at once
with the exact same symptoms (i.e. a one-week-old FC3 install)? Astronomical!
Or is there a problem with Fedora and/or the ext3 filesystem?
Thanks for any insights,