Problems with hard disk (longish)
From: Geoff Winkless (geoff-at-farmline-dot-com_at_[127.0.0.1)
Date: Mon, 16 Feb 2004 10:29:25 -0000
I have as my main machine a dual P-III 450 (BX board) with .5GB RAM, running
Slackware 9.1 and a self-compiled 2.4.24 kernel. I've been using Slackware
as a server system for about 7 years and decided to start using it as my
desktop full-time (or as near as possible).
I recently bought a 123GB Hitachi/IBM Deskstar (8MB cache version) and have
partitioned it with a chunk for the (dual-boot) Win2k (still need Feurio,
nothing else is quite good enough), 1GB of swap, and the rest as FAT32 so I
can write reliably from both systems.
I also have (as the root drive) a 20GB Seagate (ST320423A, if that matters)
with the Linux partitions.
Originally I had both disks on a Promise UDMA66 controller which seemed fine
except it would regularly produce CRC errors in the syslog: I figured this
was A Bad Thing and looked into it - at first I thought it was cables so I
took out the oh-so-nice round cables and replaced them with traditional flat
(80-wire, of course) ones. This made zero difference. I shortened the cables
in the hope that this would help further but nothing helped.
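For anyone wanting to compare CRC error counts per drive before and after a cable swap, here is a minimal sketch. It assumes the errors show up in the shape the 2.4 IDE driver usually logs them, e.g. "hde: dma_intr: error=0x84 { DriveStatusError BadCRC }"; the actual log lines from this machine are not shown above, so the pattern and the function name count_crc_errors are illustrative only.

```python
import re
from collections import Counter

# Assumed message shape (not taken from the poster's logs), e.g.
#   "kernel: hde: dma_intr: error=0x84 { DriveStatusError BadCRC }"
# Adjust the pattern if your kernel words it differently.
CRC_LINE = re.compile(r"\b(hd[a-z]): .*\bBadCRC\b")


def count_crc_errors(lines):
    """Return a Counter mapping drive name (e.g. 'hde') to its number of BadCRC lines."""
    counts = Counter()
    for line in lines:
        match = CRC_LINE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Feeding it an open log file (for example the syslog, wherever your distribution keeps it) gives one count per drive, which makes it easier to see whether the errors follow one drive or the controller.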
Things started getting much worse: the machine would hang completely (mouse
not moving in X, ctrl-alt-del not having any effect, caps-lock and
scroll-lock lights locked on (?)).
After one such hang, the usual fsck on reboot (my own stupidity - I
hadn't bothered to upgrade to ext3 as the machine isn't mission-critical or
anything and I'd always been fine with ext2) locked up the machine
again (repeatedly, and - it seemed - always at the same point).
I interrupted the fsck and tried dd if=/dev/hdg3 of=/dev/null and the same
thing happened.
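A plain dd gives no clue where on the partition the read stalls. As a rough sketch of the same sequential read test, the function below reports the current offset before each chunk, so the last number printed survives a hang. The name read_scan and its parameters are made up for illustration, not part of any tool.

```python
def read_scan(path, chunk_size=1024 * 1024, report=None):
    """Read `path` from start to end, discarding the data (like dd to /dev/null).

    `report(offset)` is called before each read, so the last reported
    offset is where the read was attempted if the machine locks up.
    Returns (bytes_read, error), where error is None or the OSError raised.
    """
    offset = 0
    # buffering=0 gives unbuffered reads, so each call maps to one read request.
    with open(path, "rb", buffering=0) as dev:
        while True:
            if report is not None:
                report(offset)
            try:
                chunk = dev.read(chunk_size)
            except OSError as exc:
                return offset, exc
            if not chunk:  # end of device or file
                return offset, None
            offset += len(chunk)
```

Run as root against the partition, e.g. read_scan("/dev/hdg3", report=print). If it always stops at the same offset, that points at a spot on the disk; if the offset wanders, the transfer path looks more suspect.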
In the end I moved the drives off the Promise controller and back onto the
mainboard and reinstalled from scratch, and the problem went away, so I
whipped out the controller altogether and replaced it with an (admittedly
fairly cheap) ITEI RAID card (8212).
The same problem still occurs.
If I leave the Deskstar on the mainboard and put the Seagate on the ITEI
card it all works perfectly except - obviously - that the Deskstar's running
at a sub-optimal speed.
I've only had one slight anomaly since I did this: I deleted a large (300MB)
file from the FAT32 partition on the Deskstar (mounted on the onboard,
UDMA33 controller) and the machine had a paddy for a second...
root@gwinkless:/data/mp3/tmp# rm ../1-59840.zip
lc: .: not found
/bin/ls: .: Stale NFS file handle
(lc is a program on the /usr partition, which is on the Seagate drive)
However it then recovered - nothing in the syslog or messages or on the
I think it's pretty unlikely that both the Promise and the ITEI controllers
are faulty in a way that affects -just- the IBM drive and not the Seagate,
and it's a bit of a coincidence that this all started going wrong when I got
the new Deskstar. My guess is now that there's a problem with the drive
itself, perhaps that it's incapable of the higher UDMA rates (it behaves on the
mainboard controller, which is only UDMA33). The fact that the machine would crash
so fatally is possibly (?) down to the swap partition being on that drive,
although I'm no kernel hacker.
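One way to test the "can't do higher UDMA rates" guess would be to force the drive down to UDMA mode 2 (33MB/s) while it sits on the add-in card and see whether the errors stop. The sketch below only builds the hdparm command line. It relies on hdparm's -X option encoding UltraDMA mode n as 64 + n (so mode 2 is -X66); the helper name and the device path in the usage note are placeholders.

```python
# UDMA mode number -> nominal burst rate in MB/s.
UDMA_RATE_MBS = {0: 16.7, 1: 25.0, 2: 33.3, 3: 44.4, 4: 66.7, 5: 100.0, 6: 133.0}


def hdparm_udma_args(device, mode):
    """Build an hdparm argument list that enables DMA and selects UDMA `mode`.

    hdparm's -X takes 64 + n for UltraDMA mode n, e.g. -X66 for mode 2.
    """
    if mode not in UDMA_RATE_MBS:
        raise ValueError(f"unknown UDMA mode: {mode}")
    return ["hdparm", "-d1", f"-X{64 + mode}", device]
```

For example, hdparm_udma_args("/dev/hdg", 2) gives the arguments for "hdparm -d1 -X66 /dev/hdg"; substitute whichever device node the Deskstar really is. Changing transfer modes on a mounted disk is risky, so this is best tried with the filesystems unmounted or read-only.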
IBM's DFT says that there's nothing wrong with the drive, nothing in the
SMART log or anything. On the other hand it (DFT) won't run the tests with
the drive mounted on the ITEI card. I might try it on the Promise, but if
that still says all's fine I expect I'll have serious trouble getting the
So the question is: is there anything obvious I've missed? And if not, is
this (drive not running on higher UDMA rates) a problem people have come
across with these Deskstars? I had misgivings before buying but had read
enough people state that the problems of 18 months ago were history.
Oh, I'm on Ext3 now, before anyone has a go :)