hd troubles?

From: Mike Ballard (dont_w_at_nt_spam.org)
Date: 12/22/04

Date: Wed, 22 Dec 2004 15:53:07 GMT


I'm running a Gentoo install that is more than a year old: kernel 2.6.7,
udev-030, a three-year-old ASUS A7M266 with its on-board IDE controller,
768M RAM, 1G swap, "hda: WDC WD1600JB-00EVA0, ATA DISK drive", "hdb: WDC
WD1200JB-00EVA0, ATA DISK drive" and "root=/dev/hda1 ide0=ata66 ide0=dma
ide1=dma" as part of my grub.conf boot entry. Swap and /var/log are on hda,
and that drive is about six months old. There are 20 partitions on hda, 18
of them mounted and in use. Every partition is ext3 and I'm using jfsutils
1.1.3. I've been using this udev setup and compiled kernel for at least
six months.

I often leave my machine running for days on end. For the first time that
I'm aware of, this message was in my terminal this morning:

    Message from syslogd@localhost at Wed Dec 22 04:19:39 2004 ...
    localhost kernel: journal commit I/O error

And for the second time in about a week I have these messages in
/var/log/messages (different sectors this time):

    Dec 22 03:35:49 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
    Dec 22 03:35:49 localhost kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2184668, high=0, low=2184668, sector=2184663
    Dec 22 03:35:49 localhost kernel: end_request: I/O error, dev hda, sector 2184663

It repeats about ten times, every couple of seconds.
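
Since the same errors repeat many times, it may help to reduce the log to
the distinct sectors involved. A minimal sketch in Python, assuming the log
lines keep exactly the format shown above; the function name bad_sectors and
the regular expression are illustrative, not part of any existing tool:

```python
import re

# Sample lines copied from the log excerpt above.
LOG = """\
Dec 22 03:35:49 localhost kernel: hda: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Dec 22 03:35:49 localhost kernel: hda: dma_intr: error=0x40 { UncorrectableError }, LBAsect=2184668, high=0, low=2184668, sector=2184663
Dec 22 03:35:49 localhost kernel: end_request: I/O error, dev hda, sector 2184663
"""

# Matches only the "dma_intr: error=" lines and captures the device name,
# the LBAsect= value and the sector= value.
PATTERN = re.compile(r"(\w+): dma_intr: error=.*?LBAsect=(\d+),.*?\bsector=(\d+)")


def bad_sectors(text):
    """Return the distinct (device, LBAsect, sector) triples, sorted."""
    found = set()
    for line in text.splitlines():
        match = PATTERN.search(line)
        if match:
            found.add((match.group(1), int(match.group(2)), int(match.group(3))))
    return sorted(found)


for dev, lbasect, sector in bad_sectors(LOG):
    # Print both values and their difference so they can be compared.
    print(dev, lbasect, sector, lbasect - sector)
```

In practice the text would come from /var/log/messages rather than the
embedded sample. Collapsing duplicates makes it easier to see whether new
sectors keep appearing between incidents.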

Also for the first time, when I started Evolution this morning (a mail
front end, using /var/spool/mail), it showed a dialog stating 'read
error, could not open /var/spool/mail/<user>; read-only file system'. I
have it set to retrieve from the spool and empty it afterward. As <user> I
can read the file fine; Evolution just doesn't seem to be able to
retrieve/clear it.

What I'm wondering is:

- Is this a kernel or latent configuration problem?

- If it's a hard drive problem, does jfs remove the problem sectors from
  use?

- If not, is there some way I can tell the kernel not to use certain
  sectors without having to stop using the entire partition they're in?

- "LBAsect=" isn't always the same as "sector=" - how do I deal with that?

- Do the sector numbers in fdisk's output line up with those reported in
  the error messages? If so, these sectors are on / (hda1, only 26% full)
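
The last question comes down to a range lookup. A rough sketch in Python,
assuming that fdisk is listing start and end positions in 512-byte sectors
and that the kernel's "sector" value counts from the start of the whole
disk. The partition boundaries below are HYPOTHETICAL placeholders, not the
real layout of this drive, and the 4096-byte filesystem block size is also
an assumption:

```python
SECTOR_SIZE = 512  # bytes per sector, assumed


def find_partition(sector, table):
    """Return (name, offset within partition) for a whole-disk sector, or None."""
    for name, start, end in table:
        if start <= sector <= end:
            return name, sector - start
    return None


def to_fs_block(offset_sectors, block_size=4096):
    """Convert a sector offset inside a partition to a filesystem block number."""
    return offset_sectors * SECTOR_SIZE // block_size


# HYPOTHETICAL (name, first sector, last sector) entries; substitute the
# values that fdisk reports for the actual disk.
EXAMPLE_TABLE = [
    ("hda1", 63, 4194303),
    ("hda2", 4194304, 8388607),
]

hit = find_partition(2184663, EXAMPLE_TABLE)  # sector from the log excerpt
if hit is not None:
    name, offset = hit
    print(name, offset, to_fs_block(offset))
```

With the real boundaries filled in, the same lookup shows which mounted
filesystem a reported sector belongs to, and the block number is the unit
that filesystem-level tools generally work in rather than raw sectors.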


net [one dot] verizon [cymbal] ballard [no spaces] mike [reverse the whole thing]
  Visit http://www.anysoldier.us/index.cfm