Re: Detecting I/O error and Halting System
- From: Gene Heskett <gene.heskett@xxxxxxxxxxx>
- Date: Tue, 28 Mar 2006 12:55:42 -0500
On Tuesday 28 March 2006 10:07, zine el abidine Hamid wrote:
First of all, thank you for your analysis.
I don't think it's an HDD problem or a cable
problem, because the servers are new. We have tried
different HDDs (Seagate, Maxtor), but that has not
helped.
It could be a temperature problem, but we have run many
tests under harsh conditions (high temperature)
successfully.
We suspect that the problem comes from the VIA chipset
VT82c686 (that is also the opinion of *** Johnson
(linux-os), who advised me to try UDMA33 instead of
UDMA66).
How can I determine the problem?
I want to add that the HDD seems to be disconnected
(the BIOS can't find any drive to boot from) after a
simple reset. We must power the servers off to get
them working again.
However, it takes a long time (4 months or more)
before the HDD fails. I want to work around this by
writing a module that will monitor the HDD. I know
how to write a module (I used the lkmpg guide,
http://www.tldp.org/LDP/lkmpg/), but how can I
shut down Linux from inside a module?
best regards.
Zine.
I take it that you are aware of a drive monitoring utility called
smartd? By querying the drive after a new powerup, you may be able to
extract useful information about its health.
--- Alan Cox <alan@xxxxxxxxxxxxxxxxxxx> a écrit :
On Mon, 2006-03-27 at 16:55 +0200, zine el abidine Hamid wrote:
hda: status timeout: status=0xd0 { Busy }
adapter
the disk reports a busy status for DMA
If I'm reading the translation right then your hard
disk decided
it was busy and then never came back
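
For anyone following the status values in these messages, here is a minimal sketch of how the status byte might be decoded, assuming the standard ATA status register bit layout. The `decode_status` helper and the `STATUS_BITS` table are hypothetical names, not kernel code; the labels mirror the ones the IDE driver prints, as far as I recall them. When the Busy bit (0x80) is set the remaining bits are not meaningful, which would explain why 0xd0 is shown only as { Busy }.

```python
# Assumed ATA status register layout, highest bit first.
# Labels follow the wording used in the kernel messages (an assumption).
STATUS_BITS = [
    (0x80, "Busy"),
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_status(status):
    """Return the names of the flags set in an IDE status byte.

    While Busy is set the other bits are not valid, so only "Busy"
    is reported in that case.
    """
    if status & 0x80:
        return ["Busy"]
    return [name for mask, name in STATUS_BITS if status & mask]


if __name__ == "__main__":
    for value in (0xD0, 0x80, 0x50):
        print(hex(value), decode_status(value))
```

Under these assumptions, both 0xd0 and 0x80 decode to Busy alone, while a healthy idle drive would typically show 0x50 (DriveReady and SeekComplete).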
Feb 12 04:46:23 porte_de_clignancourt_nds_b kernel: ide0: reset: success
So the IDE layer tried to reset it
Feb 12 10:22:38 porte_de_clignancourt_nds_b kernel: hda: timeout waiting for DMA
Which didn't help
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: ide0: reset: success
Still trying
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: hda: irq timeout: status=0xd0 { Busy }
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: hda: DMA disabled
We gave up on DMA to see if PIO would help
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: ide0: reset timed-out, status=0x80
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: hda: status timeout: status=0x80 { Busy }
the reset failed again
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: hda: drive not ready for command
Feb 12 10:24:47 porte_de_clignancourt_nds_b kernel: ide0: reset: success
And reset..
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: hda: status timeout: status=0x80 { Busy }
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: hda: drive not ready for command
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: ide0: reset timed-out, status=0x80
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: end_request: I/O error, dev 03:02 (hda), sector 102263
Feb 12 13:45:38 porte_de_clignancourt_nds_b syslogd: /var/log/maillog: Input/output error
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: end_request: I/O error, dev 03:02 (hda), sector 110720
Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: end_request: I/O error, dev 03:02 (hda), sector 110728
Eventually we give up.
First thing to check would be the disk and the
temperature, then the
cabling. In particular make sure the *long* part of
the cable is between
the drive and the controller.
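
As a rough user-space alternative to the module idea, the I/O errors in the log above could be detected by scanning syslog lines. This is only a minimal sketch, assuming each log entry sits on a single line, that the `dev` field is hexadecimal major:minor, and that each IDE drive owns 64 minor numbers (so the low six bits give the partition). The names `parse_io_error` and `should_halt` and the threshold of 3 are hypothetical choices for illustration.

```python
import re

# Matches kernel lines such as:
#   end_request: I/O error, dev 03:02 (hda), sector 102263
IO_ERROR_RE = re.compile(
    r"end_request: I/O error, "
    r"dev ([0-9a-fA-F]{2}):([0-9a-fA-F]{2}) \((\w+)\), sector (\d+)"
)


def parse_io_error(line):
    """Return a dict describing an I/O error line, or None if no match."""
    match = IO_ERROR_RE.search(line)
    if match is None:
        return None
    minor = int(match.group(2), 16)
    return {
        "major": int(match.group(1), 16),
        "minor": minor,
        "disk": match.group(3),
        # Assumption: 64 minors per IDE drive, low bits = partition number.
        "partition": minor & 0x3F,
        "sector": int(match.group(4)),
    }


def should_halt(lines, threshold=3):
    """Return True once at least `threshold` lines report an I/O error."""
    errors = sum(1 for line in lines if parse_io_error(line) is not None)
    return errors >= threshold


if __name__ == "__main__":
    sample = (
        "Feb 12 13:45:38 porte_de_clignancourt_nds_b kernel: "
        "end_request: I/O error, dev 03:02 (hda), sector 102263"
    )
    print(parse_io_error(sample))
```

A watcher built on this could then invoke the normal shutdown command, though once the root disk has stopped responding that binary may no longer be readable, so this is only a starting point.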
--
Cheers, Gene
Yahoo.com and AOL/TW attorneys please note, additions to the above
message by Gene Heskett are:
Copyright 2006 by Maurice Eugene Heskett, all rights reserved.