KT400 chipset loses disk DMA under AGP and network load

From: Orange Barrel (lucy_at_in-the-sky-with-diamonds.net)
Date: 11/11/04


Date: Thu, 11 Nov 2004 00:55:48 -0600

This is kind of a long story, so please bear with me. The hardware in
question is a Gigabyte 7VAXP motherboard (KT400 chipset) running an Athlon
XP 2700+ with 1.5 GB of RAM. My video card is a Sapphire Radeon 9100 AGP
(4X - 128 MB). The OS is Slackware 10 (vanilla). I'm using Xorg 6.7.0 with
ATI's binary drivers (3.11.1, 3.14.1, and, most recently, 3.14.6).

The problem is this: when running a fairly intensive OpenGL app (say, um,
bzflag) under a significant network load, my hard drive loses DMA and the
channel it's on resets. I cannot tie these events to any particular system
activity (except, of course, a disk access). The syslog (for 2.6.9) shows
this:

Nov 10 23:11:38 gigabyte kernel: hda: dma_timer_expiry: dma status == 0x21
Nov 10 23:11:48 gigabyte kernel: hda: DMA timeout error
Nov 10 23:11:48 gigabyte kernel: hda: dma timeout error: status=0x58 { DriveReady SeekComplete DataRequest }
Nov 10 23:11:48 gigabyte kernel:
Nov 10 23:11:48 gigabyte kernel: ide: failed opcode was: unknown
Nov 10 23:11:48 gigabyte kernel: hda: status error: status=0x50 { DriveReady SeekComplete }
Nov 10 23:11:48 gigabyte kernel:
Nov 10 23:11:48 gigabyte kernel: ide: failed opcode was: unknown
Nov 10 23:11:48 gigabyte kernel: hda: no DRQ after issuing WRITE
Nov 10 23:11:48 gigabyte kernel: hda: status timeout: status=0xd0 { Busy }
Nov 10 23:11:48 gigabyte kernel:
Nov 10 23:11:48 gigabyte kernel: ide: failed opcode was: unknown
Nov 10 23:11:48 gigabyte kernel: hda: no DRQ after issuing WRITE
Nov 10 23:13:20 gigabyte kernel: ide0: reset: success
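
For anyone reading the hex values in the log above, here is a minimal
Python sketch that decodes them. The ATA status labels match the ones the
kernel prints in the excerpt (plus the standard remaining bits). The
bus-master labels such as "Drive0DmaCapable" are names I made up for
illustration, based on the usual bus-master IDE status register layout,
so treat those as an assumption rather than kernel output.

```python
# ATA status register bits, labelled the way the kernel log prints them.
ATA_BUSY = 0x80
ATA_STATUS_BITS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]

# Bus-master IDE status register bits (the "dma status" value).
# These label strings are hypothetical names chosen for this sketch.
BM_STATUS_BITS = [
    (0x80, "SimplexOnly"),
    (0x40, "Drive1DmaCapable"),
    (0x20, "Drive0DmaCapable"),
    (0x04, "Interrupt"),
    (0x02, "Error"),
    (0x01, "Active"),
]


def decode_ata_status(status):
    """Return the flag names set in an ATA status byte."""
    # While Busy is set the other bits are not meaningful, which is why
    # the log shows only { Busy } for 0xd0.
    if status & ATA_BUSY:
        return ["Busy"]
    return [name for bit, name in ATA_STATUS_BITS if status & bit]


def decode_bm_status(status):
    """Return the flag names set in a bus-master IDE status byte."""
    return [name for bit, name in BM_STATUS_BITS if status & bit]


if __name__ == "__main__":
    for value in (0x58, 0x50, 0xD0):
        print(f"status=0x{value:02x}", decode_ata_status(value))
    print("dma status == 0x21", decode_bm_status(0x21))
```

Read this way, 0x21 would mean the bus-master engine still considered the
transfer active, with neither the interrupt nor the error bit set, when
the timer expired.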

When this is happening the drive activity light stays on until the drive
resets. The drive does not respond until the activity light goes off.
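
To put a number on how long the drive stays unresponsive, here is a rough
Python sketch that measures the gap between the first dma_timer_expiry
line and the following "reset: success" line. The function names are
mine, and the year is an assumption, since syslog timestamps do not carry
one.

```python
from datetime import datetime

# Three lines taken from the syslog excerpt in this post.
LOG = """\
Nov 10 23:11:38 gigabyte kernel: hda: dma_timer_expiry: dma status == 0x21
Nov 10 23:11:48 gigabyte kernel: hda: DMA timeout error
Nov 10 23:13:20 gigabyte kernel: ide0: reset: success
"""


def parse_time(line, year=2004):
    """Parse the leading 'Mon DD HH:MM:SS' syslog stamp (year is assumed)."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


def stall_seconds(log_text):
    """Seconds from the first DMA timer expiry to the next successful reset.

    Returns None if either event is missing from the text.
    """
    start = None
    for line in log_text.splitlines():
        if start is None and "dma_timer_expiry" in line:
            start = parse_time(line)
        elif start is not None and "reset: success" in line:
            return (parse_time(line) - start).total_seconds()
    return None


if __name__ == "__main__":
    print(stall_seconds(LOG))
```

Running the same function over a full /var/log/syslog would only report
the first incident; looping over all of them is left out to keep the
sketch short.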

Attempting to re-enable DMA on the drive with hdparm doesn't work. The
first time the drive is used after DMA is turned back on, it resets with
the same log messages. The only way to restore DMA on the drive is to
reboot the computer.
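
For reference, this is roughly how the DMA flag can be checked and
switched back on from a script. It is only a sketch: it assumes hdparm is
installed, that the drive is /dev/hda, that it runs as root, and that
"hdparm -d" reports the flag on a line of the form "using_dma = 1 (on)".
The helper names are made up for this example.

```python
import re
import subprocess


def parse_using_dma(hdparm_output):
    """Extract the using_dma flag from `hdparm -d` output.

    Returns True or False, or None if the line is not present.
    """
    match = re.search(r"using_dma\s*=\s*(\d)", hdparm_output)
    if match is None:
        return None
    return match.group(1) == "1"


def dma_enabled(device="/dev/hda"):
    """Query the current DMA flag for the device (needs root)."""
    result = subprocess.run(
        ["hdparm", "-d", device], capture_output=True, text=True, check=True
    )
    return parse_using_dma(result.stdout)


def enable_dma(device="/dev/hda"):
    """Ask the driver to turn DMA back on, then report the resulting flag."""
    result = subprocess.run(
        ["hdparm", "-d1", device], capture_output=True, text=True, check=True
    )
    return parse_using_dma(result.stdout)


if __name__ == "__main__":
    if not dma_enabled():
        print("DMA is off, trying to re-enable it")
        print("using_dma now:", enable_dma())
```

On this machine the flag does come back on, but the next disk access
triggers the same timeout and reset, so a check like this only confirms
the symptom.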

Apart from the timeout and reset messages none of the logs show any other
problems.

Thinking the existing Western Digital drive was bad, I bought a new
Seagate. Same problem. Thinking the onboard IDE controller was bad, I
moved the drive over to the other onboard controller (PDC20276). Same
problem. I was running 2.4.27, so I dropped down to 2.4.26. Same problem.
I then moved to 2.6.9 because I heard it had better KT400 AGP support.
Same problem.

There is some anecdotal evidence that the VT8235 southbridge has problems
handling DMA bursts from buses running at different speeds (i.e., AGP and
PCI):

http://linux.derkeiler.com/Mailing-Lists/Kernel/2003-09/7763.html
http://forums.viaarena.com/messageview.cfm?catid=28&threadid=60131&STARTPAGE=1&enterthread=y

I can't reproduce the problem running the X/Xorg open-source DRI drivers.
I suspect those drivers don't generate enough traffic on the AGP bus to
trigger the timeout. I can't reproduce the problem running fgl_glxgears
while simultaneously downloading a series of large files, either.

I'm really at my wit's end on this thing. I have googled it until my
fingers looked like Vienna sausages. So, any advice or assistance would be
appreciated.

If you need any additional info, I'll be more than happy to provide it.

TIA,
OB