Re: Disk I/O problem on Abit AB9 Pro, Core2 system

On Tue, 19 Dec 2006 15:05:31 -0800, sndive wrote:

General Schvantzkoph wrote:
I'm having an odd problem with my new Core2 system. The system has an Abit
AB9 Pro motherboard (P965 northbridge, ICH8R southbridge), two Seagate
320G SATA drives, 4G of DDR2.

Each drive has three partitions: two 8G partitions to hold OSes and swap,
and one large 278G partition that is one half of a software RAID0 array.
When I do a Verilog regression, which involves lots of writes, on one of
the small partitions, the I/O system hangs. I'm still able to ssh into the
machine and I can do some operations that don't involve disk accesses. For
example, ls will work on the part of the file system that's in memory but
will stop when it hits a part of the file system that isn't cached. The
hang happens very quickly. When I do the regression on the RAID0 partition
this doesn't seem to happen. I've also tried this with partitions on both
disks and they both fail identically, so this isn't the result of a broken
disk or a bad cable.
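
For anyone who wants to try reproducing this without a Verilog toolchain, a
write-heavy load can be approximated with a short script. This is only a
rough sketch, not the regression itself: the function name write_stress, the
file names, and the sizes are all made up for illustration, and it assumes
the directory you pass in lives on the partition under test. It writes a
batch of files, forces each one to disk with fsync, and reports the slowest
fsync, which should grow very large (or never return) if the I/O path hangs.

```python
import os
import tempfile
import time


def write_stress(directory, n_files=50, file_size=1 << 20, block_size=64 * 1024):
    """Write n_files files of file_size bytes each, fsync-ing every one.

    Returns (total_bytes_written, slowest_fsync_seconds).
    All names and default sizes here are illustrative assumptions.
    """
    block = os.urandom(block_size)
    total = 0
    slowest = 0.0
    for i in range(n_files):
        path = os.path.join(directory, "stress_%04d.bin" % i)
        with open(path, "wb") as f:
            written = 0
            while written < file_size:
                chunk = block[: min(block_size, file_size - written)]
                f.write(chunk)
                written += len(chunk)
            f.flush()
            start = time.monotonic()
            os.fsync(f.fileno())  # ask the kernel to push the data to the disk
            slowest = max(slowest, time.monotonic() - start)
        total += written
        os.remove(path)  # clean up so the scratch space does not fill
    return total, slowest


if __name__ == "__main__":
    # Hypothetical usage: replace the temporary directory with a directory
    # on the partition you want to exercise.
    with tempfile.TemporaryDirectory() as d:
        total, slowest = write_stress(d, n_files=10, file_size=256 * 1024)
        print("wrote %d bytes, slowest fsync %.3f s" % (total, slowest))
```

Running it once against a directory on a small partition and once against
the RAID0 mount would give a crude comparison of the two cases.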

I've run Memtest86+ and the Seagate disk diagnostics and neither found a
problem, although it should be noted that neither of those tools
understands the 965 chipset. I've also switched the BIOS mode for the Intel
SATA controllers from AHCI back to IDE, but that didn't make any difference.

I'm using 64 bit Fedora Core 6 with a kernel. The BIOS is the
latest one that Abit has.

This could be a bad motherboard, but I suspect that if it were I would have
seen a problem with either Memtest86+, the Seagate diagnostics, the
install and updates, or when I untarred my tools, which are multiple
gigabytes in size. However, none of those had a problem. My suspicion
is that mixing both RAID and non-RAID partitions in the same partition
table is the source of the problem, but I don't know why this should be.

Does anyone have any theories? Is anyone seeing anything similar?

Off the top of my head, that sounds like a problem with the ICH8 driver
in the kernel (or the driver exposes an ICH8 hardware bug).
If you end up getting a real hw raid controller (as a "temporary"
solution) please let us know.
I'd hate to buy more pata drives if I need more space.
Did you post to the kernel mailing list?

I'm seeing some problems with reiserfs in 2.6.18
(and reiserfsck "fixed" the "problem": all the files are in /lost+found).
Are you running ext3 or something else? Raw partitions?

I could augment the memtest86 replacement I posted here a while ago
to include hdparm-like benchmarking functionality, but destructive
(testing both read and write performance).
I suspect kernel developers already have something like that for
testing though.
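
To give an idea of what such a read/write benchmark might look like, here is
a minimal sketch. It is not the tool mentioned above and not a replacement
for hdparm: the function name measure_throughput and its parameters are
hypothetical, and it works on a scratch file rather than a raw partition, so
unlike the destructive test described it does not overwrite existing data.
Note also that the read phase will probably be served from the page cache,
so the read figure says little about the disk itself.

```python
import os
import tempfile
import time


def measure_throughput(path, size_bytes=8 << 20, block_size=1 << 20):
    """Write then read back a scratch file; return throughput in MB/s.

    path, size_bytes and block_size are illustrative assumptions. The file
    at path is overwritten and left in place for the caller to remove.
    """
    block = b"\xa5" * block_size
    n_blocks = size_bytes // block_size

    # Write phase: sequential writes followed by fsync so the timing
    # includes getting the data out of the page cache.
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(n_blocks):
            f.write(block)
        f.flush()
        os.fsync(f.fileno())
    write_s = time.perf_counter() - start

    # Read phase: sequential reads of the same file (likely cached).
    start = time.perf_counter()
    read_bytes = 0
    with open(path, "rb") as f:
        while True:
            data = f.read(block_size)
            if not data:
                break
            read_bytes += len(data)
    read_s = time.perf_counter() - start

    mb = 1024.0 * 1024.0
    return {
        "bytes": read_bytes,
        "write_mb_s": read_bytes / mb / max(write_s, 1e-9),
        "read_mb_s": read_bytes / mb / max(read_s, 1e-9),
    }


if __name__ == "__main__":
    # Hypothetical usage: put the scratch file on the partition under test.
    with tempfile.TemporaryDirectory() as d:
        print(measure_throughput(os.path.join(d, "scratch.bin")))
```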

I doubt it's the drives or the ICH8R controller. I've been running Verilog
regressions for the last 24 hours on the soft RAID0 partition and I
haven't had a problem there. In fact, last night I had two regression
streams running, so both cores were running flat out for about 8 hours.
When I use one of the small non-RAID partitions I get the hang in under
a minute, so it looks like a real bug in the SATA driver.

I'm using EXT3 and Seagate 320G drives. I'm thinking about getting a third
identical drive to use as the non-RAID drive. If I can run on that drive,
where there will be no RAID partitions, that should confirm that the
problem is mixing RAID and non-RAID on the same drive.
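
As a side note, which disks carry both RAID members and ordinary partitions
can be read off /proc/mdstat. The sketch below is only an illustration: the
function names are hypothetical, the sample text is made up (it is not
output from the machine discussed here), and the device names sda/sdb and
the sdXN naming scheme are assumptions.

```python
import re
from collections import defaultdict

# Made-up sample in the usual /proc/mdstat layout, for illustration only.
SAMPLE = """Personalities : [raid0]
md0 : active raid0 sdb3[1] sda3[0]
      583014912 blocks 64k chunks

unused devices: <none>
"""


def md_members(mdstat_text):
    """Map each md array name to the list of its member partitions."""
    members = {}
    for line in mdstat_text.splitlines():
        m = re.match(r"^(md\d+)\s*:\s*(.*)$", line)
        if not m:
            continue
        array, rest = m.groups()
        # Members look like "sda3[0]"; words such as "active" do not match.
        members[array] = re.findall(r"(\w+)\[\d+\]", rest)
    return members


def disks_mixing_raid(all_partitions, mdstat_text):
    """Return the disks that hold both md members and plain partitions.

    Assumes sdXN-style names, where stripping trailing digits gives the disk.
    """
    raid_parts = {p for parts in md_members(mdstat_text).values() for p in parts}
    by_disk = defaultdict(lambda: {"raid": [], "plain": []})
    for part in all_partitions:
        disk = part.rstrip("0123456789")
        kind = "raid" if part in raid_parts else "plain"
        by_disk[disk][kind].append(part)
    return sorted(d for d, k in by_disk.items() if k["raid"] and k["plain"])


if __name__ == "__main__":
    parts = ["sda1", "sda2", "sda3", "sdb1", "sdb2", "sdb3"]
    print(disks_mixing_raid(parts, SAMPLE))
```

On a real system the text would come from reading /proc/mdstat, and a
dedicated third drive with no RAID members would not appear in the result.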

BTW, if you have a Memtest86 replacement it would be nice if you made it
available; put it on Sourceforge or something. It looks like Memtest86+ has
been abandoned, since there haven't been any updates since 2005.