Re: Suggestion for PCI IDE controller?

From: Juhan Leemet (juhan_at_logicognosis.com)
Date: 06/28/04


Date: Sun, 27 Jun 2004 22:26:16 -0200

On Sat, 26 Jun 2004 18:48:16 -0600, Steve Wolfe wrote:
>> For situations where you write a lot more, I would recommend RAID1 (aka
>> mirroring). There you multiply your writes only by 2x (instead of 5x
>> I/O)...

> I took some SCSI drives a while back, and had a very hard time getting
> any significantly better write performance out of RAID 1 than I did from
> RAID 5:
>
> http://www.codon.com/docs/software-RAID.html

Very interesting. I'm also surprised that your RAID5 writing is as fast as
the RAID 0+1 and 1+0. Some of your bars show even higher?!? There should
be extra reads/writes in there for the parity information in RAID5. From
my crude anecdotal numbers, I would have expected a 4/5 or so decrease.
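
To make the "extra reads/writes" concrete, here is a rough sketch of the
textbook I/O counts for one small (sub-stripe) logical write. The function
name and the counts are my own idealized model, not measurements from either
of our setups; a real md or hardware implementation may cache, batch, or do
full-stripe writes and so behave differently.

```python
# Idealized physical I/O counts for ONE small logical write.
# Assumption: textbook read-modify-write model; no caching, no full-stripe writes.

def ios_per_small_write(level: str) -> int:
    """Return the number of physical disk I/Os for one small logical write."""
    if level == "single":
        return 1  # one write to the one disk
    if level == "raid1":
        return 2  # the same block is written to both mirrors
    if level == "raid5":
        # read old data + read old parity + write new data + write new parity
        return 4
    raise ValueError(f"unknown RAID level: {level}")


for level in ("single", "raid1", "raid5"):
    print(f"{level}: {ios_per_small_write(level)} I/Os per small write")
```

In this model the RAID1 writes can go out in parallel, while the RAID5 reads
must finish before the writes can start, which is why I expected RAID5 to
come out slower on writes.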

I admit that I did my tests a while ago on old used Sun gear that only had
wide 68-pin SCSI and 5 x SEAGATE ST118202LC disks. They would seem to be
similar to your IBM UltraStars? I'm not familiar with the Adaptec 2940
U2W. Is that one channel per board? Two? Transfer rate? So you had 2 disks
on each 2940? I assume you used all 4 disks for all tests?

I had a look at the OP and it wasn't clear to me whether he was
building a brand new system with the latest gear, or implementing on a
more pedestrian platform. I guess my advice pertains to older gear (e.g.
my own or similar uses), and not particularly "tuned", at that.

My measurement was from running a "tar cf -|tar xf -" kind of transfer
from NFS server to local RAID5. It was the RAID5 that limited transfers. I
have not tried any bonnie or bonnie++ tests on it. Maybe I should.

Which Linux O/S and release were you using? Which RAID software?

Thanks for the URL, and the "data points".
 
>> As for network being the bottleneck: probably, if you have very fast
>> disks.
>
> Unless he's got gigabit networking, the network *will* be the bottleneck
> unless he's got extremely slow disks. With 100mbit ethernet only
> handling 10-11 megaBYTES realistic throughput per second on a
> full-duplex, switched connection, it doesn't take much disk at all...

Yes, according to your numbers. In my case, with 100Mbit ethernet and a
single SCSI chain on wide SCSI the disk system was the bottleneck.
Obviously, it is platform dependent. Room for tuning also.
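
The arithmetic behind this, as a quick sketch. The 10-11 MB/s realistic
figure is from the quoted text above; the two disk rates in the example are
hypothetical values I picked only to show both outcomes, not measurements.

```python
# Compare a network link against a disk subsystem to see which one limits
# a transfer. Helper names are illustrative only.

def mbit_to_mbyte_per_s(mbit_per_s: float) -> float:
    """Convert a line rate in megabits/s to megabytes/s (8 bits per byte)."""
    return mbit_per_s / 8.0


def bottleneck(disk_mb_s: float, net_mb_s: float) -> str:
    """Return which side limits a transfer: the slower one."""
    return "disk" if disk_mb_s < net_mb_s else "network"


raw = mbit_to_mbyte_per_s(100)   # 12.5 MB/s before protocol overhead
realistic_net = 10.5             # midpoint of the quoted 10-11 MB/s

print(f"100 Mbit raw: {raw} MB/s")
# Hypothetical disk write rates, for illustration only:
print("slow RAID5 at 8 MB/s  ->", bottleneck(8.0, realistic_net))
print("fast array at 30 MB/s ->", bottleneck(30.0, realistic_net))
```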

BTW, I get suspicious of high I/O numbers. It used to be that people got
fooled by disk I/O tests because they forgot about the disk block buffers
used by *nix. They ended up basically testing memory bandwidth, and not
true disk bandwidth. I don't think that's what you're doing (-s 1024, that's 1GB,
right? vs. 256MB memory). You have to be writing many times your memory size,
to make sure that the (dynamic) disk block buffer cache does not confound
things. Might be fun to try again with -s 2047? Do you get the same results?
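
A small sketch of the sanity check I have in mind: compare the test file
size against RAM, and estimate the worst case where the whole of memory acts
as buffer cache. The helper names are mine, and treating all of RAM as cache
is a deliberately pessimistic assumption.

```python
# How much of a benchmark file could, at worst, be served from the buffer
# cache instead of the disks? Assumes (pessimistically) all RAM is cache.

def size_to_ram_ratio(file_mb: float, ram_mb: float) -> float:
    """How many times larger the test file is than physical memory."""
    return file_mb / ram_mb


def max_cached_fraction(file_mb: float, ram_mb: float) -> float:
    """Upper bound on the fraction of the file that fits in memory."""
    return min(1.0, ram_mb / file_mb)


for file_mb in (1024, 2047):
    ratio = size_to_ram_ratio(file_mb, 256)
    frac = max_cached_fraction(file_mb, 256)
    print(f"-s {file_mb}: {ratio:.2f}x RAM, at most {frac:.0%} cacheable")
```

With -s 1024 and 256MB that is 4x memory, so at most a quarter of the file
could be sitting in cache at any moment.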

Also, I wonder about the amount of parallelism in the bonnie tests. If
there are multiples of 3 or 4 (similar to numbers of disks?), I wonder if
you might see resultant artifacts. I think bonnie was originally designed
to test 1 disk on 1 CPU, but I might be wrong. Any extensions to multiple
disk systems and multiple CPUs might progressively add distortions? They
explicitly warn about interpretations of the "efficiency" measure.

Might have been fun to do RAID1 as a comparison checkpoint. That should
result in exactly the same write performance as "single" (two writes
going in parallel), with 2x read (alternate reads like RAID0?)? Perhaps a
tiny bit more, if there is any intelligent read-ahead going on? Thoughts?

>> ...RAID with corrupted writes is trashed...
>> I don't think you can recover like from a failed disk.
>
> I don't see why it wouldn't be just another inconsistency in the array
> that gets handled in a resync, as far as the md device goes. As for the
> file system, it gets fsck'd, or the journal recovered.

I'll admit that was conjecture on my part, based on the fact that stuff
you're writing is "merged" with previous stuff on the disks. Depending on
where the fault(s) occur, your damage might be variable. I think if you
damage writes going to more than 1 disk, you're (somewhat?) screwed? RAID5
allows for 1 drive to fail, but not multiples. I don't think it can handle
"arbitrary inconsistencies". If only one write to one disk gets trashed,
then I agree fsck/journal will clean it up, but you might lose more than
just the most recent write. You might also lose some of the old data (in
the same RAID block size?) that was "merged". For more than 1 write? I
don't know. Complicated! Too much trouble to try to set up a test scenario.
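
Short of a real test, here is a toy model of what I mean by "merged". It is
plain XOR parity over one stripe of three data blocks plus parity; the block
contents and function names are made up for illustration, and this is not
how md actually lays out or resyncs an array.

```python
from functools import reduce

# Toy RAID5 stripe: three data blocks plus one XOR parity block.


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, column by column."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Case 1: consistent stripe, disk 1 fails. XOR of the survivors and the
# parity gives back the lost block.
rebuilt_ok = xor_blocks([data[0], data[2], parity])
print("consistent stripe:", rebuilt_ok == data[1])

# Case 2: interrupted write. Block 0 was updated on disk but the matching
# parity update never landed. If disk 1 then fails, the rebuild of block 1
# (old data that was never written to) comes out wrong.
torn = [b"ZZZZ", data[1], data[2]]
rebuilt_bad = xor_blocks([torn[0], torn[2], parity])
print("torn stripe:", rebuilt_bad == data[1])
```

The second case is the one that worries me: the damaged result is a block
that was not even part of the interrupted write. A resync that recomputes
parity from the data blocks would make the stripe consistent again, but only
while all the data disks are still readable.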

p.s. Can you boot from your RAID5? or do you boot from normal disk devices
and then later mount your RAID5 file systems?

-- 
Juhan Leemet
Logicognosis, Inc.
