Re: libata in 2.4.24?
From: bill davidsen (davidsen_at_tmr.com)
To: firstname.lastname@example.org Date: 2 Dec 2003 22:34:20 GMT
In article <email@example.com>,
Greg Stark <firstname.lastname@example.org> wrote:
| Jeff Garzik <email@example.com> writes:
| > > This doesn't happen with SCSI disks where multiple requests can be pending so
| > > there's no urgency to reporting a false success. The request doesn't complete
| > > until the write hits disk. As a result SCSI disks are reliable for database
| > > operation and IDE disks aren't unless write caching is disabled.
| > This is not really true.
| > Regardless of TCQ, if the OS driver has not issued a FLUSH CACHE (IDE)
| > or SYNCHRONIZE CACHE (SCSI), then the data is not guaranteed to be on
| > the disk media. Plain and simple.
| That doesn't agree with people's experience. People seem to find that SCSI
| drives never cache writes. This sort of makes sense since there's just not
| much reason to report a write success before the write can be performed.
| There's no performance advantage as long as more requests can be queued up.
I hope you mean the drives don't report completion until the data is on
the platter, clearly the data is cached in the drive until it can be
| > If fsync(2) returns without a flush-cache, then your data is not
| > guaranteed to be on the disk. And as you noted, flush-cache destroys
| > performance.
| It's my understanding that it doesn't. There was some discussion in the past
| month about making the drivers issue syncs for journalled filesystems, but
| even then the idea of adding it to fsync or O_SYNC files wasn't the
With O_SYNC files there is the possibility of having a don't cache bit
in the packet to the drive, even with write caching. With fsync I don't
see any way to do it after the fact for only some of the data in the
drive cache. That's just an observation.
Clearly with a completion status coming back after actual completion
O_SYNC or fsync reduce to "wait for the ack from the drive."
| > There are three levels:
| > a) Data is successfully transferred to the controller/drive queue (TCQ).
| > b) Data is successfully transferred to the drive's internal buffers.
| > c) The drive successfully transfers data to the media.
| Only the third is of interest to Postgres or other databases. In fact, I
| suspect only the third is of interest to other systems that are supposed to be
| reliable like MTAs etc. I think Wietse and others would be shocked if they
| were told fsync wasn't guaranteed to have waited until the writes had actually
| hit the media.
I think for reliability fsync has to flush cache, regardless of the
performance hit. I think a drive would be unusably slow if you did it
after each O_SYNC write, so that's probably not practical. Clearly the
best solution is a full SCSI implementation over PATA/SATA, but that
would eliminate some of the justification for SCSI devices at premium
-- bill davidsen <firstname.lastname@example.org> CTO, TMR Associates, Inc Doing interesting things with little computers since 1979. - To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to email@example.com More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/