Re: range-based cache flushing (was Re: Linux 2.6.29)
- From: James Bottomley <James.Bottomley@xxxxxxxxxxxxxxxxxxxxx>
- Date: Wed, 01 Apr 2009 00:14:17 +0000
On Mon, 2009-03-30 at 15:05 -0400, Jeff Garzik wrote:
James Bottomley wrote:
On Wed, 2009-03-25 at 16:25 -0400, Ric Wheeler wrote:
Jeff Garzik wrote:
Ric Wheeler wrote:
And, as I am sure that you do know, to add insult to injury, FLUSH_CACHE
is per device (not file system).
When you issue an fsync() on a disk with multiple partitions, you
will flush the data for all of its partitions from the write cache....

SCSI's SYNCHRONIZE CACHE command already accepts an (LBA, length)
pair. We could make use of that.
And I bet we could convince T13 to add FLUSH CACHE RANGE, if we could
demonstrate clear benefit.
How well supported is this in SCSI? Can we try it out with a commodity
SAS drive?
What do you mean by well supported? The way the SCSI standard is
written, a device can do a complete cache flush when a range flush is
requested and still be fully standards compliant. There's no easy way
to tell if it does a complete cache flush every time other than by
taking the firmware apart (or asking the manufacturer).
Quite true, though wondering aloud...
How difficult would it be to pass the "lower-bound" LBA to SYNCHRONIZE
CACHE, where "lower bound" is defined as the lowest sector in the range
of sectors to be flushed?
Actually, the implementation is designed to allow this. The standard
says if the number of blocks is zero that means flush from the specified
LBA to the end of the device. The sync cache we currently use has LBA 0
and number of blocks zero (which means flush everything).
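[Editorial illustration, not part of the original message.] The "LBA 0, number of blocks zero" form described above can be sketched as a raw command block. This is a minimal sketch assuming the 10-byte SYNCHRONIZE CACHE layout from SBC (opcode 0x35, 32-bit big-endian LBA, 16-bit big-endian block count); the helper name `sync_cache10_cdb` is hypothetical, and the code is not the kernel's sd driver.

```python
import struct

SYNCHRONIZE_CACHE_10 = 0x35  # SBC opcode for SYNCHRONIZE CACHE (10)


def sync_cache10_cdb(lba=0, num_blocks=0):
    """Build a 10-byte SYNCHRONIZE CACHE (10) CDB (hypothetical helper).

    num_blocks == 0 means "from lba to the end of the device", so the
    defaults (lba=0, num_blocks=0) request a flush of everything.
    """
    if not 0 <= lba <= 0xFFFFFFFF:
        raise ValueError("LBA does not fit the 32-bit field of the 10-byte CDB")
    if not 0 <= num_blocks <= 0xFFFF:
        raise ValueError("block count does not fit the 16-bit field")
    # Fields in order: opcode, flags (IMMED etc. left clear), LBA,
    # group number, number of blocks, control.
    return struct.pack(">BBIBHB", SYNCHRONIZE_CACHE_10, 0, lba, 0, num_blocks, 0)
```

Calling `sync_cache10_cdb()` with no arguments gives the full-flush form; passing a non-zero `lba` with `num_blocks=0` gives the "from here to the end of the device" form discussed in this thread.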
That seems like a reasonable optimization -- it gives the drive an easy
way to skip sync'ing sectors lower than the lower-bound LBA, if it is
capable. Otherwise, a standards-compliant firmware will behave as you
describe, and do what our code currently expects today -- a full cache
flush.
This seems like a good way to speed up cache flush [on SCSI], while also
perhaps experimenting with a more fine-grained way to pass down write
barriers to the device.
Not a high priority thing overall, but OTOH, consider the case of
placing your journal at the end of the disk. You could then issue a
cache flush with a non-zero starting offset:
SYNCHRONIZE CACHE (max sectors - JOURNAL_SIZE, ~0)
That should be trivial even for dumb disk firmwares to optimize.
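[Editorial illustration, not part of the original message.] The journal-at-end-of-disk case can be sketched as follows. Assumptions: the 16-byte SYNCHRONIZE CACHE layout from SBC (opcode 0x91, 64-bit LBA, 32-bit block count) so that start LBAs on large disks fit, and a block count of zero to mean "to the end of the device", as the standard is described earlier in this thread, rather than the ~0 in the pseudo-call. The names `journal_flush_cdb`, `total_sectors` and `journal_sectors` are hypothetical.

```python
import struct

SYNCHRONIZE_CACHE_16 = 0x91  # SBC opcode for SYNCHRONIZE CACHE (16)


def journal_flush_cdb(total_sectors, journal_sectors):
    """Build a SYNCHRONIZE CACHE (16) CDB covering a journal that
    occupies the last journal_sectors sectors of the device."""
    if not 0 < journal_sectors <= total_sectors:
        raise ValueError("journal must be non-empty and fit on the device")
    # Lower-bound LBA: the first sector of the journal.
    start_lba = total_sectors - journal_sectors
    # Fields in order: opcode, flags, LBA (u64), number of blocks (u32),
    # group number, control. Number of blocks is 0: flush to end of device.
    return struct.pack(">BBQIBB", SYNCHRONIZE_CACHE_16, 0, start_lba, 0, 0, 0)
```

A device that ignores the range still satisfies this request by flushing its whole cache, so the sketch remains correct on firmware that does not optimize for the lower bound.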
We could try it ... I'm still not sure how we'd tell the device is
actually implementing it and not flushing the entire device.
James