Re: [PATCH] barrier patch set

From: Chris Mason (mason_at_suse.com)
Date: 03/31/04

    To: "Stephen C. Tweedie" <sct@redhat.com>
    Date:	Tue, 30 Mar 2004 17:13:17 -0500
    
    

    On Tue, 2004-03-30 at 16:50, Stephen C. Tweedie wrote:
    > Hi,
    >
    > On Tue, 2004-03-30 at 20:19, Chris Mason wrote:
    >
    >
    > > I think we're mixing a few concepts together. submit_bh(WRITE_BARRIER,
    > > bh) gives us an ordered write in whatever form the lower layers can
    > > provide. It also ensures that if you happen to call wait_on_buffer()
    > > for the barrier buffer, the wait won't return until the data is on
    > > media.
    >
    > Right, but that's just how it works right now --- one doesn't _have_ to
    > imply the other. You could easily imagine an implementation that
    > implements barriers and flushing separately, and which does not do
    > automatic flushing on completion of WRITE_BARRIER IOs. SCSI with
    > writeback caching enabled might be one example of that. NBD/DRBD would
    > be another likely candidate --- if you've got network latencies in the
    > way, then a flushing sync may be far more expensive than a barrier
    > propagation.
    >
    Yes, that's true, although barriers don't really imply a flush. They
    just imply that if you do use wait_on_* for flushing, the wait will
    report things accurately.
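
To make the distinction in this exchange concrete, here is a minimal userspace sketch in C. It is a toy model, not kernel code: `struct toy_dev`, `toy_write`, `toy_barrier_ordered`, `toy_barrier_flushing` and the slot sizes are all hypothetical names invented for illustration. It assumes a device whose write cache drains in FIFO order, so an ordering-only barrier can complete without the data being on media, while a flush-based barrier cannot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define CACHE_SLOTS 8
#define MEDIA_SLOTS 64

/* Hypothetical toy device: writes complete into a volatile cache and
 * reach "media" only when the cache is drained. Not kernel code. */
struct toy_dev {
    int cache[CACHE_SLOTS];  /* block ids waiting in the write cache */
    size_t cached;           /* number of valid entries in cache[] */
    int media[MEDIA_SLOTS];  /* block ids in the order they hit media */
    size_t on_media;         /* number of valid entries in media[] */
};

/* Drain the cache to media in submission (FIFO) order. */
static void toy_flush(struct toy_dev *d)
{
    for (size_t i = 0; i < d->cached; i++) {
        assert(d->on_media < MEDIA_SLOTS);
        d->media[d->on_media++] = d->cache[i];
    }
    d->cached = 0;
}

/* Ordinary write: "completes" as soon as the block is cached. */
static void toy_write(struct toy_dev *d, int block)
{
    if (d->cached == CACHE_SLOTS)
        toy_flush(d);        /* make room; order is still FIFO */
    d->cache[d->cached++] = block;
}

/* Ordering-only barrier. Because this toy cache drains in FIFO order,
 * caching the block already keeps it behind every earlier write.
 * Completion says nothing about the data being on media. */
static void toy_barrier_ordered(struct toy_dev *d, int block)
{
    toy_write(d, block);
}

/* Flush-based barrier: earlier writes and the barrier block itself
 * are all on media, in order, by the time this returns. */
static void toy_barrier_flushing(struct toy_dev *d, int block)
{
    toy_write(d, block);
    toy_flush(d);
}

/* Has this block id reached media yet? */
static bool toy_is_on_media(const struct toy_dev *d, int block)
{
    for (size_t i = 0; i < d->on_media; i++)
        if (d->media[i] == block)
            return true;
    return false;
}
```

In this model, waiting on a block submitted through `toy_barrier_ordered` tells you only that ordering is preserved. Waiting on one submitted through `toy_barrier_flushing` also tells you the data is on media, which corresponds to the behaviour described above for the current implementation.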

    > Unfortunately, a lot of the cases we care about really have to do the
    > barrier via flushing, so the benefit of keeping them separate is
    > limited. For LVM/raid0, for example, we've got no way of preserving
    > ordering between IOs on different drives, so a flush is necessary there
    > unless we start journaling the low-level IOs to preserve order.
    >
    Right.

    > Yep. It scares me to think what performance characteristics we'll start
    > seeing once that gets used everywhere it's needed, though. If every raw
    > or O_DIRECT write needs a flush after it, databases are going to become
    > very sensitive to flush performance. I guess disabling the flushing and
    > using disks which tell the truth about data hitting the platter is the
    > sane answer there.

    Most database benchmarks are done on SCSI, and the blkdev_flush should
    be a no-op there. For IDE-based database and mail server benchmarks, the
    results won't be pretty.

    The reiserfs fsync code tries hard to only flush once, so if a commit is
    done then blkdev_flush isn't called. We might have to do a few other
    tricks to queue up multiple synchronous I/Os and only flush once.
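
The idea of queuing several synchronous writes and flushing once can be sketched in userspace with POSIX `write()` and `fsync()`. This is only an analogy under stated assumptions: it is not the reiserfs code path and does not call `blkdev_flush`. The helper names `open_log`, `write_all`, `commit_flush_each` and `commit_flush_once` are hypothetical, and `EINTR` handling is omitted for brevity.

```c
#include <fcntl.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>

/* Open (or create) an append-only log file; returns fd or -1. */
static int open_log(const char *path)
{
    return open(path, O_WRONLY | O_CREAT | O_APPEND, 0600);
}

/* Write the whole buffer, retrying on short writes.
 * Returns 0 on success, -1 on error. */
static int write_all(int fd, const char *buf, size_t len)
{
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0)
            return -1;
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Naive approach: one flush per record.
 * Returns the number of flushes issued, or -1 on error. */
static int commit_flush_each(int fd, const char *const *recs, size_t n)
{
    int flushes = 0;
    for (size_t i = 0; i < n; i++) {
        if (write_all(fd, recs[i], strlen(recs[i])) < 0 || fsync(fd) < 0)
            return -1;
        flushes++;
    }
    return flushes;
}

/* Batched approach: queue every record, then flush a single time.
 * Returns the number of flushes issued (0 or 1), or -1 on error. */
static int commit_flush_once(int fd, const char *const *recs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (write_all(fd, recs[i], strlen(recs[i])) < 0)
            return -1;
    if (n == 0)
        return 0;            /* nothing written, nothing to flush */
    return fsync(fd) < 0 ? -1 : 1;
}
```

With three records, `commit_flush_each` issues three flushes and `commit_flush_once` issues one. The trade-off is that none of the batched records can be treated as durable until that single flush returns.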

    -chris


