Re: [RFC] fsblock



On Thu, Jun 28, 2007 at 08:35:48AM +1000, David Chinner wrote:
On Wed, Jun 27, 2007 at 07:50:56AM -0400, Chris Mason wrote:
Lets look at a typical example of how IO actually gets done today,
starting with sys_write():

sys_write(file, buffer, 1MB)
for each page:
prepare_write()
allocate contiguous chunks of disk
attach buffers
copy_from_user()
commit_write()
dirty buffers

pdflush:
writepages()
find pages with contiguous chunks of disk
build and submit large bios

So, we replace prepare_write and commit_write with an extent based api,
but we keep the dirty each buffer part. writepages has to turn that
back into extents (bio sized), and the result is completely full of dark
dark corner cases.

That's true but I don't think an extent data structure means we can
become too far divorced from the pagecache or the native block size
-- what will end up happening is that often we'll need "stuff" to map
between all those as well, even if it is only at IO-time.

But the point is taken, and I do believe that at least for APIs, extent
based seems like the best way to go. And that should allow fsblock to
be replaced or augmented in future without _too_ much pain.


Yup - I've been on the painful end of those dark corner cases several
times in the last few months.

It's also worth pointing out that mpage_readpages() already works on
an extent basis - it overloads bufferheads to provide a "map_bh" that
can point to a range of blocks in the same state. The code then iterates
the map_bh range a page at a time building bios (i.e. not even using
buffer heads) from that map......

One issue I have with the current nobh and mpage stuff is that it
requires multiple calls into get_block (first to prepare write, then
to writepage), it doesn't allow filesystems to attach resources
required for writeout at prepare_write time, and it doesn't play nicely
with buffers in general. (not to mention that nobh error handling is
buggy).

I haven't done any mpage-like code for fsblocks yet, but I think they
wouldn't be too much trouble, and wouldn't have any of the above
problems...
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: writepage fs corruption fixes
    ... > Unfortunately it's hard to use mpage_writepageseven in ext3's writeback ... The real problem for ext3 and reiserfs using writepages isn't the ... but the data=ordered buffer based writeback. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: [RFC] fsblock
    ... but we keep the dirty each buffer part. ...
    (Linux-Kernel)