Re: patch: improve generic_file_buffered_write() (2nd try 1/2)



Nick Piggin <nickpiggin@xxxxxxxxxxxx> writes:

On Thursday 06 September 2007 03:41, Bernd Schubert wrote:
Minor nit: when resubmitting a patch, you should include everything
(ie. the full changelog of problem statement and fix description) in a
single mail. It's just a bit easier...

Will do next time.

So I believe the problem is that for a multi-segment iovec, we currently
prepare_write/commit_write once for each segment, right? We do this

It is more complex.

Currently a __grab_cache_page, a_ops->prepare_write,
filemap_copy_from_user[_iovec] and a_ops->commit_write is done
whenever we hit

a) a page boundary
b) a segment boundary

Those two cases don't have to, and from the stats basically never,
coincide. For NFSd this means we do this TWICE per segment and TWICE
per page.

because there is a nasty deadlock in the VM (copy_from_user being
called with a page locked), and copying multiple segs dramatically
increases the chances that one of these copies will cause a page fault
and thus potentially deadlock.

What actually locks the page? Is it __grab_cache_page or
a_ops->prepare_write?

Note that the patch does not change the number of copy_from_user calls
being made nor does it change their arguments. If we need 2 (or more)
segments to fill a page we still do 2 seperate calls to
filemap_copy_from_user_iovec, both only spanning (part of) one
segment.

What the patch changes is the number of copy_from_user calls between
__grab_cache_page and a_ops->commit_write.

Copying a full PAGE_SIZE bytes from multiple segments in one go would
be a further improvement if that is possible.

The fix you have I don't think can work because a filesystem must be
notified of the modification _before_ it has happened. (If I understand
correctly, you are skipping the prepare_write potentially until after
some data is copied?).

Yes. We changed the order of copy_from_user calls and
a_ops->prepare_write by mistake. We will rectify that and do the
prepare_write for the full page (when possible) before copying the
data into the page.

Anyway, there are fixes for this deadlock in Andrew's -mm tree, but
also a workaround for the NFSD problem in git commit 29dbb3fc. Did
you try a later kernel to see if it is fixed there?

Later than 2.6.23-rc5?

Thanks,
Nick

MfG
Goswin
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: patch: improve generic_file_buffered_write() (2nd try 1/2)
    ... a special case for kernel memory ... For NFSd this means we do this TWICE per segment and TWICE ... What the patch changes is the number of copy_from_user calls between ...
    (Linux-Kernel)
  • Re: [PATCH] modpost: detect unterminated device id lists
    ... This patch against 2.6.23-rc6 will cause modpost to fail if any device ... that made some extensive changes to the elf parser (due to the NOBITS ... the .bss segment is initialized by the loader to all ... the above code basically side-steps the incorrect symbol location ...
    (Linux-Kernel)
  • Re: TCP and syncache question
    ... AO>See the attached patch for fixed version of syncache_expand. ... a segment arrives which apparently is not intended for the current connection.' ... and below irs cannot belong to the current connection - those ... says that we must send an ACK. ...
    (freebsd-net)
  • [RFC] [PATCH] generic_file_buffered_write - deadlock on vectored write?
    ... The attached patch contains detailed ... fault_in_pages_readable brings in current segment or its part. ... That causes the deadlock if pagefault is for the same page write goes to: ... char buf0; ...
    (Linux-Kernel)
  • Re: [2.4 PATCH] bugfix: ARP respond on all devices
    ... "Carlos Velasco" writes: ... that not only I see the hidden patch as the ... If I have two network cards but each of them on different ... attached to the network card on the corresponding segment. ...
    (Linux-Kernel)