Re: [PATCH 2/4] AFS: Add a function to excise a rejected write from the pagecache



On Thu, 2007-05-24 at 22:35 +0100, David Howells wrote:
Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:

Could you please flesh this out a bit, from a higher level?

See the description for patch 3/4.

As in: what does it mean for a server to "reject" a write? What's actually
going on here?

Simply, it means that the server refused to perform the write. The example I'm
thinking of is that it refused to do so because the ACL governing that file
changed between the permission() check and the attempt to write the data back.

All it takes is a write-back cache and a tiny little delay between the copy to
page cache and the write back to the server and there's a window in which
another client can leap in and make it impossible for the client to write the
data.
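
To make the window concrete, here is a minimal userspace sketch of the sequence, not AFS code. All the toy_* names are hypothetical, and the single acl_allows_write flag is an assumed stand-in for the server-side ACL: the check passes at write time, another client changes the ACL, and the later writeback is refused.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical toy model, not AFS code: one file on a "server" and one
 * page in a client's write-back cache. */
struct toy_server {
    bool acl_allows_write;   /* stands in for the 'w' bit in the ACL */
    char data[16];
};

struct toy_client {
    char page[16];           /* cached copy of the file data */
    bool dirty;              /* modified locally, not yet written back */
};

/* Step 1: the permission check happens here, and the data is only
 * copied into the local cache. Nothing is sent to the server yet. */
static int toy_write(struct toy_client *c, const struct toy_server *s,
                     const char *buf)
{
    if (!s->acl_allows_write)
        return -EACCES;
    strncpy(c->page, buf, sizeof(c->page) - 1);
    c->page[sizeof(c->page) - 1] = '\0';
    c->dirty = true;
    return 0;
}

/* Step 2: some time later the dirty page is written back, and the
 * server applies whatever ACL is in force at that moment. */
static int toy_writeback(struct toy_client *c, struct toy_server *s)
{
    if (!c->dirty)
        return 0;
    if (!s->acl_allows_write)
        return -EACCES;      /* the write is rejected */
    memcpy(s->data, c->page, sizeof(s->data));
    c->dirty = false;
    return 0;
}

/* Run the scenario: write succeeds, ACL changes, writeback fails. */
static int toy_race(void)
{
    struct toy_server s = { .acl_allows_write = true, .data = "old" };
    struct toy_client c = { .page = "old", .dirty = false };
    int ret = toy_write(&c, &s, "new");

    assert(ret == 0);               /* the up-front check passed */
    s.acl_allows_write = false;     /* another client leaps in here */
    return toy_writeback(&c, &s);   /* dirty data is now stranded */
}
```

After toy_race() the client still holds a dirty page that the server will not take, which is the state the excise function is meant to clean up.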

There are other possibilities, for instance:

(1) The key governing the writeback expires.

(2) The user's account is deleted or disabled.

The file being deleted is handled elsewhere. In such a case, everyone's
writes are just discarded.

Why does the server do this?

Because it's told to by some other client. Think chmod, or in AFS's case:

fs setacl -dir . -acl system:administrators rlidka

This removes write access by not including a 'w' amongst 'rlidka'.

(Note in AFS, the ACLs attached to a directory control the files in that
directory).

I assume that the application will see a short write or an EIO or something?

EACCES or similar most likely. In AFS's case you should get an abort with some
appropriate error code. This is then mapped to EACCES, EKEYREVOKED,
EKEYEXPIRED, etc.
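
As a rough illustration of that mapping step, the sketch below translates an abort code into a negative errno. The TOY_ABORT_* names and values are made up for the example and are not the real AFS or rxkad abort codes, and the -EIO fallback is an assumption; EKEYREVOKED and EKEYEXPIRED are the Linux errno values.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical abort codes for illustration only; these are NOT the
 * real AFS or rxkad values. */
enum toy_abort {
    TOY_ABORT_NO_ACCESS = 1,
    TOY_ABORT_KEY_REVOKED,
    TOY_ABORT_KEY_EXPIRED,
};

/* Map a server abort code onto the errno the application would see. */
static int toy_abort_to_errno(int abort_code)
{
    assert(abort_code != 0);   /* 0 means "no abort" in this toy model */

    switch (abort_code) {
    case TOY_ABORT_NO_ACCESS:
        return -EACCES;
    case TOY_ABORT_KEY_REVOKED:
        return -EKEYREVOKED;
    case TOY_ABORT_KEY_EXPIRED:
        return -EKEYEXPIRED;
    default:
        return -EIO;           /* assumed fallback for unknown aborts */
    }
}
```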

How does this interact with MAP_SHARED?

MAP_SHARED/PROT_WRITE is just another way of doing writes to the pagecache. It
isn't really special. The way I've done it is to use page_mkwrite() to track
the key with which a write through a mapping is written back.

Do we expect that any other networked filesystem would want to do this?
(and if not, why not?) Or do they already attempt to do it in some other
fashion?

_Any_ network filesystem that (a) has access controls, and (b) does writeback
caching (be the interval ever so brief) is liable to this issue. It could be
possible for a netfs to avoid this by getting a guarantee that the write
will be accepted upfront (CIFS might do this), or by blocking attempts to
change the access controls through some sort of locking (again CIFS might do
this).

However, looking at NFS, it appears that NFS may be liable to this problem. It
does appear to use writeback caching, though it flushes its writes back fairly
promptly making it quite difficult to test. I talked to Steve Dickson about
this, and I get the impression that NFS will just sit there and keep retrying
the writeback.

No. If the write fails, then NFS will mark the mapping as invalid and
attempt to call invalidate_inode_pages2() at the earliest possible
moment.

I'm adding in a patch to defer marking the page as uptodate until the
write is successful in cases where NFS is writing a pristine page.
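
Roughly, the idea can be modelled as below. This is a toy userspace sketch, not the NFS patch: struct toy_page and its two flags are hypothetical stand-ins for the real page state, and the server's answer is passed in as a plain boolean.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant page state. */
struct toy_page {
    bool uptodate;   /* contents may be handed to readers */
    bool dirty;      /* contents still need writing to the server */
};

/* Write to a pristine (not yet uptodate) page. The page is marked
 * uptodate only once the server has accepted the data, so a rejected
 * write never becomes visible through the page cache. */
static int toy_write_pristine(struct toy_page *p, bool server_accepts)
{
    assert(!p->uptodate);      /* this path is for pristine pages only */

    p->dirty = true;           /* data copied in, writeback pending */
    if (!server_accepts) {
        p->dirty = false;      /* drop the rejected data */
        return -EACCES;        /* page stays !uptodate */
    }
    p->dirty = false;
    p->uptodate = true;        /* deferred until the write succeeded */
    return 0;
}
```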

As for pages that are already marked as uptodate, well you already have
a race: you have deferred the page write, and so other processes may
already have read the rejected data before you tried to write it out.

Trond


