Re: [PATCH 20/20] honor r/w changes at do_remount() time



On Wed, Jun 28, 2006 at 06:19:35AM +0100, Al Viro wrote:
On Tue, Jun 27, 2006 at 03:14:57PM -0700, Dave Hansen wrote:

Originally from: Herbert Poetzl <herbert@xxxxxxxxxxxx>

This is the core of the read-only bind mount patch set.

Note that this does _not_ add a "ro" option directly to
the bind mount operation. If you require such a mount,
you must first do the bind, then follow it up with a
'mount -o remount,ro' operation.

I guess the fundamental problem I have with that approach is that it's
a cop-out - we just declare rw state of vfsmount independent from that
of filesystem and add a "if a flag is set, act upon vfsmount".

IMHO the read only check has to be done twice, i.e
once for the superblock and a second time for the
particular vfs mount, similar, the procfs mounts
entry shows the combination (logical and) of the
write ability ...

And yes, some of that does make sense. Fine, let's separate that
stuff; but then we'd better decide what rw superblock *is*.

We have a number of vfsmounts over given superblock. OK, some are "we
don't even ask them to be r/w". Some are "we want them r/w, but don't
actually use as such at the moment". Some are "pinned down for write
now". And we do get logics for "can't make it r/o right now".

But look - we have the _same_ logics for superblock itself. Only it's
full of holes. And since you have rw states for those completely
unrelated to those of vfsmount, we get a ridiculous situation - we
*do* mark the moments when superblock becomes impossible to remount
r/o and we even mostly get the moments when it ceases to be busy
writing (unlinked-but-opened files are major exception). But we can't
use that information.

So "can we remount superblock ro?" turns into kinda-sorta duplicate of
the same for vfsmounts, but it's racy as hell and bloody incomplete;
we don't even get "if some vfsmount over it is busy writing, we won't
remount r/o". Approximation is done, but that's it. E.g. mkdir() in
progress does *not* stop remount of superblock r/o (it does prevent
remount of vfsmount with your patchset).

FWIW, I suspect that the root of the problem is that we confuse
different states of filesystem. E.g. one obviously useful feature
would be to have soft r/o - filesystem that is (from the driver POV)
mounted readonly, but would get transparently switched r/w at the
first request. And you have all vfsmount-side infrastructure for that,
BTW. Add something like mechanism we use for expiry and you've got
a very tasty feature for e.g. laptop users: e.g. userland asking to
switch fs soft-ro every 15 minutes and if nobody had wanted it r/w
since the last time, do the transition; if asked r/w again, r/w it
goes on its own. IOW, there's more to it than one bit. And I'm talking
about superblock state...

in what way would that help laptop users?

the only case I see (where it could help) is if the
laptop runs out of battery and the filesystems are
in clean state, they will not ahve to do a filesystem
check when they replaced the battery, what am I
obviously missing here?

TIA,
Herbert

BTW, it might be worth doing the following:
* reintroduce the list of vfsmounts over given superblock
(protected by vfsmount_lock)
* keep ro flag separate from counter and split it in two.
* all decrements are with atomic_dec_and_lock()
* all increments are with atomic_add_unless() + spin_lock() +
check flags + atomic_add_return() + possible spin_unlock
* if writers count goes from non-zero to zero or vice versa
increment/decrement superblock counter (number of
vfsmounts that really want write access).
* make the moments when i_nlink hits 0 bump the superblock
writers count; drop it when such sucker gets freed on final iput.
* kill the sodding "traverse the list of opened files"
logics in remounting superblock r/o. Instead of that,
grab spinlock, check writers count, bail out if non-zero,
grab vfsmount_lock, traverse the list over superblock and
set one of the flags, drop the locks and proceed.
* when remounting superblock r/w, traverse the list and
knock out the same flag.

At least that way we'd get the majority of "can remount ro" logics right...

Another fun issues:
a) MS_REC handling with MS_BIND remounts (trivial)
b) figuring out what (if anything) should be done with
propagation when we have shared subtrees... (not trivial at all)
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [PATCH 20/20] honor r/w changes at do_remount() time
    ... This is the core of the read-only bind mount patch set. ... of filesystem and add a "if a flag is set, act upon vfsmount". ... We have a number of vfsmounts over given superblock. ... So "can we remount superblock ro?" ...
    (Linux-Kernel)
  • Re: [PATCH -mm 5/7] add user namespace
    ... for efficiency and other reasons I don't entirely understand), the situation is worse: A mount of server:/foo/bar on / in the bar virtual machine may get its superblock merged with a mount of server:/ foo/baz on / in the baz virtual machine. ... If it's efficient to merge those superblocks we should, and once again it's necessary to tie the UID namespace to the vfsmount, not the superblock. ...
    (Linux-Kernel)
  • Re: [patch 00/10] mount ownership and unprivileged mount syscall (v8)
    ... The remount stuff can wait (especially if there will be a new mount ... amount of vfsmounts marked user when you do not limit the number of vfsmount ... so that unprivileged users cannot create insane ...
    (Linux-Kernel)
  • [PATCH 26/26] NFS: Separate caching by superblock, explicitly if necessary
    ... Separate caching by superblock, explicitly if necessary. ... then the second mount will disable caching and give ... There are three variant NFS mount options that can be added to a mount command ...
    (Linux-Kernel)
  • [PATCH 27/27] NFS: Separate caching by superblock, explicitly if necessary
    ... Separate caching by superblock, explicitly if necessary. ... then the second mount will disable caching and give ... There are three variant NFS mount options that can be added to a mount command ...
    (Linux-Kernel)