Re: NFSv3 lock recovery



On 8/28/07, Neil Brown <neilb@xxxxxxx> wrote:

A brief question about NFSv3 lock recovery for those who might
know: does the Linux implementation (or the NLM/NSM protocol)
properly support the case in which client and server state
change simultaneously?

If both crash, there is nothing for a client to reclaim and nothing
for a server to discard, so it is hard to see where a problem could
lie.

What I'm getting is the usual stale-lock case: node bootup
gets stuck on the first attempted file lock (wtmp in this case).


The reason I'm asking is that this very case occasionally gives
me stale locks. When the NFSv3 server crashes, a client that has
its root on it may well crash too. Once the server comes back up,
it tries to notify the client that just crashed. Unsurprisingly,
this notification doesn't go anywhere, and the server discards the
notification/client. Then, once the client starts to boot again,
it tries to notify the server, which instantly whines about an
SM_NOTIFY when no one is being monitored. Thus the whole
notification cycle is busted and lock states go haywire :/.
Is this even supposed to work?
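
To make the sequence above concrete, here is a toy model of it. This is
only a sketch under stated assumptions, not how lockd/statd are actually
implemented: ToyServer, simulate, the client name "node1" and the path
"/var/log/wtmp" are all made up for illustration, and the model simply
assumes that the lock table outlives the restart, which is what the
symptoms look like.

```python
from dataclasses import dataclass, field


@dataclass
class ToyServer:
    """Hypothetical stand-in for a lock server plus its monitor list."""
    locks: dict = field(default_factory=dict)    # path -> client holding it
    monitored: set = field(default_factory=set)  # clients we accept SM_NOTIFY from

    def lock(self, client: str, path: str) -> bool:
        # Any existing holder blocks the request, even an earlier
        # incarnation of the same client.
        if path in self.locks:
            return False
        self.locks[path] = client
        self.monitored.add(client)
        return True

    def restart(self, keep_monitor_list: bool) -> None:
        # ASSUMPTION: the lock table survives the restart; only the
        # monitor list may be lost.
        if not keep_monitor_list:
            self.monitored.clear()

    def sm_notify(self, client: str) -> bool:
        # A reboot notification from an unmonitored client is ignored.
        if client not in self.monitored:
            return False
        self.locks = {p: c for p, c in self.locks.items() if c != client}
        return True


def simulate(keep_monitor_list: bool) -> bool:
    """Return True if the rebooted client gets its lock back."""
    server = ToyServer()
    assert server.lock("node1", "/var/log/wtmp")
    server.restart(keep_monitor_list)      # server comes back up
    server.sm_notify("node1")              # client reboots and notifies
    return server.lock("node1", "/var/log/wtmp")


if __name__ == "__main__":
    print(simulate(keep_monitor_list=False))
    print(simulate(keep_monitor_list=True))
```

In this model the rebooted client only gets the lock back when its
notification is accepted; once the monitor list is gone, the old lock
is never dropped.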


More details. What, exactly, goes "haywire"...

Wtmp is locked and the client remains waiting for it to be
released. It's as if the server lock state persisted over the
crash, and the notification cycle being broken in both directions
makes sure the lock is never released. The server never grants a
new lock on this file.

But I take it that lock state is not supposed to
persist over a crash? Umm.


If your client is diskless and mounting root from the server, then it
should be mounting the root with "-o nolock" and there should be no
locking issues at all.

Sure, but I need the locks.


Please explain in detail your configuration (What is mounted where and
with what options etc) and what happens (why does the client crash
just because the server crashed - it shouldn't), and what actually
fails that you expected to work.

The client has its root on NFS. Thus, if the server goes
away, it's 'quite likely' that things jam in the client, and
in this case we have an HA setup that reacts to this.

One more thing: restarting statd on the server once the
clients are up helps instantly.


--
// Janne


