Re: nfs: infinite loop in fcntl(F_SETLKW)




On Thu, 2008-04-10 at 21:51 +0200, Miklos Szeredi wrote:
Another infinite loop, this one involving both client and server.

Basically what happens is that on the server nlm_fopen() calls
nfsd_open() which returns -EACCES, to which nlm_fopen() returns
NLM_LCK_DENIED.

On the client this will turn into a -EAGAIN (nlm_stat_to_errno()),
which in turn will cause fcntl_setlk() to retry forever.

I _think_ the solution is to turn NLM_LCK_DENIED into ENOLCK for
blocking locks, as NLM_LCK_BLOCKED is for the contended case. For
testing the lock leave NLM_LCK_DENIED as EAGAIN. That still could be
misleading, but at least there's no infinite loop in that case.

I've minimally tested this patch to verify that it cures the lockup,
and that simple blocking locks keep working.

Signed-off-by: Miklos Szeredi <mszeredi@xxxxxxx>
---
fs/lockd/clntproc.c | 3 +++
1 file changed, 3 insertions(+)

Index: linux/fs/lockd/clntproc.c
===================================================================
--- linux.orig/fs/lockd/clntproc.c 2008-04-02 13:34:57.000000000 +0200
+++ linux/fs/lockd/clntproc.c 2008-04-10 21:23:46.000000000 +0200
@@ -536,6 +536,9 @@ again:
 		up_read(&host->h_rwsem);
 	}
 	status = nlm_stat_to_errno(resp->status);
+	/* Don't return EAGAIN, as that would make fcntl_setlk() loop */
+	if (status == -EAGAIN)
+		status = -ENOLCK;
 out_unblock:
 	nlmclnt_finish_block(block);
 	/* Cancel the blocked request if it is still pending */


Wait. There is something really weird going on here.

According to the spec, LCK_DENIED means 'the request failed' (i.e.
ENOLCK is definitely correct).

OTOH, LCK_DENIED_NOLOCKS and LCK_DENIED_GRACE_PERIOD are both temporary
failures, the first because the server had a resource problem, and the
second because the server rebooted and is in the grace period (i.e.
EAGAIN would appear to be more appropriate). See

http://www.opengroup.org/onlinepubs/9629799/chap10.htm#tagcjh_11_02_02_02

AFAICS, the correct thing to do is to fix nlm_stat_to_errno() by
swapping the return values for NLM_LCK_DENIED and
NLM_LCK_DENIED_NOLOCKS/NLM_LCK_DENIED_GRACE_PERIOD.

The problem is that there appears to be a similar confusion on the Linux
server side in nlmsvc_lock(). :-(
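
To make the proposed swap concrete, here is a rough userspace sketch. It is
not the kernel code: sketch_stat_to_errno() and sketch_blocking_attempts()
are hypothetical names, the status codes are plain host-order ints numbered
as in the NLM spec, and the retry loop is only a bounded stand-in for the
retry-on-EAGAIN behaviour of fcntl_setlk() described above.

```c
#include <errno.h>

/* NLM status codes, numbered as in the protocol spec. Assumption: plain
 * host-order ints are used here, unlike the kernel's own constants. */
enum nlm_stat {
	NLM_LCK_GRANTED = 0,
	NLM_LCK_DENIED = 1,
	NLM_LCK_DENIED_NOLOCKS = 2,
	NLM_LCK_BLOCKED = 3,
	NLM_LCK_DENIED_GRACE_PERIOD = 4,
};

/* Hypothetical stand-in for nlm_stat_to_errno() with the values swapped. */
int sketch_stat_to_errno(enum nlm_stat status)
{
	switch (status) {
	case NLM_LCK_GRANTED:
		return 0;
	case NLM_LCK_DENIED:
		return -ENOLCK;		/* the request failed */
	case NLM_LCK_DENIED_NOLOCKS:
	case NLM_LCK_DENIED_GRACE_PERIOD:
		return -EAGAIN;		/* temporary failure */
	default:
		return -ENOLCK;
	}
}

/* Bounded model of a caller that retries a blocking lock for as long as
 * it sees -EAGAIN. Returns the number of attempts made; max_tries only
 * exists so that the sketch always terminates. */
int sketch_blocking_attempts(enum nlm_stat server_reply, int max_tries)
{
	int tries = 0;

	while (tries < max_tries) {
		tries++;
		if (sketch_stat_to_errno(server_reply) != -EAGAIN)
			break;
	}
	return tries;
}
```

With this mapping a server that always answers NLM_LCK_DENIED makes the
caller give up after one attempt, while NLM_LCK_DENIED_NOLOCKS or
NLM_LCK_DENIED_GRACE_PERIOD keep it retrying until the cap is hit.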

--
Trond Myklebust
Linux NFS client maintainer

NetApp
Trond.Myklebust@xxxxxxxxxx
www.netapp.com