Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls.



On Thu, 19 Jun 2008 12:29:16 +1000
Neil Brown <neilb@xxxxxxx> wrote:

On Wednesday June 18, jlayton@xxxxxxxxxx wrote:

No objection to the patch, but what signal was being sent to nfsd when
you saw this? If it's anything but a SIGKILL, then I wonder if we have
a race that we need to deal with. My understanding is that we have nfsd
flip between 2 sigmasks to prevent anything but a SIGKILL from being
delivered while we're handling the local filesystem operation.

SuSE /etc/init.d/nfsserver does

killproc -n -KILL nfsd

so it looks like a SIGKILL.



From nfsd():

----------[snip]-----------
sigprocmask(SIG_SETMASK, &shutdown_mask, NULL);

/*
* Find a socket with data available and call its
* recvfrom routine.
*/
while ((err = svc_recv(rqstp, 60*60*HZ)) == -EAGAIN)
;
if (err < 0)
break;
update_thread_usage(atomic_read(&nfsd_busy));
atomic_inc(&nfsd_busy);

/* Lock the export hash tables for reading. */
exp_readlock();

/* Process request with signals blocked. */
sigprocmask(SIG_SETMASK, &allowed_mask, NULL);

svc_process(rqstp);

----------[snip]-----------

What happens if this catches a SIGINT after the err<0 check, but before
the mask is set to allowed_mask? Does svc_process() then get called with
a signal pending?

Yes, I suspect it does.

I wonder why we have all this mucking about this signal masks anyway.
Anyone have any ideas about what it actually achieves?


HCH asked me the same question when I did the conversion to kthreads.
My interpretation (based on guesswork here) was that we wanted to
distinguish between SIGKILL and other allowed signals. A SIGKILL is
allowed to interrupt the underlying I/O, but other signals should not.

The question to answer here, I suppose, is whether masking a pending
signal is sufficient to make signal_pending() return false. If I'm
looking correctly then the answer should be "yes". So I don't think we
have a race here after all. I suspect that if SuSE used a different
signal here, that would prevent this from happening. For the record,
both RHEL and Fedora's init scripts use SIGINT for this.

--
Jeff Layton <jlayton@xxxxxxxxxx>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [PATCH] ptrace/coredump/exit_group deadlock
    ... by segfaulting the task and triggering a core dump while some other ... stopped task can't wakeup because it has SIGNAL_GROUP_EXIT set, ... in which a SIGKILL is left pending before the thread stops. ... be selected from the queue in preference to any stop signals. ...
    (Linux-Kernel)
  • Re: [PATCH - take 2] knfsd: nfsd: Handle ERESTARTSYS from syscalls.
    ... If it's anything but a SIGKILL, then I wonder if we have ... My understanding is that we have nfsd ... allowed to interrupt the underlying I/O, but other signals should not. ... The question to answer here, I suppose, is whether masking a pending ...
    (Linux-Kernel)
  • Re: Signals to cinit
    ... it can mask the next SIGKILL ... Currently, when the global init has the pending SIGKILL, we can't ... We can drop other SIG_DFL signals from the same namespace early as well. ... must know it can't reliably kill cinit with SIGTERM. ...
    (Linux-Kernel)
  • Re: Q: selinux_bprm_committed_creds() && signals/do_wait
    ... Where does a consideration of SIGKILL arise? ... there is no need to flush SIGKILL. ... correct handling of signals by the callee. ... But, if we really need this for security, ...
    (Linux-Kernel)
  • Re: [PATCH] ptrace: simplify ptrace_stop()->sigkill_pending() path
    ... clear was the exact list of problem scenarios you were considering. ... problem scenario is when the SIGKILL was already posted before we took the ... Add a flag meaning "all ... about other fatal signals which cause SIGNAL_GROUP_EXIT? ...
    (Linux-Kernel)