Re: [PATCH 2/2] ptrace children revamp



On 04/04, Roland McGrath wrote:

Unless I missed something again, this is not 100% right.

Suppose that we are ->real_parent, the child was traced by us, we ignore
SIGCHLD, and we are doing the threaded reparenting. In that case we should
release the child if it is dead, like the current code does.

I tried to construct a test case to demonstrate this behavior on the
existing kernel. I couldn't manage to get it to do this. Are you this
is "like the current code does"? I'll send you my test case attempts
off-list and we can try together to get the right test to show this. If
it turns out that it does not really behave this way now and so my code
is not introducing any regression, then (as usual) I'd rather fix this
separately after the cleanup.

Heh. You are right, the current kernel has the same (minor) bug.
I beleive it was introduced by b2b2cbc4b2a2f389442549399a993a8306420baf.
Before this patch we notified the new parent even if it is from the same
thread group. If SIGCHLD is ignored, do_notify_parent() sets ->exit_signal = -1,
and then forget_original_parent() notices this and adds the child to ptrace_dead.
Now we don't notify the parent, and so the child "hangs".


However, now I think your patch adds a more serious problem, we can really
leak a zombie. Suppose that that we are ->real_parent, the child was traced
by us, it is zombie now, and the child is not a group leader. No matter who
is the new parent, no matter what is the state of SIGCHLD, we must not reparent
the child: nobody can release it except us. It is not traced any longer,
its ->exit_signal == -1, eligible_child() doesn't like this.

No?

Even if we don't ignore SIGCHLD, we should notify our thread-group, but
reparent_thread() skips do_notify_parent() when the new parent is from
is from the same thread-group.

Why do you think this is wrong? The notification is always group-wide
(a SIGCHLD to the group, and wakeup of the group-shared wait_chldexit).

Well, I was thinking about another thread (the new parent) sleeping in
do_wait(__WNOTHREAD)... not sure this really matters, though.

But,

So there already was a notification, and a second one seems wrong to me.

Yes, I missed this. The second signal doesn't look right.

Note also that reparent_thread() has a very old bug. When it sees the
EXIT_ZOMBIE task, it notifies the new parent, but doesn't check if the task
becomes detached because the new parent ignores SIGCHLD. Currently this means
that init must have a handler for SIGCHLD or we leak zombies.

I had noticed that. After the cleanup, we can look into fixing that.
As usual, I am first going for a set of cleanup patches that changes
exactly nothing semantically.

Yes, yes, I agree. I just wanted to be sure you didn't miss this bug.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: using fork and wait reliably w/signals
    ... > I don't know what the book says about signals. ... you should wait for the child. ... > the parent process will exit soon anyway, ... If the system is not POSIX compliant, then you can ignore the SIGCHLD ...
    (comp.os.linux.development.system)
  • Re: [PATCH 2/2] ptrace children revamp
    ... this adds a minor pessimization. ... pointers to parent process, youngest child, younger sibling, ... SIGCHLD, and we are doing the threaded reparenting. ... is from the same thread-group. ...
    (Linux-Kernel)
  • Re: using fork and wait reliably w/signals
    ... > a large buffer while the parent ... I don't know what the book says about signals. ... Must I wait on the child at some point knowing it ... In fact I think a SIGCHLD ...
    (comp.os.linux.development.system)
  • Re: What is the rational for posix2 requierment that system(char*) blocks sigchld?
    ... sigchld is blocked while systemis executing. ... process wanting to wait for child would have to create ... parent invokes system ...
    (Debian-User)
  • Re: 2.6.x Fork Problem?
    ... >> or the parent set up a SIGCHLD handler before the fork. ... >> So, by the time the parent gets the CPU, the child is long gone. ... that we needed to change the signal handler for SIGCHLD. ...
    (Linux-Kernel)