[PATCH] fix bogus ECHILD return from wait* with zombie group leader

From: Roland McGrath (roland@redhat.com)
Date: 12/04/04

    Date:	Sat, 4 Dec 2004 14:21:40 -0800
    To: Andrew Morton <akpm@osdl.org>, Linus Torvalds <torvalds@osdl.org>
    
    

    Klaus Dittrich observed this bug and posted a test case for it. This patch
    fixes that failure mode and some other possible ones. What Klaus saw was
    a false negative (i.e. ECHILD when there was a child) when the group leader
    was a zombie whose reaping was delayed because other children were still
    alive; in the test program this happens in a race between the two threads
    dying on a signal. The change to the TASK_TRACED case avoids a potential
    false positive (blocking, or WNOHANG returning 0, when there are really no
    children left) in the race condition where my_ptrace_child returns zero.
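
For readers unfamiliar with the terms, here is a minimal userspace sketch of the wait* return values the description refers to. It is not Klaus's test case and does not reproduce the thread race; it only checks the ordinary contract: ECHILD when there are no children, 0 from WNOHANG while a child is alive, and the child's pid once it has died. The names `poll_children` and `check_wait_contract` are hypothetical helpers introduced for this illustration.

```c
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: waitpid(-1, ..., WNOHANG), with errors folded to -errno. */
static long poll_children(int *status)
{
    pid_t r = waitpid(-1, status, WNOHANG);
    return r < 0 ? -(long)errno : (long)r;
}

/* Returns 0 if every step behaved as expected, else the failing step number. */
static int check_wait_contract(void)
{
    int status = 0;

    /* Step 1: no children exist, so ECHILD is the correct answer. */
    if (poll_children(&status) != -ECHILD)
        return 1;

    pid_t child = fork();
    if (child < 0)
        return 2;
    if (child == 0) {
        pause();            /* child sleeps until it is killed */
        _exit(0);
    }

    /* Step 3: a live child exists; WNOHANG must return 0, not ECHILD. */
    if (poll_children(&status) != 0)
        return 3;

    /* Step 4: kill the child; a blocking wait must now reap it. */
    kill(child, SIGKILL);
    if (waitpid(child, &status, 0) != child)
        return 4;
    if (!WIFSIGNALED(status) || WTERMSIG(status) != SIGKILL)
        return 5;

    /* Step 6: the only child has been reaped, so ECHILD is correct again. */
    if (poll_children(&status) != -ECHILD)
        return 6;

    return 0;
}
```

In these terms, the reported false negative is step 3 yielding ECHILD although a child still exists, and the false positive is step 6 yielding 0 (or a blocking wait hanging) although no children remain.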

    Thanks,
    Roland

    Signed-off-by: Roland McGrath <roland@redhat.com>

    --- linux-2.6/kernel/exit.c
    +++ linux-2.6/kernel/exit.c
    @@ -1319,6 +1319,10 @@ static long do_wait(pid_t pid, int optio
     
             add_wait_queue(&current->wait_chldexit,&wait);
     repeat:
    +        /*
    +         * We will set this flag if we see any child that might later
    +         * match our criteria, even if we are not able to reap it yet.
    +         */
             flag = 0;
             current->state = TASK_INTERRUPTIBLE;
             read_lock(&tasklist_lock);
    @@ -1337,11 +1341,14 @@ repeat:
     
                             switch (p->state) {
                             case TASK_TRACED:
    -                                flag = 1;
                                     if (!my_ptrace_child(p))
                                             continue;
                                     /*FALLTHROUGH*/
                             case TASK_STOPPED:
    +                                /*
    +                                 * It's stopped now, so it might later
    +                                 * continue, exit, or stop again.
    +                                 */
                                     flag = 1;
                                     if (!(options & WUNTRACED) &&
                                         !my_ptrace_child(p))
    @@ -1377,8 +1384,12 @@ repeat:
                                                     goto end;
                                             break;
                                     }
    -                                flag = 1;
     check_continued:
    +                                /*
    +                                 * It's running now, so it might later
    +                                 * exit, stop, or stop and then continue.
    +                                 */
    +                                flag = 1;
                                     if (!unlikely(options & WCONTINUED))
                                             continue;
                                     retval = wait_task_continued(
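
The effect of moving the `flag = 1` assignments can be sketched with a small userspace model. This is a rough illustration under stated assumptions, not the kernel's code: the enum, struct, and `model_scan` are hypothetical; `traced_by_us` stands in for a nonzero `my_ptrace_child(p)`; `M_ZOMBIE_DELAYED` stands in for the zombie group leader that cannot be reaped yet and, by assumption, reaches the `check_continued` label by a jump; reaping and the `options` checks are left out entirely.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-ins for task states; not the kernel's definitions. */
enum model_state { M_RUNNING, M_STOPPED, M_TRACED, M_ZOMBIE_DELAYED };

struct model_child {
    enum model_state state;
    bool traced_by_us;      /* models my_ptrace_child(p) != 0 */
};

/*
 * Returns 0 if the waiter should block (or WNOHANG should return 0),
 * or -ECHILD if no child could ever match. "patched" selects the
 * placement of "flag = 1" before or after the patch.
 */
static int model_scan(const struct model_child *c, size_t n, bool patched)
{
    int flag = 0;

    for (size_t i = 0; i < n; i++) {
        switch (c[i].state) {
        case M_TRACED:
            if (!patched)
                flag = 1;   /* old: set before the ownership check */
            if (!c[i].traced_by_us)
                continue;   /* not ours: skip this entry */
            /* FALLTHROUGH */
        case M_STOPPED:
            flag = 1;       /* may later continue, exit, or stop again */
            break;
        case M_ZOMBIE_DELAYED:
            /* old: the jump landed after "flag = 1", so it was skipped */
            if (patched)
                flag = 1;
            break;
        case M_RUNNING:
            flag = 1;       /* may later exit or stop */
            break;
        }
    }
    return flag ? 0 : -ECHILD;
}
```

With a single delayed zombie leader, the model returns -ECHILD before the patch and 0 after it; with a single traced entry that is not ours, it returns 0 before the patch and -ECHILD after it.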

