Re: PATCH [PPC64]: dead processes never reaped

From: Linas Vepstas (linas_at_austin.ibm.com)
Date: 04/19/05

  • Next message: Paul Jackson: "Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets"
    Date:	Tue, 19 Apr 2005 12:22:31 -0500
    To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    
    

    On Tue, Apr 19, 2005 at 11:07:01AM +1000, Benjamin Herrenschmidt was heard to remark:
    > On Mon, 2005-04-18 at 14:38 -0500, Linas Vepstas wrote:
    > >
    > > Hi,
    > >
    > > The patch below appears to fix a problem where a number of dead processes
    > > linger on the system. On a highly loaded system, dozens of processes
    > > were found stuck in do_exit(), calling thier very last schedule(), and
    > > then being lost forever.
    > >
    > > Processes that are PF_DEAD are cleaned up *after* the context switch,
    > > in a routine called finish_task_switch(task_t *prev). The "prev" gets
    > > the value returned by _switch() in entry.S, but this value comes from
    > >
    > > __switch_to (struct task_struct *prev,
    > > struct task_struct *new)
    > > {
    > > old_thread = &current->thread; ///XXX shouldn't this be prev, not current?
    > > last = _switch(old_thread, new_thread);
    > > return last;
    > > }
    > >
    > > The way I see it, "prev" and "current" are almost always going to be
    > > pointing at the same thing; however, if a "need resched" happens,
    > > or there's a pre-emept or some-such, then prev and current won't be
    > > the same; in which case, finish_task_switch() will end up cleaning
    > > up the old current, instead of prev. This will result in dead processes
    > > hanging around, which will never be scheduled again, and will never
    > > get a chance to have put_task_struct() called on them.
    >
    > Ok, thinking moer about this ... that will need maybe some help from
    > Ingo so I fully understand where schedule's are allowed ... We are
    > basically in the middle of the scheduler here, so I wonder how much of
    > the scheduler itself can be preempted or so ...
    >
    > Basically, under which circumstances can prev and current be different ?

    I remember finding a path through void __sched schedule(void) that
    took a branch through the goto need_resched; that would result in this.
    I takes a bit of mental gymnastics to see how this might happen.

    FWIW, I can send you a debug session showing all cpu's idle and 44 dead
    processes sitting in do_exit(). All but two of these were Java threads,
    so this seems to be some sort of thread-scheduling subtlty. (the two
    that weren't java threads were a find|grep pair that must have gotten
    tangled in.)

    Given that the patch seems to fix the problem, I didn't dig much deeper.

    --linas

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Paul Jackson: "Re: [Lse-tech] Re: [RFC PATCH] Dynamic sched domains aka Isolated cpusets"

    Relevant Pages

    • Re: PATCH [PPC64]: dead processes never reaped
      ... >> The patch below appears to fix a problem where a number of dead processes ... Naively, we could rturn "prev", this would save a few ... conservative with the patch, ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • PATCH [PPC64]: dead processes never reaped
      ... The patch below appears to fix a problem where a number of dead processes ... On a highly loaded system, ... "prev" and "current" are almost always going to be ... This patch fixes this. ...
      (Linux-Kernel)