Re: [BUG] 2.4.x RT signal leak with kupdated (and maybe others)

From: Andrea Arcangeli (andrea_at_suse.de)
Date: 09/30/03

    Date:	Tue, 30 Sep 2003 20:22:55 +0200
    To: Benjamin Herrenschmidt <benh@kernel.crashing.org>
    
    

    On Tue, Sep 30, 2003 at 07:47:09PM +0200, Benjamin Herrenschmidt wrote:
    > > When I wrote the kupdate code, only real-time signals could be
    > > queued. Things have since changed so that siginfo is carried for
    > > non-RT signals too. Clearing the pending bit by hand is what allows
    > > more than one entry per signal to be stacked: we shouldn't clear the
    > > bitflag unless we dequeue the signal too. That's definitely a bug
    > > (though a minor one ;)
    >
    > "Minor", but it leads to interesting results in the end when coupled
    > with something like noflushd that regularly sends those signals ;)
    >
    > Not only do we leak them, we also get nr_queued_signals reaching
    > nr_max_signals. This has the side effect of making do_notify_parent()
    > silently fail when a pthread exits (libpthread uses an RT signal).
    >
    > The end result is that after a few days, a machine running noflushd
    > and thread-intensive apps like evolution and gkrellm will have dozens
    > (or even hundreds) of zombies, as the child threads are never reclaimed
    > by the libpthread "manager" thread, which never gets the signal...

    for noflushd users it's major (for everybody else it's minor)

    > Interesting... though fortunately, I haven't seen anybody else causing
    > such a constant increase of nr_queued_signals so far on this laptop...

    That's because nobody else sends signals to the daemons, I guess. And
    even if they do, the daemon won't clear the pending bitflag, so there's
    no risk of queuing more than one non-RT entry per signal per daemon, as
    happened with kupdate.

    Andrea - If you prefer relying on open source software, check these links:
                rsync.kernel.org::pub/scm/linux/kernel/bkcvs/linux-2.[45]/
                http://www.cobite.com/cvsps/