[BUG] 2.4.x RT signal leak with kupdated (and maybe others)

From: Benjamin Herrenschmidt (benh_at_kernel.crashing.org)
Date: 09/30/03

    To: Linux Kernel list <linux-kernel@vger.kernel.org>
    Date:	Tue, 30 Sep 2003 18:27:55 +0200
    
    

    Hi !

    I finally figured out why, on a friend's machine, nr_queued_signals
    keeps growing until it reaches nr_max_signals. Once that happens, RT
    signals can no longer be queued, which for example causes
    do_notify_parent() to fail (libpthread uses sig 33, which is typically
    RTMIN+1), leading to all sorts of zombies floating around etc...

    The problem is a bug in kupdated (possibly shared by other kernel code
    manipulating a task's tsk->pending.signal mask "by hand"). In this
    case it gets triggered by the infamous noflushd, but other culprits
    are possible.

    The bug is simple: the SIGSTOP sent to kupdated gets queued (an entry
    is allocated and queued, actually), since nowadays we try to queue one
    entry for a non-RT signal as well.

    However, when "receiving" it, kupdated will "manually" clear it from
    the pending signal mask and will _not_ dequeue it. Thus, that signal
    will stay forever in kupdated's signal queue: it will never be
    deallocated and nr_queued_signals will never be decreased.

    Actually, further SIGSTOPs will stack up there as well: since kupdated
    clears the bit in tsk->pending.signal, later queuing won't "notice"
    that an entry is already there.
    That clearing also prevents handle_stop_signal() from flushing the
    entry from the queue when SIGCONT is received.
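
    The leak described above can be mimicked with a toy userspace model.
    This is only a sketch, not kernel code: every leak_* name is made up
    for illustration, the struct merely stands in for the pending mask,
    the signal queue and nr_queued_signals, and it assumes that queuing
    of a non-RT signal is skipped when its mask bit is already set.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. Every leak_* name is made up. */
struct leak_entry {
    struct leak_entry *next;
    int sig;
};

struct leak_pending {
    unsigned long mask;        /* stands in for tsk->pending.signal */
    struct leak_entry *head;   /* stands in for the task's signal queue */
    int nr_queued;             /* stands in for nr_queued_signals */
};

/* Queue one entry for sig unless its mask bit says it is already pending. */
static void leak_send(struct leak_pending *p, int sig)
{
    unsigned long bit = 1UL << (sig - 1);
    struct leak_entry *e;

    if (p->mask & bit)
        return;
    e = malloc(sizeof(*e));
    assert(e != NULL);
    e->sig = sig;
    e->next = p->head;
    p->head = e;
    p->mask |= bit;
    p->nr_queued++;
}

/* The "by hand" receive: clear the mask bit, never touch the queue. */
static void leak_clear_by_hand(struct leak_pending *p, int sig)
{
    p->mask &= ~(1UL << (sig - 1));
}

/* Send and hand-clear SIGSTOP `rounds` times; return entries left queued. */
static int leak_stranded_after(int rounds)
{
    struct leak_pending p = { 0, NULL, 0 };
    int i, stranded;

    for (i = 0; i < rounds; i++) {
        leak_send(&p, SIGSTOP);
        leak_clear_by_hand(&p, SIGSTOP);
    }
    stranded = p.nr_queued;
    while (p.head) {           /* tidy up so the model itself does not leak */
        struct leak_entry *e = p.head;
        p.head = e->next;
        free(e);
    }
    return stranded;
}
```

    In this model, leak_stranded_after(3) returns 3: each SIGSTOP leaves
    one entry behind because the cleared bit hides the earlier ones.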

    The only thing I can see that could get rid of those signals is
    flush_sigqueue(), but of course, this is never called for a kernel
    thread like kupdated.

    So there is a clear bug in kupdated. I suppose the fix is to call
    something like dequeue_signal() from kupdated instead of hacking
    tsk->pending.signal. I need to test a fix before I post a patch.
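
    As a rough illustration of why a proper dequeue would help, here is a
    toy userspace model. It is not the eventual patch and not kernel
    code: the fix_* names are hypothetical, and fix_dequeue() is only a
    loose analogue of what a dequeue_signal()-style helper is assumed to
    do, namely clear the mask bit and release the queued entry together.

```c
#include <assert.h>
#include <signal.h>
#include <stdlib.h>

/* Toy userspace model, NOT kernel code. Every fix_* name is made up. */
struct fix_entry {
    struct fix_entry *next;
    int sig;
};

struct fix_pending {
    unsigned long mask;        /* stands in for tsk->pending.signal */
    struct fix_entry *head;    /* stands in for the task's signal queue */
    int nr_queued;             /* stands in for nr_queued_signals */
};

/* Queue one entry for sig unless its mask bit says it is already pending. */
static void fix_send(struct fix_pending *p, int sig)
{
    unsigned long bit = 1UL << (sig - 1);
    struct fix_entry *e;

    if (p->mask & bit)
        return;
    e = malloc(sizeof(*e));
    assert(e != NULL);
    e->sig = sig;
    e->next = p->head;
    p->head = e;
    p->mask |= bit;
    p->nr_queued++;
}

/* Clear the mask bit AND unlink/free the matching entry.
 * Returns sig if it was pending, 0 otherwise. */
static int fix_dequeue(struct fix_pending *p, int sig)
{
    unsigned long bit = 1UL << (sig - 1);
    struct fix_entry **pp;

    if (!(p->mask & bit))
        return 0;
    p->mask &= ~bit;
    for (pp = &p->head; *pp; pp = &(*pp)->next) {
        if ((*pp)->sig == sig) {
            struct fix_entry *e = *pp;
            *pp = e->next;
            free(e);
            p->nr_queued--;
            break;
        }
    }
    return sig;
}

/* Send and dequeue SIGSTOP `rounds` times; return entries left queued. */
static int fix_queued_after(int rounds)
{
    struct fix_pending p = { 0, NULL, 0 };
    int i;

    for (i = 0; i < rounds; i++) {
        fix_send(&p, SIGSTOP);
        fix_dequeue(&p, SIGSTOP);
    }
    return p.nr_queued;
}
```

    Here fix_queued_after() returns 0 for any number of rounds, since the
    counter is decremented whenever the bit is cleared.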

    Do we have similar bugs in other bits of kernel code "manipulating"
    the pending signal mask this way? I don't know what others may do
    here, so if you know of something like that, please speak up.

    Ben.


