Screwed up locking?

From: Ricky Beam (jfbeam_at_bluetronic.net)
Date: 09/13/04

  • Next message: Guido Guenther: "Re: [2.6.7-mm1] Firewire sbp2 problem"
    Date:	Mon, 13 Sep 2004 14:45:32 -0400 (EDT)
    To: Linux Kernel Mail List <linux-kernel@vger.kernel.org>
    
    

    Somewhere past 2.6.6 somebody screwed something up big time. It looks like
    *something* is grabbing a kernel lock and never letting go (or being
    prempted.) The system is stable for a while (in this case ~12 hours)
    and then:
    bad: scheduling while atomic!
     [<c02dbc24>] schedule+0xb24/0xb30
     [<c014190e>] __alloc_pages+0x1ce/0x370
     [<c0272527>] sock_aio_write+0x137/0x150
     [<c0222d6d>] n_tty_chars_in_buffer+0x3d/0x80
     [<c02dc13d>] schedule_timeout+0xbd/0xc0
     [<c0221163>] tty_poll+0x83/0xa0
     [<c01704be>] do_select+0x19e/0x2d0
     [<c0170170>] __pollwait+0x0/0xd0
     [<c0170907>] sys_select+0x2e7/0x540
     [<c010536b>] syscall_call+0x7/0xb
    ...
    bad: scheduling while atomic!
     [<c02dbc24>] schedule+0xb24/0xb30
     [<c02db3a1>] schedule+0x2a1/0xb30
     [<c01306c8>] rcu_check_quiescent_state+0x78/0x90
     [<c011166b>] mark_offset_tsc+0x1fb/0x350
     [<c0224c80>] read_chan+0x0/0x820
     [<c02dbc66>] preempt_schedule+0x36/0x60
     [<c0224e0e>] read_chan+0x18e/0x820
     [<c0107ce4>] handle_IRQ_event+0x34/0x70
     [<c011b570>] default_wake_function+0x0/0x20
     [<c021f860>] tty_read+0x0/0x160
     [<c011b570>] default_wake_function+0x0/0x20
     [<c02dc5ef>] _spin_lock+0x6f/0x90
     [<c0224c80>] read_chan+0x0/0x820
     [<c021f99c>] tty_read+0x13c/0x160
     [<c0170865>] sys_select+0x245/0x540
     [<c015c070>] vfs_read+0xe0/0x150
     [<c012b103>] sigprocmask+0x63/0xd0
     [<c015c371>] sys_read+0x51/0x80
     [<c010536b>] syscall_call+0x7/0xb
    bad: scheduling while atomic!
     [<c02dbc24>] schedule+0xb24/0xb30
     [<c011b5d1>] __wake_up_common+0x41/0x60
     [<c011b634>] __wake_up+0x44/0x60
     [<c02dbc66>] preempt_schedule+0x36/0x60
     [<c0226505>] pty_unthrottle+0x45/0x70
     [<c0222c3b>] check_unthrottle+0x3b/0x40
     [<c022518e>] read_chan+0x50e/0x820
     [<c0107ce4>] handle_IRQ_event+0x34/0x70
     [<c011b570>] default_wake_function+0x0/0x20
     [<c021f860>] tty_read+0x0/0x160
     [<c011b570>] default_wake_function+0x0/0x20
     [<c02dc5ef>] _spin_lock+0x6f/0x90
     [<c0224c80>] read_chan+0x0/0x820
     [<c021f99c>] tty_read+0x13c/0x160
     [<c0170865>] sys_select+0x245/0x540
     [<c015c070>] vfs_read+0xe0/0x150
     [<c012b103>] sigprocmask+0x63/0xd0
     [<c015c371>] sys_read+0x51/0x80
     [<c010536b>] syscall_call+0x7/0xb
    ...

    Every keypress in my ssh session leave one of those on the console. When
    the connection closes, the system panics:

    (Here we go again... I thought the interleaved panic BS was fixed.)

    Warning: kfree_skb on hard IRQ c029e511
    Warning: kfree_skb on hard IRQ c029e3cc
    bad: scheduling while atomic!
     [<c02dbc 2s4>ys] _rsecahedd+ul0ex5+01x/0b2x840/
    xb 3[0< 0
    0 10[5<c39021>5c]0 8w9or>]k _rvefss_chreedad+0+x05x/f09x/016x
    50 1
    rn e[<l cp0a1n2ibc10 3-> ]no sti gspyrnocicnmgas:k +A0ixee63,/ k0xidll0i
    g [i<cn0te15rrc3up71t >h]andler! n

    Once the locks are messed up, the system is doomed.

    Version: 2.6.9-SMP-rc1 SMP BK[20040913014622] (preempt enabled)
    System: Dual Athlon MP 1600 (Tyan Tiger MP)

    I'm rebuilding with the current tree (BK[20040913174343]) with preempt turned
    off. If that fails, I'll enable lock debugging.

    --Ricky

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Guido Guenther: "Re: [2.6.7-mm1] Firewire sbp2 problem"

    Relevant Pages

    • cvs commit: src/sys/kern sched_ule.c (fwd)
      ... As much as a 6-7% slowdown on buildworld on dual core machines as compared to 4BSD. ... - Implement one spin lock per thread-queue. ... Parameterize many scheduling decisions via sysctls. ...
      (freebsd-current)
    • Re: Attempted summary of "RT patch acceptance" thread
      ... >>Concerning the QoS, we have been able to obtain hard realtime, at least ... Yes, on IA64, from the hardware IRQ fireing to process scheduling (on ... >>migrated might take a lock, ... which is running can go back to a RT CPU. ...
      (Linux-Kernel)
    • Re: Log flood: "scheduling while atomic" (2.6.15.x, 2.6.16.x)
      ... which lock didn't get unlocked. ... BROKEN removed from the depends for advansys; ... * a patch of my own for usbhid for a slightly weird USB+PS/2 mouse, ... whine if we are scheduling when we should not be. ...
      (Linux-Kernel)
    • Re: [PATCH] sched: fix race in schedule
      ... fix these functions. ... I think the pre_scheduleof rt has chance to drop rq lock. ... The current process of CPU1is scheduling. ... but pick_next_task_fairis called, then the kernel panics. ...
      (Linux-Kernel)
    • Re: [PATCH] fix wait_task_inactive race (was Re: Race condition in ptrace)
      ... The problem is that the runqueue ... unrobustness of non-atomic scheduling on some architectures ... ... runqueue lock might notice this - and wait_task_inactivehas to take ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)