2.4.26 SMP Lockup

From: Jeff Coffin (linux-kernel_at_browndog.org)
Date: 05/30/04

    To: linux-kernel@vger.kernel.org
    Date:	Sun, 30 May 2004 10:38:03 -0700
    
    

    Hi -

    I've been getting lockups at random times for a while on this
    hardware:

    Supermicro P3TDDE motherboard
    2 x Pentium III, 1.13 GHz
    1 GB ECC SDRAM (HIGHMEM is enabled)

    It's a hard lockup that doesn't respond to SysRq. This is the first
    time I've caught anything with the serial console. Once before, I got
    the beginnings of an oops, but not enough to be useful. Sometimes it
    happens a couple of times in a day, sometimes it will stay up for a
    month or more. I haven't found any way to force it yet. The system
    is a workstation; all filesystems are XFS, FWIW.

    The tainting is from the Cisco vpnclient (version 4.0.1), which I use
    for work. I'll try to capture an oops without that module loaded as
    well, since I know a tainted kernel usually means all bets are off.
    I'll also try to get SysRq P and T output next time it locks up, but
    usually SysRq doesn't appear to work.

    Anyhow, here's the oops run through ksymoops:

    --jeff

    ksymoops 2.4.9 on i686 2.4.26. Options used
         -V (default)
         -k /proc/ksyms (default)
         -l /proc/modules (default)
         -o /lib/modules/2.4.26/ (default)
         -m /usr/src/linux/System.map (default)

    Warning: You did not tell me where to find symbol information. I will
    assume that the log matches the kernel and modules that are running
    right now and I'll use the default options above for symbol resolution.
    If the current kernel and/or modules do not match the log, you can get
    more accurate output by telling me the kernel version and where to find
    map, modules, ksyms etc. ksymoops -h explains the options.

    Unable to<1>Unable to handle kernel paging request at virtual address
    fce6f1d4
    c010a170
    *pde = 00000000
    Oops: 0002
    EIP: 0010:[<c010a170>] Tainted: PF
    Using defaults from ksymoops -t elf32-i386 -a i386
    EFLAGS: 00010006
    eax: 3cad8800 ebx: 00000030 ecx: 00005575 edx: e000dd8c
    esi: e000dd8c edi: c0393efe ebp: e000dd7c esp: e000dd6c
    ds: 0018 es: 0018 ss: 0018
    Process (pid: 1447181, stackpage=e000d000)
    Stack: 61000000 060000fe 00000005 c03a7ce0 e000ddc4 c010966a e000dd8c 00000000
           00000005 00005575 000003fd c03a7ce0 c0393efe e000ddc4 c03a7c00 00000018
           00000018 c03a7c00 c022eb3a 00000010 00000002 000f4064 e000ddf0 c02346e0
    Call Trace: [<c010966a>] [<c022eb3a>] [<c02346e0>] [<c011f0b0>] [<c011f195>] [<c011f4e3>] [<c011f401>] [<c011a487>] [<c011a230>] [<c01095e0>] [<c011a285>]
    Code: ff 80 d4 69 39 c0 f6 c3 c0 75 35 8b 0d 44 4d 32 c0 85 c9 74

    >>EIP; c010a170 <do_nmi+20/b0> <=====

    >>edx; e000dd8c <_end+1fc519b4/38449c88>
    >>esi; e000dd8c <_end+1fc519b4/38449c88>
    >>edi; c0393efe <log_buf+557e/8000>
    >>ebp; e000dd7c <_end+1fc519a4/38449c88>
    >>esp; e000dd6c <_end+1fc51994/38449c88>

    Trace; c010966a <nmi+1e/30>
    Trace; c022eb3a <serial_in+1a/30>
    Trace; c02346e0 <serial_console_write+80/210>
    Trace; c011f0b0 <__call_console_drivers+60/70>
    Trace; c011f195 <call_console_drivers+65/120>
    Trace; c011f4e3 <release_console_sem+53/b0>
    Trace; c011f401 <printk+141/180>
    Trace; c011a487 <do_page_fault+257/564>
    Trace; c011a230 <do_page_fault+0/564>
    Trace; c01095e0 <error_code+34/3c>
    Trace; c011a285 <do_page_fault+55/564>

    Code; c010a170 <do_nmi+20/b0>
    00000000 <_EIP>:
    Code; c010a170 <do_nmi+20/b0> <=====
       0: ff 80 d4 69 39 c0 incl 0xc03969d4(%eax) <=====
    Code; c010a176 <do_nmi+26/b0>
       6: f6 c3 c0 test $0xc0,%bl
    Code; c010a179 <do_nmi+29/b0>
       9: 75 35 jne 40 <_EIP+0x40>
    Code; c010a17b <do_nmi+2b/b0>
       b: 8b 0d 44 4d 32 c0 mov 0xc0324d44,%ecx
    Code; c010a181 <do_nmi+31/b0>
      11: 85 c9 test %ecx,%ecx
    Code; c010a183 <do_nmi+33/b0>
      13: 74 00 je 15 <_EIP+0x15>

    NMI Watchdog detected LOCKUP on CPU1, eip c011e4e2, registers:
    CPU: 1
    EIP: 0010:[<c011e4e2>] Tainted: PF
    EFLAGS: 00200086
    eax: f3962930 ebx: c590d098 ecx: f09bb5f4 edx: c590d09c
    esi: c590d008 edi: 00200296 ebp: d224dec4 esp: d224deb8
    ds: 0018 es: 0018 ss: 0018
    Process X (pid: 11167, stackpage=d224d000)
    Stack: c590d098 c590d008 c590d000 d224ded8 c015572e 00000000 00000000 00000013
           d224df18 c0155a2b d224df04 00000000 d224c000 00000304 00040000 d224c000
           00003535 00000001 00000000 00000000 c590d000 00000000 f69c6560 f69c6580
    Call Trace: [<c015572e>] [<c0155a2b>] [<c0155eab>] [<c0110cd1>] [<c0108608>] [<c01094ef>]
    Code: f3 90 7e f9 e9 49 e9 ff ff 80 3d 40 52 32 c0 00 f3 90 7e f5

    >>EIP; c011e4e2 <.text.lock.fork+1b/129> <=====

    >>eax; f3962930 <_end+335a6558/38449c88>
    >>ebx; c590d098 <_end+5550cc0/38449c88>
    >>ecx; f09bb5f4 <_end+305ff21c/38449c88>
    >>edx; c590d09c <_end+5550cc4/38449c88>
    >>esi; c590d008 <_end+5550c30/38449c88>
    >>ebp; d224dec4 <_end+11e91aec/38449c88>
    >>esp; d224deb8 <_end+11e91ae0/38449c88>

    Trace; c015572e <poll_freewait+2e/50>
    Trace; c0155a2b <do_select+13b/230>
    Trace; c0155eab <sys_select+34b/4f0>
    Trace; c0110cd1 <restore_i387+91/d0>
    Trace; c0108608 <restore_sigcontext+128/140>
    Trace; c01094ef <system_call+33/38>

    Code; c011e4e2 <.text.lock.fork+1b/129>
    00000000 <_EIP>:
    Code; c011e4e2 <.text.lock.fork+1b/129> <=====
       0: f3 90 repz nop <=====
    Code; c011e4e4 <.text.lock.fork+1d/129>
       2: 7e f9 jle fffffffd <_EIP+0xfffffffd>
    Code; c011e4e6 <.text.lock.fork+1f/129>
       4: e9 49 e9 ff ff jmp ffffe952 <_EIP+0xffffe952>
    Code; c011e4eb <.text.lock.fork+24/129>
       9: 80 3d 40 52 32 c0 00 cmpb $0x0,0xc0325240
    Code; c011e4f2 <.text.lock.fork+2b/129>
      10: f3 90 repz nop
    Code; c011e4f4 <.text.lock.fork+2d/129>
      12: 7e f5 jle 9 <_EIP+0x9>

    1 warning issued. Results may not be reliable.
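
As a reading aid for the first oops above, here is a minimal sketch of how its numbers fit together. It assumes the usual i386 page-fault error-code layout (bit 0: protection fault vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). The function name and dictionary keys are illustrative only; they do not come from the kernel or from ksymoops.

```python
# Sketch only: decode the "Oops: 0002" value and recompute the faulting
# address from the register dump. Helper names are hypothetical.

def decode_i386_fault_code(code: int) -> dict:
    """Interpret the low three bits of an i386 page-fault error code."""
    return {
        "cause": "protection violation" if code & 0x1 else "page not present",
        "access": "write" if code & 0x2 else "read",
        "mode": "user" if code & 0x4 else "kernel",
    }

# Values copied from the oops text.
oops_code = int("0002", 16)      # "Oops: 0002"
eax = 0x3CAD8800                 # "eax: 3cad8800"
displacement = 0xC03969D4        # from "incl 0xc03969d4(%eax)"

info = decode_i386_fault_code(oops_code)

# Effective address of the faulting incl, wrapped to 32 bits.
fault_address = (displacement + eax) & 0xFFFFFFFF

print(info)
print(hex(fault_address))
```

Under those assumptions, the code decodes to a kernel-mode write to a page that is not present, which agrees with `*pde = 00000000`. The recomputed address is fce6f1d4, the same address reported in the "Unable to handle kernel paging request" line.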

