crash on x86_64 - mm related?

From: Ryan Richter (ryan_at_tau.solarneutrino.net)
Date: 11/29/05

  • Next message: Fernando Luis Vazquez Cao: "Re: [PATCH & RFC] kdump and stack overflows"
    Date:	Tue, 29 Nov 2005 10:44:09 -0500
    To: linux-kernel@vger.kernel.org
    
    

    Hi, I booted 2.6.14.2 with the MPT fusion performance fix patch about a
    week ago on my file server. The machine crashed lat night while it was
    doing backups. You can see the voluminous kernel output below.

    Someone else recently had seemingly the same thing happen, but didn't
    think it was a kernel problem. You can read about it here:
    http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=338335

    I will reply later today with the kernel .config, right now I have to
    wait for someone to reboot the machine first.

    Any help would be appreciated,
    -ryan

     Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
    flags:0x010000000000000c mapping:ffff8100355f1dd8 mapcount:2 count:0
    Backtrace:

    Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101}
           <ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap_user_pages+120}
           <ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}
    Trying to fix it up, but a reboot is needed
    Bad page state at free_hot_cold_page (in process 'taper', page ffff81000260b6f8)
    flags:0x010000000000081c mapping:ffff81005c0fc310 mapcount:0 count:0
    Backtrace:

    Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a965>{free_hot_cold_page+101}
           <ffffffff80162007>{__page_cache_release+151} <ffffffff802b8fe8>{sgl_unmap
    _user_pages+120}
           <ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}
    Trying to fix it up, but a reboot is needed
    ----------- [cut here ] --------- [please bite here ] ---------
    Kernel BUG at include/linux/mm.h:341
    invalid operand: 0000 [1] SMP
    CPU 1
    Modules linked in: bonding
    Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
    RIP: 0010:[<ffffffff802b8fcd>] <ffffffff802b8fcd>{sgl_unmap_user_pages+93}
    RSP: 0018:ffff810035725e18 EFLAGS: 00010256
    RAX: 0000000000000000 RBX: 0000000000000007 RCX: 000000000000000f
    RDX: 00000000000000e0 RSI: 0000000000000001 RDI: ffff81000260b6f8
    RBP: ffff810004852068 R08: 00000000ffffffff R09: 0000000000000000
    R10: 0000000000008000 R11: 0000000000000200 R12: 0000000000000008
    R13: 0000000000000000 R14: 0000000000008000 R15: ffff810004949d10
    FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
    Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
    Stack: ffff8101423f3600 ffff810004852000 0000000000000040 0000000000008000
           ffff810004949c00 ffffffff802b48fb ffff810004852000 ffffffff802b4fb1
           ffff810000000000 ffffffff00000001
    Call Trace:<ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}

    Code: 0f 0b 68 ba 12 3a 80 c2 55 01 f0 83 47 08 ff 0f 98 c0 84 c0
    RIP <ffffffff802b8fcd>{sgl_unmap_user_pages+93} RSP <ffff810035725e18>
     ----------- [cut here ] --------- [please bite here ] ---------
    Kernel BUG at mm/rmap.c:487
    invalid operand: 0000 [2] SMP
    CPU 1
    Modules linked in: bonding
    Pid: 2418, comm: taper Tainted: G B 2.6.14.2 #1
    RIP: 0010:[<ffffffff8016f3f7>] <ffffffff8016f3f7>{page_remove_rmap+39}
    RSP: 0018:ffff810035725ab0 EFLAGS: 00010286
    RAX: 00000000ffffffff RBX: ffff8100356976f8 RCX: ffff81000000f000
    RDX: 0000000000000000 RSI: 8000000064c69067 RDI: ffff81000260b6f8
    RBP: 00002aaaaaadf000 R08: 0000000000000000 R09: ffff81000260b688
    R10: 00000000fffffffa R11: 0000000000000000 R12: ffff810101c22380
    R13: 8000000064c69067 R14: ffff81000260b6f8 R15: 0000000000000000
    FS: 00002aaaab53d880(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00002aaaaaac0000 CR3: 0000000035691000 CR4: 00000000000006e0
    Process taper (pid: 2418, threadinfo ffff810035724000, task ffff81017d680300)
    Stack: ffffffff80166ecd 00002aaaaab62000 ffff810035696aa8 00002aaaaab62000
           00002aaaaab62000 00002aaaaab61fff ffff810035695550 00002aaaaab62000
           ffffffff80167180 ffff810035725d68
    Call Trace:<ffffffff80166ecd>{zap_pte_range+477} <ffffffff80167180>{unmap_page_range+496}
           <ffffffff801672e5>{unmap_vmas+293} <ffffffff8016cfa2>{exit_mmap+162}
           <ffffffff80131ce1>{mmput+49} <ffffffff801371c6>{do_exit+438}
           <ffffffff8010f6f1>{die+81} <ffffffff8010f9df>{do_invalid_op+159}
           <ffffffff802b8fcd>{sgl_unmap_user_pages+93} <ffffffff80381f76>{thread_return+86}
           <ffffffff802a8662>{sym_setup_data_and_start+402} <ffffffff8010e84d>{error_exit+0}
           <ffffffff802b8fcd>{sgl_unmap_user_pages+93} <ffffffff802b8fe8>{sgl_unmap_user_pages+120}
           <ffffffff802b48fb>{release_buffering+27} <ffffffff802b4fb1>{st_write+1697}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}

    Code: 0f 0b 68 9b 35 3a 80 c2 e7 01 48 c7 c6 ff ff ff ff bf 20 00
    RIP <ffffffff8016f3f7>{page_remove_rmap+39} RSP <ffff810035725ab0>
     <1>Fixing recursive fault but reboot is needed!
    Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
    <ffffffff801b9c7b>{ext3_prepare_write+27}
    PGD 355bc067 PUD 355c9067 PMD 0
    Oops: 0000 [3] SMP
    CPU 0
    Modules linked in: bonding
    Pid: 2416, comm: driver Tainted: G B 2.6.14.2 #1
    RIP: 0010:[<ffffffff801b9c7b>] <ffffffff801b9c7b>{ext3_prepare_write+27}
    RSP: 0018:ffff8100355e7b48 EFLAGS: 00010296
    RAX: 0000000000000000 RBX: ffffffff8040f660 RCX: 000000000000017d
    RDX: 0000000000000094 RSI: ffff81000260b6f8 RDI: ffff810035b09cc0
    RBP: 000000000000000e R08: 00000000fffffffa R09: 00000000000000e9
    R10: ffff81001190c818 R11: 0000000000000000 R12: ffff81000260b6f8
    R13: ffff81000260b6f8 R14: 000000000000017d R15: 0000000000000094
    FS: 00002aaaab53d8e0(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000000000000000 CR3: 0000000035555000 CR4: 00000000000006e0
    Process driver (pid: 2416, threadinfo ffff8100355e6000, task ffff8100f43e8a80)
    Stack: ffff81014e643310 ffffffff8040f660 000000000000000e ffff81000260b6f8
           ffff81005c0fc310 0000000000000094 00000000000000e9 ffffffff80158247
           0000000000000292 00002aaaaaac0000
    Call Trace:<ffffffff80158247>{generic_file_buffered_write+551}
           <ffffffff801c06bd>{__ext3_journal_stop+45} <ffffffff8019ec74>{__mark_inode_dirty+52}
           <ffffffff801963ec>{inode_update_time+188} <ffffffff80158a08>{__generic_file_aio_write_nolock+936}
           <ffffffff80381f76>{thread_return+86} <ffffffff8013d349>{lock_timer_base+41}
           <ffffffff80158cfe>{generic_file_aio_write+110} <ffffffff801b7783>{ext3_file_write+35}
           <ffffffff8017ae43>{do_sync_write+211} <ffffffff8018ecc0>{__pollwait+0}
           <ffffffff8014a2b0>{autoremove_wake_function+0} <ffffffff8018f681>{sys_select+1153}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}

    Code: 48 8b 28 48 89 ef e8 aa 26 00 00 c7 44 24 04 00 00 00 00 89
    RIP <ffffffff801b9c7b>{ext3_prepare_write+27} RSP <ffff8100355e7b48>
    CR2: 0000000000000000
     <0>Bad page state at prep_new_page (in process 'dumper', page ffff81000260b6f8)
    flags:0x010000000000001d mapping:0000000000000000 mapcount:-1 count:1
    Backtrace:

    Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a371>{prep_new_page+65}
           <ffffffff8015ab2e>{buffered_rmqueue+302} <ffffffff8015ad85>{__alloc_pages+261}
           <ffffffff801581bd>{generic_file_buffered_write+413}
           <ffffffff80139509>{current_fs_time+105} <ffffffff8019636e>{inode_update_time+62}
           <ffffffff80158a08>{__generic_file_aio_write_nolock+936}
           <ffffffff8031f4a4>{sock_common_recvmsg+52} <ffffffff8031bb30>{sock_aio_read+272}
           <ffffffff80158cfe>{generic_file_aio_write+110} <ffffffff801b7783>{ext3_file_write+35}
           <ffffffff8017ae43>{do_sync_write+211} <ffffffff8018ecc0>{__pollwait+0}
           <ffffffff8014a2b0>{autoremove_wake_function+0} <ffffffff8018f681>{sys_select+1153}
           <ffffffff8017af46>{vfs_write+198} <ffffffff8017b0a3>{sys_write+83}
           <ffffffff8010db7a>{system_call+126}
    Trying to fix it up, but a reboot is needed
    Bad page state at prep_new_page (in process 'find', page ffff81000260b6f8)
    flags:0x0100000000000064 mapping:ffff8100f3be9be9 mapcount:1 count:1
    Backtrace:

    Call Trace:<ffffffff80159f93>{bad_page+99} <ffffffff8015a371>{prep_new_page+65}
           <ffffffff8015ab2e>{buffered_rmqueue+302} <ffffffff8015ad85>{__alloc_pages+261}
           <ffffffff8015e7a3>{kmem_getpages+99} <ffffffff8015fbb0>{cache_grow+192}
           <ffffffff8015fe3b>{cache_alloc_refill+459} <ffffffff80160226>{kmem_cache_alloc+54}
           <ffffffff80193831>{d_alloc+33} <ffffffff80188fe9>{real_lookup+105}
           <ffffffff801893c0>{do_lookup+112} <ffffffff80189e07>{__link_path_walk+2551}
           <ffffffff8018a382>{link_path_walk+178} <ffffffff8018a8ce>{path_lookup+446}
           <ffffffff8018aa9e>{__user_walk+62} <ffffffff801849b6>{vfs_lstat+38}
           <ffffffff80184dff>{sys_newlstat+31} <ffffffff8010db7a>{system_call+126}
           
    Trying to fix it up, but a reboot is needed
    Unable to handle kernel paging request at 00002aaaab9c5b61 RIP:
    <ffffffff8015fdba>{cache_alloc_refill+330}
    PGD c2512067 PUD c2513067 PMD 0
    Oops: 0002 [4] SMP
    CPU 0
    Modules linked in: bonding
    Pid: 3011, comm: find Tainted: G B 2.6.14.2 #1
    RIP: 0010:[<ffffffff8015fdba>] <ffffffff8015fdba>{cache_alloc_refill+330}
    RSP: 0018:ffff810112f05c28 EFLAGS: 00010082
    RAX: 00002aaaab9c5b59 RBX: 0000000000000010 RCX: 0000000000029ba6
    RDX: 00002aaaab9c5bb3 RSI: ffff810064c69040 RDI: ffff81000c01a288
    RBP: ffff8100f6fc4800 R08: ffff81000c01a250 R09: ffff81000c01a260
    R10: 0000000000000000 R11: 0000000000000000 R12: ffff81000c01a240
    R13: ffff8100f6fc3640 R14: ffff81000c01a288 R15: 00000000000000d0
    FS: 00002aaaaae00640(0000) GS:ffffffff804db800(0000) knlGS:00000000555bc920
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 00002aaaab9c5b61 CR3: 00000000c2bb3000 CR4: 00000000000006e0
    Process find (pid: 3011, threadinfo ffff810112f04000, task ffff810102b46040)
    Stack: ffff810112f05e68 ffff810179923cb8 fffffffffffffff4 ffff810112f05d28
           ffff810179923cb8 ffff810112f05d28 ffff810112f05e68 ffffffff80160226
           0000000000000292 ffffffff80193831
    Call Trace:<ffffffff80160226>{kmem_cache_alloc+54} <ffffffff80193831>{d_alloc+33}
           <ffffffff80188fe9>{real_lookup+105} <ffffffff801893c0>{do_lookup+112}
           <ffffffff80189e07>{__link_path_walk+2551} <ffffffff8018a382>{link_path_walk+178}
           <ffffffff8018a8ce>{path_lookup+446} <ffffffff8018aa9e>{__user_walk+62}
           <ffffffff801849b6>{vfs_lstat+38} <ffffffff80184dff>{sys_newlstat+31}
           <ffffffff8010db7a>{system_call+126}

    Code: 48 89 50 08 48 89 02 48 c7 46 08 00 02 20 00 83 7e 24 ff 48
    RIP <ffffffff8015fdba>{cache_alloc_refill+330} RSP <ffff810112f05c28>
    CR2: 00002aaaab9c5b61
     NMI Watchdog detected LOCKUP on CPU 1
    CPU 1
    Modules linked in: bonding
    Pid: 7, comm: events/1 Tainted: G B 2.6.14.2 #1
    RIP: 0010:[<ffffffff803837dd>] <ffffffff803837dd>{.text.lock.spinlock+118}
    RSP: 0018:ffff810004869dd0 EFLAGS: 00000086
    RAX: ffff81000c01a240 RBX: ffff81000c01a288 RCX: ffff8100f6fc3640
    RDX: 0000000000000003 RSI: 0000000000000003 RDI: ffff81000c01a288
    RBP: ffff810100009dc0 R08: 0000000000000000 R09: 0000000000000000
    R10: 00000000ffffffff R11: 0000000000000066 R12: 0000000000000000
    R13: ffff810100009dd0 R14: 0000000000000292 R15: ffff810100009e40
    FS: 00002aaaaae00640(0000) GS:ffffffff804db880(0000) knlGS:00000000556b6920
    CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00002aaaaaf1df40 CR3: 000000017f448000 CR4: 00000000000006e0
    Process events/1 (pid: 7, threadinfo ffff810004868000, task ffff8100f6fb6080)
    Stack: ffffffff8015e35b ffff8100f6fc3640 ffff810100009f60 0000000000000001
           ffff810100009e40 ffff8100f6fc3640 ffff8100f6fc38e0 ffff810100009f88
           ffffffff80161414 ffff810004869e58
    Call Trace:<ffffffff8015e35b>{drain_alien_cache+123} <ffffffff80161414>{cache_reap+164}
           <ffffffff80161370>{cache_reap+0} <ffffffff8014553c>{worker_thread+476}
           <ffffffff8012ed70>{default_wake_function+0} <ffffffff8012ed70>{default_wake_function+0}
           <ffffffff80145360>{worker_thread+0} <ffffffff80149c82>{kthread+146}
           <ffffffff8010ea02>{child_rip+8} <ffffffff80145360>{worker_thread+0}
           <ffffffff80149bf0>{kthread+0} <ffffffff8010e9fa>{child_rip+0}
           

    Code: 80 3f 00 7e f9 e9 59 fe ff ff e8 58 41 e9 ff e9 6f fe ff ff
    console shuts up ...
     <0>Kernel panic - not syncing: Aiee, killing interrupt handler!
     

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Fernando Luis Vazquez Cao: "Re: [PATCH & RFC] kdump and stack overflows"

    Relevant Pages

    • Re: computer becomes slow when compiling something
      ... Try the same thing again without INVARIANTS and WITNESS, both of which can consume a lot of CPU in kernel on a very active system, especially if lots of vnodes are being allocated and freed. ... If it's broke, don't fix it. ...
      (freebsd-current)
    • Re: more lockup laptop problems with dapper
      ... then reboot (I say reboot because I don't know if you are running gdm, ... If the above doesn't 'fix' it for you, ... Then reboot and at the boot prompt, choose the 'Ubuntu, kernel ...
      (Ubuntu)
    • Re: FreeBSD 5.1-R kernel panic
      ... Unfortunately, I ran into a problem on the reboot, the SMP kernel would fail ... Preloaded elf kernel "/boot/kernel/kernel" at 0xc0702000. ... CPU: IntelXeonCPU 2.40GHz ... >> As for using the option DDB in my kernel, ...
      (freebsd-current)
    • Re: scsi tape device support
      ... > Kernel does not recognize the device correctly, thinks it is DLT. ... > a) reboot the server, see what the kernel says about the device. ... about what will fix an unknown problem is a bit tough. ...
      (RedHat)
    • Re: [patch 2/2] x86 amd fix cmpxchg read acquire barrier
      ... and a fix is present in Solaris code base. ... CPU unless they don't share any memory mapping when the kernel finds it ...
      (Linux-Kernel)