Re: NFS oops on 2.6.14.2

From: Trond Myklebust (trond.myklebust_at_fys.uio.no)
Date: 11/29/05

    To: Ryan Richter <ryan@tau.solarneutrino.net>
    Date:	Tue, 29 Nov 2005 16:26:37 -0500
    
    

    On Tue, 2005-11-29 at 15:00 -0500, Ryan Richter wrote:
    > I got an oops on two NFS clients after upgrading to 2.6.14.2.
    >
    > Here's one:
    >
    > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
    > <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62}
    > PGD 7bdd4067 PUD 7bdd5067 PMD 0
    > Oops: 0000 [1]
    > CPU 0
    > Modules linked in:
    > Pid: 1317, comm: lockd Not tainted 2.6.14.2 #2
    > RIP: 0010:[<ffffffff801dbd9e>] <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62}
    > RSP: 0018:ffff81007dfade70 EFLAGS: 00010246
    > RAX: 0000000000000000 RBX: ffff81007ad80b00 RCX: ffff81007e22d858
    > RDX: ffff81007e22d8f0 RSI: ffff81007e22d8e8 RDI: ffff81007ad80b00
    > RBP: ffff81007ec18800 R08: 00000000fffffffa R09: 0000000000000001
    > R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
    > R13: 0000000000000000 R14: ffffffff803ec420 R15: ffff81007df61014
    > FS: 00002aaaab00c4a0(0000) GS:ffffffff804b6800(0000) knlGS:00000000555e68a0
    > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    > CR2: 0000000000000018 CR3: 000000007c8fc000 CR4: 00000000000006e0
    > Process lockd (pid: 1317, threadinfo ffff81007dfac000, task ffff81007eea61c0)
    > Stack: ffffffff801dbe6b ffff81007ad80b00 ffffffff801e3d8c 3256cc84d4030002
    > 0000000000000000 ffff81007df4ec68 ffff81007df4ec00 ffffffff803ed4a0
    > ffff81007df4eca0 ffff81007df4ec68
    > Call Trace:<ffffffff801dbe6b>{nlmclnt_recovery+139} <ffffffff801e3d8c>{nlm4svc_proc_sm_notify+188}
    > <ffffffff8034c5a4>{svc_process+884} <ffffffff8012ab40>{default_wake_function+0}
    > <ffffffff801dde00>{lockd+352} <ffffffff801ddca0>{lockd+0}
    > <ffffffff8010e352>{child_rip+8} <ffffffff801ddca0>{lockd+0}
    > <ffffffff801ddca0>{lockd+0} <ffffffff8010e34a>{child_rip+0}
    >
    >
    > Code: 48 39 78 18 75 1c 8b 86 8c 00 00 00 a8 01 74 12 83 c8 02 89
    > RIP <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62} RSP <ffff81007dfade70>
    > CR2: 0000000000000018
    > <4>do_vfs_lock: VFS is out of sync with lock manager!
    > do_vfs_lock: VFS is out of sync with lock manager!
    >
    >
    > And another (different machine, but essentially identical to the one that
    > produced the previous):
    >
    > Unable to handle kernel NULL pointer dereference at 0000000000000018 RIP:
    > <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62}
    > PGD 7bdd1067 PUD 7bdd2067 PMD 0
    > Oops: 0000 [1]
    > CPU 0
    > Modules linked in:
    > Pid: 1317, comm: lockd Not tainted 2.6.14.2 #2
    > RIP: 0010:[<ffffffff801dbd9e>] <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62}
    > RSP: 0018:ffff81007dfade70 EFLAGS: 00010246
    > RAX: 0000000000000000 RBX: ffff810079254d40 RCX: ffff81007e227858
    > RDX: ffff81007e2278f0 RSI: ffff81007e2278e8 RDI: ffff810079254d40
    > RBP: ffff81007ec0de00 R08: 00000000fffffffa R09: 0000000000000001
    > R10: 00000000ffffffff R11: 0000000000000000 R12: 0000000000000000
    > R13: 0000000000000000 R14: ffffffff803ec420 R15: ffff81007df3d014
    > FS: 00002aaaab00c4a0(0000) GS:ffffffff804b6800(0000) knlGS:0000000055efbd20
    > CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    > CR2: 0000000000000018 CR3: 000000007d30f000 CR4: 00000000000006e0
    > Process lockd (pid: 1317, threadinfo ffff81007dfac000, task ffff81007eea61c0)
    > Stack: ffffffff801dbe6b ffff810079254d40 ffffffff801e3d8c 3256cc84d4030002
    > 0000000000000000 ffff81007df39c68 ffff81007df39c00 ffffffff803ed4a0
    > ffff81007df39ca0 ffff81007df39c68
    > Call Trace:<ffffffff801dbe6b>{nlmclnt_recovery+139} <ffffffff801e3d8c>{nlm4svc_proc_sm_notify+188}
    > <ffffffff8034c5a4>{svc_process+884} <ffffffff8012ab40>{default_wake_function+0}
    > <ffffffff801dde00>{lockd+352} <ffffffff801ddca0>{lockd+0}
    > <ffffffff8010e352>{child_rip+8} <ffffffff801ddca0>{lockd+0}
    > <ffffffff801ddca0>{lockd+0} <ffffffff8010e34a>{child_rip+0}
    >
    >
    > Code: 48 39 78 18 75 1c 8b 86 8c 00 00 00 a8 01 74 12 83 c8 02 89
    > RIP <ffffffff801dbd9e>{nlmclnt_mark_reclaim+62} RSP <ffff81007dfade70>
    > CR2: 0000000000000018

    Both presumably following a server reboot?

    Do you have any sure-fire way to reproduce it?

    > These machines have an NFS-mounted root, but this is mounted nolock so I'm
    > assuming that's unrelated. The other NFS mounts have options like:
    >
    > rw,nosuid,nodev,v3,rsize=8192,wsize=8192,hard,intr,udp,lock
    >
    > I've also been seeing lots of the "do_vfs_lock: VFS is out of sync with lock
    > manager!", but that has been happening at least since 2.6.13.

    That is usually the result of sending SIGKILL, SIGTERM or SIGINT
    (kill -9, kill -TERM, kill -INT) to a process that was in the middle
    of acquiring a lock.
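
The scenario described above can be sketched as a small test program. This is only an illustration, not a confirmed reproducer: the helper name `kill_while_locking`, the default delay, and the choice of Python are assumptions, and it is not known whether it triggers the "VFS is out of sync" message on any given kernel. One process holds a POSIX lock, a second blocks requesting the same lock, and the second is signalled while its request is still pending.

```python
import fcntl
import os
import signal
import time


def kill_while_locking(path, sig=signal.SIGKILL, delay=0.2):
    """Hold a lock on `path`, fork a child that blocks on the same lock,
    signal the child while it waits, and return the signal that killed it
    (or None if the child exited on its own)."""
    with open(path, "w") as holder:
        # Parent takes an exclusive POSIX lock on the whole file.
        fcntl.lockf(holder, fcntl.LOCK_EX)
        pid = os.fork()
        if pid == 0:
            try:
                # Child: this request blocks because the parent holds the lock.
                with open(path, "w") as waiter:
                    fcntl.lockf(waiter, fcntl.LOCK_EX)
            finally:
                # Never let the child fall back into the caller's code.
                os._exit(0)
        time.sleep(delay)       # give the child time to block on the lock
        os.kill(pid, sig)       # interrupt the pending lock request
        _, status = os.waitpid(pid, 0)
    return os.WTERMSIG(status) if os.WIFSIGNALED(status) else None
```

To exercise the NFS lock manager, `path` would have to be a file on a mount that uses the `lock` option, with the client's kernel log checked afterwards. On a local filesystem the function only demonstrates the sequence of events.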

    Cheers,
      Trond


