Re: Suggestions with hard lockup on 4 systems, have oops report

From: Adam Kropelin (akropel1_at_rochester.rr.com)
Date: 07/16/04

    Date:	Fri, 16 Jul 2004 14:08:12 -0400
    To: Brian McEntire <brianm@fsg1.nws.noaa.gov>
    
    

    On Fri, Jul 16, 2004 at 11:01:39AM -0400, Brian McEntire wrote:
    > Thank you for taking time from your busy days to read this. You all
    > (kernel maintainers) rock! :)
    >
    > I have four Linux hosts, with identical hardware and OSs, that exhibit a
    > very tough to troubleshoot hang/freeze. About once every two weeks (and

    <snip>

    > The OS specifics:
    > RH 7.2 with latest patches except running kernel 2.4.9-31enterprise for
    > CM reasons (at one point, I tried the latest available RH 7.2 kernel but
    > it did not improve stability so I went back.)
    > bcm5700-7.1.22-1
    > nvidia ?? (no RPM listed, didn't know where to find the version.)

    You've really got to eliminate the binary bcm5700 and nvidia modules in
    order to diagnose this. Based on the oops, bcm5700 looks suspect, but it
    could just be the unlucky guy whose memory was stepped on by nvidia or
    some other part of the kernel.

    Switch to a NIC with an open-source driver, such as e1000, temporarily
    (or better yet, permanently) and see if the lockup persists. Do the same
    with nvidia. If you can reproduce the problem without ever having loaded
    either module (unloading a module after it has been loaded is not
    sufficient), post the new oops and you'll have a solid foundation for
    debugging.
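
As a rough aid for that check, here is a minimal Python sketch that reports
whether either suspect module shows up in /proc/modules-style text. The
function names and the sample listing are hypothetical, and Python 3 is an
assumption (it is not what ships with RH 7.2). Note that this only shows
what is loaded right now. It cannot prove a module was never loaded since
boot, so it is only meaningful on a freshly booted system.

```python
# Illustrative sketch only: helper names are hypothetical, not an existing tool.
# Assumes each line of /proc/modules starts with the module name.

SUSPECT_MODULES = ("bcm5700", "nvidia")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            names.add(fields[0])
    return names


def suspects_loaded(proc_modules_text, suspects=SUSPECT_MODULES):
    """Return a sorted list of suspect modules that are currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return sorted(name for name in suspects if name in loaded)


def check_running_system(path="/proc/modules"):
    """Read the live module list and return any suspect modules in it."""
    with open(path) as handle:
        return suspects_loaded(handle.read())
```

Calling check_running_system() right after boot, and again before trying to
reproduce the lockup, should return an empty list both times. Anything else
means the test run does not yet rule out the binary modules.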

    --Adam


