Re: Catching NForce2 lockup with NMI watchdog

From: Bill Davidsen (davidsen_at_tmr.com)
Date: 12/13/03

    Date:	Sat, 13 Dec 2003 00:16:20 -0500 (EST)
    To: "Maciej W. Rozycki" <macro@ds2.pg.gda.pl>
    
    

    On Fri, 12 Dec 2003, Maciej W. Rozycki wrote:

    > On Fri, 12 Dec 2003, bill davidsen wrote:
    >
    > > | In practice, assuming the MP IRQ routing information provided by the BIOS has
    > > | been correct (which is not always the case), prerequisites #1 and #2 have
    > > | been met so far, but #3 has proved to be occasionally problematic.
    > >
    > > In practice many systems seem to take a good bit of guessing and testing.
    > > I have an old P-II which only works with acpi=force and nmi_watchdog=2,
    > > for instance.
    >
    > Well, the NMI watchdog is a side-effect feature that works by chance
    > rather than by design. So you can't really complain it doesn't work
    > somewhere, although I wouldn't mind if new hardware was designed such that
    > it works. You shouldn't have to use "acpi=force" for the watchdog to work
    > though and for a PII system if "nmi_watchdog=1" doesn't work, then I
    > suspect a BIOS bug (set APIC_DEBUG to 1 in asm-i386/apic.h and send me the
    > bootstrap log and a dump from `mptable' for a diagnosis, if interested).
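
    A quick way to see whether a given "nmi_watchdog=" setting is delivering
    NMIs at all is to sample the NMI line of /proc/interrupts twice and
    compare. The following is only a rough sketch, not something from this
    thread: the function names are made up for illustration, and it assumes
    the usual "NMI:" row with one count per CPU. A CPU that sits idle may
    show no increase in local APIC mode, so a zero delta is a hint rather
    than proof.

```python
import time


def parse_nmi_counts(text):
    """Return the per-CPU NMI counts from /proc/interrupts-style text.

    Returns an empty list when no "NMI:" row is present.
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Newer kernels append a text description after the counts.
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []


def nmi_deltas(before, after):
    """Return the per-CPU increase in NMI count between two samples."""
    if len(before) != len(after):
        raise ValueError("CPU count changed between samples")
    return [b - a for a, b in zip(before, after)]


def check(path="/proc/interrupts", wait=5.0):
    """Sample the NMI counters twice, `wait` seconds apart, and print deltas."""
    with open(path) as fh:
        first = parse_nmi_counts(fh.read())
    time.sleep(wait)
    with open(path) as fh:
        second = parse_nmi_counts(fh.read())
    print("NMI deltas per CPU:", nmi_deltas(first, second))
```

    Calling check() on the booted system prints one delta per CPU; all
    zeros after a few seconds of load suggests the watchdog is not firing.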

    Has the check to see if the BIOS is older than very recent been removed? I
    used to get a message that the BIOS was too old; I believe that's what
    prompted using the acpi option to enable the local APIC. Sorry, I've been
    running that feature since 2.5.3x or so and I just carried it forward.

    >
    > > It would be nice if there were a program which could poke at the
    > > hardware and suggest options which might work, as in eliminating the
    > > ones which can be determined not to work. Absent that, trial and error
    > > rules, unfortunately.
    >
    > Linux has all appropriate bits to set up hardware reasonably as long as
    > BIOS provides accurate information. The only case our code fails is when
    > BIOS tells us lies and then there's little we can do about it. Actually we
    > are doing hardware manufacturers a favor when we try to handle some cases at
    > all -- it's the BIOS that should be fixed instead and it is software and
    > it is stored in Flash memories these days, so there's no excuse. So if
    > there's a problem with running Linux because of BIOS bugs, then please
    > bug the manufacturer in the first place (and avoid the company in the
    > future if they don't support Linux).
    >
    > Sometimes the NMI watchdog works in principle, but its activation leads
    > to system instability -- almost always this is a symptom of buggy SMM code
    > executed by the BIOS behind our back (NMIs are disabled by default in the
    > SMM, but careless code may enable them by accident).

    Works fine for me, system stays up for 30-40 days when I let it... I also
    run softdog to catch hangs in user mode but not in the kernel. That also
    works.
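
    For anyone who has not used softdog: the driver exposes /dev/watchdog,
    and a user-space process has to keep writing to it or the machine gets
    reset. Below is a minimal sketch of such a keepalive loop, under the
    assumption that the standard Linux watchdog device interface is in use.
    The function name, the interval and the beat count are illustrative
    only, and the interval must be shorter than the driver's timeout.

```python
import time


def pet_watchdog(device="/dev/watchdog", interval=10.0, beats=6,
                 sleep=time.sleep):
    """Write `beats` keepalives to the watchdog device, then disarm it.

    Opening the device arms the timer; any write resets it. Writing the
    character 'V' just before closing asks the driver to disarm (the
    "magic close"), which only works if the driver allows it.
    Returns the number of keepalives written.
    """
    # Unbuffered, so each keepalive reaches the driver immediately.
    with open(device, "wb", buffering=0) as wd:
        for _ in range(beats):
            wd.write(b"\0")   # keepalive: resets the watchdog timer
            sleep(interval)
        wd.write(b"V")        # request disarm on close
    return beats
```

    A real daemon would loop for as long as user space is considered
    healthy; if the process hangs or is never scheduled, the writes stop
    and softdog reboots the box.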

    -- 
    bill davidsen <davidsen@tmr.com>
      CTO, TMR Associates, Inc
    Doing interesting things with little computers since 1979.
    
