Re: Catching NForce2 lockup with NMI watchdog

From: Maciej W. Rozycki (macro_at_ds2.pg.gda.pl)
Date: 12/12/03

    Date:	Fri, 12 Dec 2003 18:21:16 +0100 (CET)
    To: "Richard B. Johnson" <root@chaos.analogic.com>
    
    

    On Fri, 12 Dec 2003, Richard B. Johnson wrote:

    > > Sometimes the NMI watchdog works in principle, but its activation leads
    > > to system instability -- almost always this is a symptom of buggy SMM code
    > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > > executed by the BIOS behind our back (NMIs are disabled by default in the
    > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    > > SMM, but careless code may enable them by accident).
    >
    > The NMI vector goes to Linux code. In fact all interrupt vectors
    > go to Linux code. There is no way that some BIOS code could possibly
    > be accidentally executed here. Some Linux code would have to
    > call some 16-bit BIOS code somewhere, and it doesn't even know
    > where..........

     The problem happens when the SMM is active (i.e. the BIOS code is being
    executed) after an SMI has been received during Linux operation (SMIs may
    get triggered for various reasons -- a parity/ECC error caught by the
    chipset, an access to an emulated 8042 controller, a power failure in a
    notebook, etc.) and an NMI arrives.  When in the SMM, no interrupt
    (including the NMI) causes a switch back into protected mode (and the
    processor expects real-mode style interrupt vectors), so Linux's NMI
    handler is never reached and the SMM's NMI handler (if initialized at all)
    isn't appropriate for handling the NMI watchdog.  Since the SMM cannot
    know what NMIs are used for in a particular OS, the code had best keep
    NMIs disabled -- then an arriving NMI event is latched and postponed until
    after the RSM instruction is executed.
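
     The sequence above can be sketched as a toy state machine.  This is only
    an illustrative model, not kernel or firmware code: the class ToyCpu and
    all of its method names are made up for the example, and it assumes at
    most one pending NMI is remembered while NMIs are blocked.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCpu:
    """Toy model of NMI delivery around SMM (hypothetical, not a real API)."""
    in_smm: bool = False
    nmi_blocked: bool = False
    nmi_latched: bool = False
    log: list = field(default_factory=list)

    def smi(self):
        # An SMI switches the CPU into SMM; NMIs are blocked by default there.
        self.in_smm = True
        self.nmi_blocked = True
        self.log.append("enter SMM")

    def firmware_unblocks_nmi(self):
        # Models careless SMM code that re-enables NMIs by accident.
        self.nmi_blocked = False
        self.log.append("SMM code unblocked NMIs")

    def nmi(self):
        if not self.in_smm:
            self.log.append("NMI taken by Linux handler")
        elif self.nmi_blocked:
            # Well-behaved case: the event is latched until RSM.
            self.nmi_latched = True
            self.log.append("NMI latched")
        else:
            # Buggy case: the NMI is vectored inside SMM, so the OS
            # watchdog handler never sees it.
            self.log.append("NMI taken by SMM handler")

    def rsm(self):
        # RSM returns to the interrupted OS context; a latched NMI is
        # delivered to the OS only now.
        self.in_smm = False
        self.nmi_blocked = False
        self.log.append("RSM")
        if self.nmi_latched:
            self.nmi_latched = False
            self.log.append("NMI taken by Linux handler")


# Well-behaved firmware: the watchdog NMI is merely delayed.
good = ToyCpu()
good.smi()
good.nmi()
good.rsm()
print(good.log)

# Careless firmware: the watchdog NMI never reaches Linux.
bad = ToyCpu()
bad.smi()
bad.firmware_unblocks_nmi()
bad.nmi()
bad.rsm()
print(bad.log)
```

     In the first run the NMI still reaches the OS handler, only later; in
    the second it is consumed inside the SMM, which is the case that shows
    up as instability with the NMI watchdog.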

     The SMM was designed to be transparent to a running OS, but care has to
    be taken for this to be true, and firmware bugs sometimes make the SMM
    activity visible.

    -- 
    +  Maciej W. Rozycki, Technical University of Gdansk, Poland   +
    +--------------------------------------------------------------+
    +        e-mail: macro@ds2.pg.gda.pl, PGP key available        +
    
