Re: 2.6.9-rc2-mm4

From: Ingo Molnar (mingo_at_elte.hu)
Date: 09/28/04

  • Next message: Vojtech Pavlik: "Re: [BUG: 2.6.9-rc2-bk11] input completely dead in X"
    Date:	Tue, 28 Sep 2004 13:05:41 +0200
    To: Gene Heskett <gene.heskett@verizon.net>
    
    

    * Gene Heskett <gene.heskett@verizon.net> wrote:

    > That would I assume need a null modem cable, and what do I run on the
    > firewall? Minicom? Or is there something better that can just grab
    > and log without being interactive? Its a rh7.3 box with a 2.4.18 era
    > kernel. I'd update that, but its not broken. :)

    here's a mini-howto:

    to set up serial logging:
    -------------------------

    install a null modem cable to one of the serial ports of the server,
    connect the cable to another box, run a terminal program on that other
    box (e.g. "minicom -m" - do Alt-L to switch on logging after starting it
    up) and set up the server's kernel to do serial logging: enable
    CONFIG_SERIAL_8250_CONSOLE and CONFIG_SERIAL_CORE_CONSOLE, recompile &
    reinstall the kernel, add "console=ttyS0,38400 console=tty0" to your
    /etc/grub.conf or /etc/lilo.conf kernel boot line, reboot the server
    with the new kernel command line - and configure minicom to run with
    that speed (Alt-S).

    e.g. my /etc/grub.conf has:

    title test-2.6 (test-2.6)
            root (hd0,0)
            kernel /boot/bzImage root=/dev/sda1 console=ttyS0,38400 console=tty0 nmi_watchdog=1 kernel_preempt=1

    if everything is set up correctly then you should see kernel messages
    showing up in the minicom session when you boot up.

    When the messages do not show up then typical errors are mismatch
    between the serial port (or speed) and the device names used - if it's
    COM2 then use ttyS1, and dont forget to set up the serial speed option
    of minicom, etc. You can test the serial connection by doing:

            echo x > /dev/ttyS0

    and that should show up in the minicom session on the other box.

    to set up the NMI watchdog:
    ---------------------------

    add nmi_watchdog=1 to your boot parameters and reboot - that should be
    all to get it active. If all CPU's NMI count increases in
    /proc/interrupts then it's working fine. If the counts do not increase
    (or only one CPU increases it) then try nmi_watchdog=2 - this is another
    type of NMI that might work better. (Very rarely there are boxes that
    dont have reliable NMI counts with 1 and 2 either - but i dont think
    your box is one of those.)

    once the NMI watchdog is up and running it should catch all hard lockups
    and print backtraces to the serial console - even if you are within X
    while the lockup happens. You can test hard lockups by running the
    attached 'lockupcli' userspace code as root - it turns off interrupts
    and goes into an infinite loop => instant lockup. The NMI watchdog
    should notice this condition after a couple of seconds and should abort
    the task, printing a kernel trace as well. Your box should be back in
    working order after that point.

    now for the real lockup your box wont be 'fixed' by the NMI watchdog, it
    will likely stay locked up, but you should get messages on the serial
    console, giving us an idea where the kernel locked up and why. (Very
    rarely it happens that not even the NMI watchdog prints anything for a
    hard lockup - this is often the sign of hardware problems.)

            Ingo

    --- lockupcli.c

    main ()
    {
            iopl(3);
            for (;;) asm("cli");
    }

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Vojtech Pavlik: "Re: [BUG: 2.6.9-rc2-bk11] input completely dead in X"

    Relevant Pages

    • Re: [patch] Real-Time Preemption, -RT-2.6.12-rc6-V0.7.48-00
      ... do you have the NMI watchdog enabled? ... CONFIG_SERIAL_CORE_CONSOLE, recompile & reinstall the kernel, add ... and that should show up in the minicom session on the other box. ... while the lockup happens. ...
      (Linux-Kernel)
    • Re: [patch] Real-Time Preemption, -RT-2.6.10-rc2-mm3-V0.7.32-12
      ... debug lockups with the NMI watchdog. ... CONFIG_SERIAL_CORE_CONSOLE, recompile & reinstall the kernel, add ... /etc/lilo.conf kernel boot line, reboot the server with the new kernel ... while the lockup happens. ...
      (Linux-Kernel)
    • Re: HPET regression in 2.6.26 versus 2.6.25 -- NMI watchdog, interesting results
      ... when I got home from work I tried NMI watchdog. ... the nature of the hard lockup itself. ... I mostly have been only reading kernel documentation ... The commits by Yinghai which cause the lockups make changes involving ...
      (Linux-Kernel)
    • Re: 2.6.9-rc2-mm4
      ... > kernel command line - and configure minicom to run with that speed ... >to set up the NMI watchdog: ... >once the NMI watchdog is up and running it should catch all hard ... >now for the real lockup your box wont be 'fixed' by the NMI ...
      (Linux-Kernel)
    • Re: Testers needed: Joes MFC of USB code
      ... Seldom there will be no lockup, system will be alive and xscanimage ... we obtain good old kernel panic. ... pci0: <PCI bus> on pcib0 ... # The `bpf' pseudo-device enables the Berkeley Packet Filter. ...
      (freebsd-hackers)