2.6.1 IO lockup on SMP systems

From: Sergey S. Kostyliov (rathamahata_at_php4.ru)
Date: 01/31/04

    To: linux-kernel@vger.kernel.org
    Date:	Sat, 31 Jan 2004 19:40:27 +0300
    
    

    Hello all,

    I have experienced lockups on three of my servers running 2.6.1. It doesn't
    look like a deadlock: the boxes are still pingable, and all TCP ports that
    were in the listen state before a lockup remain in the listen state, but I
    can't get any data from these ports. According to sar(1), the systems were
    not overloaded right before a lockup. And there are no log entries in any of
    the user service logs for almost 10 hours after the lockup.
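
    The symptom described above (the port still accepts connections but never
    returns data) can be checked remotely with a small probe. This is only a
    rough sketch, not something from the original report: the function name
    probe(), the default timeout, and the "\r\n" payload are all assumptions
    made for illustration.

```python
import socket

def probe(host, port, timeout=5.0, payload=b"\r\n"):
    """Classify a TCP service as 'unreachable', 'silent', 'closed' or 'responding'.

    'silent' is the state of interest here: the kernel completes the TCP
    handshake for a listening socket, but no data ever comes back.
    """
    try:
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError:
        # Connection refused, host unreachable, or the connect itself timed out.
        return "unreachable"
    with sock:
        sock.settimeout(timeout)
        try:
            # Hypothetical nudge; a real check would send a protocol-specific request.
            sock.sendall(payload)
            data = sock.recv(1)
        except socket.timeout:
            # Connected, but nothing came back within the timeout.
            return "silent"
        except OSError:
            return "unreachable"
        # An empty read means the peer closed the connection without replying.
        return "responding" if data else "closed"
```

    Run periodically from another machine against, say, the SSH or HTTP port,
    a change from "responding" to "silent" would give an approximate timestamp
    for the lockup even without a serial console.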

    So I think this is an IO lockup. On the other hand, it doesn't look like a
    bug in a particular controller driver, because the controllers are different
    in each box. And finally, it doesn't look like a bug in a particular I/O
    scheduler, because two of the boxes were run with "deadline" and one with
    "as". Of course, all these assumptions are valid only if all the lockups I
    have seen have the same nature.

    All three boxes are SMP. Unfortunately, all are remote and aren't attached
    to a serial console yet (this is planned in the next couple of weeks).

    1) ope
    01:02.1 RAID bus controller: Mylex Corporation: Unknown device 0050 (rev 02)
    elevator=deadline
    .config: http://sysadminday.org.ru/2.6.1-io_lockup/ope/.config
    lspci: http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci
    lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/ope/lspci_-vvn

    2) white
    02:04.0 RAID bus controller: American Megatrends Inc. MegaRAID (rev 02)
    elevator=deadline
    .config: http://sysadminday.org.ru/2.6.1-io_lockup/white/.config
    lspci: http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci
    lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/white/lspci_-vvn

    3) tiny
    02:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
    03:00.0 Unknown mass storage controller: Compaq Computer Corporation Smart-2/P RAID Controller (rev 03)
    elevator=as
    .config: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/.config
    lspci: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci
    lspci -vvn: http://sysadminday.org.ru/2.6.1-io_lockup/tiny/lspci_-vvn
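
    To double-check which elevator= setting each box was actually booted with,
    one option is to read the kernel command line from /proc/cmdline. A minimal
    sketch follows; the helper names are hypothetical, and the "last occurrence
    wins" rule for a repeated parameter is an assumption.

```python
def elevator_from_cmdline(cmdline):
    """Return the value of the elevator= boot parameter, or None if absent."""
    value = None
    for token in cmdline.split():
        if token.startswith("elevator="):
            # Assumption: if the parameter is repeated, keep the last occurrence.
            value = token[len("elevator="):]
    return value

def booted_elevator(path="/proc/cmdline"):
    """Read the running kernel's command line and extract elevator=, if set."""
    with open(path) as f:
        return elevator_from_cmdline(f.read())
```

    A result of None only means that no elevator= parameter was passed at boot;
    which scheduler the kernel then uses depends on its build configuration.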

    Any hints will be appreciated.

    -- 
                       Best regards,
                       Sergey S. Kostyliov <rathamahata@php4.ru>
                       Public PGP key: http://sysadminday.org.ru/rathamahata.asc
    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at  http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at  http://www.tux.org/lkml/
    
