Re: 2.6.24-rc8 hangs at mfgpt-timer



On 18/01/08 00:39 +0100, Arnd Hannemann wrote:
Jordan Crouse schrieb:
On 17/01/08 23:52 +0100, Arnd Hannemann wrote:

<snip>

Hmmm - not sure what's happening here. I wonder if we're stuck in an
interrupt storm of some sort as soon as you register the interrupt handler.
But I would think that whatever was causing the interrupt storm would be
running well before we hit setup_irq(), and you would be recording "nobody
cared" interrupts left and right.
Interesting thing is that it hangs not in setup_irq() but later, right
after printing the newline of the printk.

That makes me think interrupt storm even more.

The thing that scares me is that the TinyBIOS seems to know that we want
to use the MFGPT timers, and I wonder if they did anything behind the scenes
to "help us out" even though we didn't ask for it.

I don't know how easy it would be for you - but can you try reading
MSRs 0x51400020 - 0x51400023? If you need a command line app to do it,
you can use rdmsr from here:

http://wiki.laptop.org/go/Flashing_LinuxBIOS_on_A-Test_Boards
MSR register 0x51400020 => b7:ef:5f:f4:bf:d1:95:68
MSR register 0x51400021 => b7:fd:1f:f4:bf:cf:5a:d8
MSR register 0x51400022 => b7:f3:bf:f4:bf:f5:fb:a8
MSR register 0x51400023 => b7:fb:9f:f4:bf:fd:d9:f8

Hmmm - those look wrong. Is /dev/cpu/0/msr there? The applet on the
wiki has a bug: it doesn't check whether that device exists.
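
For reference, a minimal sketch of reading an MSR through the Linux msr
driver with the missing check added. This is not the applet from the wiki;
the function name read_msr and its error convention are made up for
illustration. It assumes the msr driver is loaded and that the caller is
root; without either, open() fails and the error is reported instead of
garbage being printed.

```c
#define _XOPEN_SOURCE 700   /* for pread() under strict -std= modes */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Read one 64-bit MSR from an msr device node such as /dev/cpu/0/msr.
 * The msr driver uses the file offset as the MSR number.
 * Returns 0 on success, or a negative errno value on failure
 * (hypothetical convention, chosen for this sketch).
 */
int read_msr(const char *dev, uint32_t reg, uint64_t *out)
{
    assert(dev != NULL && out != NULL);

    int fd = open(dev, O_RDONLY);
    if (fd < 0)
        return -errno;          /* device missing, or not root */

    ssize_t n = pread(fd, out, sizeof(*out), (off_t)reg);
    int err = 0;
    if (n < 0)
        err = -errno;           /* the read itself failed */
    else if (n != (ssize_t)sizeof(*out))
        err = -EIO;             /* short read: treat as an I/O error */

    close(fd);
    return err;
}
```

A caller would use read_msr("/dev/cpu/0/msr", 0x51400020, &val) and print
val only when the return value is 0.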
I'm sorry, I should have checked: I didn't execute rdmsr as root.
The correct ones:

MSR register 0x51400020 => 00:00:00:00:00:00:0f:00
MSR register 0x51400021 => 00:00:00:00:04:00:00:00
MSR register 0x51400022 => 00:00:00:00:00:00:00:00
MSR register 0x51400023 => 00:00:00:00:00:0c:ba:90

Okay - those are sane. Those are the IRQ routing MSRs - each nibble is an
IRQ for something in the southbridge - since 7 doesn't appear, a rogue
interrupt isn't likely. Also, bits [0:31] in 0x51400022 are the MFGPT
interrupts specifically, and they are 0. The timer tick code would set the
first nibble to 7 (in a sane system).
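
The nibble check above can be sketched in C as follows. The helper names
msr_nibble and msr_routes_irq are hypothetical, and the sketch assumes the
rdmsr output lists the most significant byte first (so 0x51400023 reads as
0x00000000000cba90). The "does any nibble equal 7" test itself does not
depend on byte order, since every hex digit is one nibble either way.

```c
#include <assert.h>
#include <stdint.h>

/* Return the 4-bit field at position idx (0 = least significant nibble). */
unsigned msr_nibble(uint64_t val, unsigned idx)
{
    assert(idx < 16);
    return (unsigned)((val >> (idx * 4)) & 0xF);
}

/* Return 1 if any of the 16 nibbles in val equals irq, 0 otherwise. */
int msr_routes_irq(uint64_t val, unsigned irq)
{
    for (unsigned i = 0; i < 16; i++) {
        if (msr_nibble(val, i) == irq)
            return 1;
    }
    return 0;
}
```

Calling msr_routes_irq() with irq 7 on each of the four values read from
0x51400020 - 0x51400023 returns 0, which matches the observation that
nothing is routed to IRQ 7 yet.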

So, everything _seems_ okay from the hardware side. The next step would be
to comment out the setup_irq() function and write a C app or something to
verify that all the MFGPT registers are set up like we expect them to be.
However, I think that maybe we can accomplish the same thing with the
forthcoming watchdog timer, since it will prove the MFGPT is behaving
well (though it doesn't use the interrupt wrapper, but one step at a time).

Jordan
--
Jordan Crouse
Systems Software Development Engineer
Advanced Micro Devices, Inc.

