Re: Serial related oops



On 2/19/07, Russell King <rmk+lkml@xxxxxxxxxxxxxxxx> wrote:
> On Mon, Feb 19, 2007 at 12:37:00PM -0800, Michael K. Edwards wrote:
> > What we've seen on our embedded ARM is that enabling an interrupt that
> > is shared between multiple UARTs, at a stage when you have not set up
> > all the data structures touched by the ISR and softirq, can have
> > horrible consequences, including soft lockups and fandangos on core.

> Incorrect. We have:
>
> 1. registered an interrupt handler at this point.
> 2. disabled interrupts (we're under the spin lock)

setup_irq() is where things go wrong, at least for us, at least on
2.6.16.x. Interrupts are not disabled at the point in request_irq()
when the interrupt controller is poked to enable the IRQ source. If
you're lucky, and you're on an architecture where the UART interrupt
is properly level-triggered, and the worst thing that happens when you
attempt to service an interrupt that isn't yours is that it stays on,
then you get a soft lockup with two or three recursive __irq_svc hits
in the backtrace. If you're not lucky you do a fandango on core.
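
To make the "it stays on" failure mode concrete, here is a minimal user-space
sketch of a shared, level-triggered line. It is not kernel code: the names
sim_uart and deliver_until_quiet are hypothetical, and the model assumes an
ISR can only quiet its own device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 4

/* Hypothetical model of one UART on a shared, level-triggered IRQ line. */
struct sim_uart {
    bool asserting;     /* device is holding the line active */
    bool isr_installed; /* a handler able to quiet this device exists */
};

/* The shared line is active while any device on it asserts it. */
static bool line_active(const struct sim_uart *devs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (devs[i].asserting)
            return true;
    return false;
}

/*
 * Deliver the interrupt repeatedly with the line unmasked. Each delivery
 * runs every installed ISR, and an ISR can only quiet its own device.
 * Returns the number of deliveries before the line went quiet, or
 * max_entries if it never did (a stand-in for the soft lockup).
 */
static int deliver_until_quiet(struct sim_uart *devs, size_t n, int max_entries)
{
    int entries = 0;

    assert(n <= MAX_DEVS);
    while (line_active(devs, n) && entries < max_entries) {
        entries++;
        for (size_t i = 0; i < n; i++)
            if (devs[i].isr_installed)
                devs[i].asserting = false;
    }
    return entries;
}
```

With one device asserting and no ISR installed for it, the call never sees the
line go quiet and returns max_entries; with ISRs installed for every asserting
device it returns after a single delivery.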

> So, no interrupt will be seen by the CPU since the interrupt is masked.

The interrupt would need to be masked for the entire duration of the
outer loop that calls serial8250_init() or the equivalent for all
platform devices that share the IRQ.

> The test is intentionally designed to be safe from the interrupt
> generation point of view.

But its context is not. Shared IRQ lines are a _problem_. You cannot
safely enable an IRQ until all devices that share it have had their
ISRs installed, unless you can absolutely guarantee at a hardware
level that the uninitialized ones cannot assert the IRQ line. That does
not apply to any device that might have been touched by the bootloader
or the early init code, especially a UART.
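
The ordering argument can be sketched in user space as follows. This is an
illustration under stated assumptions, not the 8250 driver: shared_line,
init_unmask_first and init_unmask_last are hypothetical names, and the second
device stands in for a UART left asserting by the bootloader.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define LINE_MAX_DEVS 4

/* Hypothetical shared IRQ line: one mask bit plus per-device state. */
struct shared_line {
    bool unmasked;                 /* interrupt controller passes the line */
    size_t ndevs;                  /* devices wired to this line */
    bool asserting[LINE_MAX_DEVS]; /* device i is driving the line */
    bool has_isr[LINE_MAX_DEVS];   /* an ISR for device i is installed */
};

/* True if the CPU would take an interrupt that no installed ISR can clear. */
static bool unserviceable_irq(const struct shared_line *l)
{
    if (!l->unmasked)
        return false;
    for (size_t i = 0; i < l->ndevs; i++)
        if (l->asserting[i] && !l->has_isr[i])
            return true;
    return false;
}

/*
 * Unsafe order: unmask the line as soon as the first ISR is installed.
 * Returns true if an unserviceable interrupt was possible at any step.
 */
static bool init_unmask_first(struct shared_line *l)
{
    bool hazard = false;

    assert(l->ndevs <= LINE_MAX_DEVS);
    for (size_t i = 0; i < l->ndevs; i++) {
        l->has_isr[i] = true;
        l->unmasked = true;
        hazard = hazard || unserviceable_irq(l);
    }
    return hazard;
}

/* Safer order: install every ISR while masked, then unmask once. */
static bool init_unmask_last(struct shared_line *l)
{
    assert(l->ndevs <= LINE_MAX_DEVS);
    for (size_t i = 0; i < l->ndevs; i++)
        l->has_isr[i] = true;
    l->unmasked = true;
    return unserviceable_irq(l);
}
```

Starting from the same state (device 1 already asserting, no ISRs installed),
init_unmask_first reports a hazard on its first iteration, while
init_unmask_last does not.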

Cheers,
- Michael