Re: Serial related oops



On 2/19/07, Russell King <rmk+lkml@xxxxxxxxxxxxxxxx> wrote:
> I think something else is going on here. I think you're getting
> an interrupt for the UART, and another interrupt is also pending.

Correct. An interrupt for the other UART on the same IRQ.

> When the UART interrupt is handled, it is masked at the interrupt
> controller, and the CPU mask is dropped.

Correct.

> The second interrupt comes in, and when you go to disable that
> source, you inadvertently re-enable the UART interrupt, despite it
> still being serviced.

Incorrect. An attempt has been made to service the interrupt using
the only ISR currently in the chain for that IRQ -- the ISR for the
first UART. That attempt was not successful, and when __do_irq
unmasks the interrupt source preparatory to exiting interrupt context,
__irq_svc is dispatched anew.
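
To make that sequence concrete, here is a toy user-space model of a shared, level-triggered line. It is only a sketch: none of these names (irq_line, wire_source, register_isr, run_until_quiet) are kernel APIs, and it assumes a device keeps the line asserted until its own ISR services it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: none of these names are kernel APIs. */
#define MAX_SLOTS 4

struct device {
    bool pending;                       /* device is asserting the shared line */
};

/* An ISR returns true if its device needed service. */
typedef bool (*isr_fn)(struct device *dev);

struct irq_line {
    struct device *sources[MAX_SLOTS];  /* everything wired to the line */
    size_t nsources;
    isr_fn handlers[MAX_SLOTS];         /* the registered ISR chain */
    struct device *handler_dev[MAX_SLOTS];
    size_t nhandlers;
};

static bool uart_isr(struct device *dev)
{
    if (!dev->pending)
        return false;
    dev->pending = false;               /* servicing the device drops its request */
    return true;
}

static void wire_source(struct irq_line *line, struct device *dev)
{
    assert(line->nsources < MAX_SLOTS);
    line->sources[line->nsources++] = dev;
}

static void register_isr(struct irq_line *line, isr_fn fn, struct device *dev)
{
    assert(line->nhandlers < MAX_SLOTS);
    line->handlers[line->nhandlers] = fn;
    line->handler_dev[line->nhandlers] = dev;
    line->nhandlers++;
}

static bool line_asserted(const struct irq_line *line)
{
    for (size_t i = 0; i < line->nsources; i++)
        if (line->sources[i]->pending)
            return true;
    return false;
}

/* One dispatch: walk the chain; the line is unmasked again on return. */
static void dispatch_once(struct irq_line *line)
{
    for (size_t i = 0; i < line->nhandlers; i++)
        line->handlers[i](line->handler_dev[i]);
}

/* Re-dispatch for as long as the unmasked line stays asserted. */
static unsigned run_until_quiet(struct irq_line *line, unsigned max_passes)
{
    unsigned passes = 0;

    while (line_asserted(line) && passes < max_passes) {
        dispatch_once(line);
        passes++;
    }
    return passes;
}
```

With both UARTs wired to the line and pending, but only the first UART's ISR registered, run_until_quiet() burns through every pass it is allowed, because nothing in the chain can silence the second UART. Once an ISR for the second UART is registered, a single pass quiets the line.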

> This leads to the UART interrupt again triggering an IRQ.

Right. The _second_ UART's interrupt. There's another problem with
these UARTs having to do with the implementor's inability to read and
follow a bog-standard twenty-year-old spec without asking software to
fix up corner cases, but that's another backtrace for another day.

> Please show your interrupt controller (mask, unmask, mask_ack)
> handling functions corresponding with the interrupt which your
> UART is connected to.

Don't have 'em handy; I'll be happy to post them when I do, perhaps
later today. I would hope they're pretty generic, though; it's a
Feroceon core pretending to be an ARM926EJ-S, hooked to the usual
half-assed Marvell imitation of an ARM licensed functional block.
Trust me for the moment, it's the same IRQ line.

> This shows that you don't actually have an understanding of the Linux
> kernel boot, especially in respect of serial devices. At boot, devices
> are detected and initialised to a safe state, where they will not
> spuriously generate interrupts.

Sorry, 'taint so. Not unless the chip support droid has put the right
stuff in arch/arm/mach-foo. LKML is littered with the fall-out of the
decision to trust whoever jumped to main() to have left the hardware
in a sane state. If you don't enjoy this sort of forensics (which I
for one do not, especially not when there is a project deadline
looming and a Heisenbug starts firing 9 times out of 10), you might
consider systematically installing ISRs that know how to shut
everything up before turning on any interrupt sources at all.
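
For an 8250-style part, "shutting everything up" might look roughly like the sketch below. The register offsets and bit values are the standard 8250/16550 ones; the fake_uart device and its accessors are hypothetical stand-ins for real MMIO so the fragment can run in user space, and uart_quiesce is an illustrative name, not a function from 8250.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard 8250/16550 register offsets (DLAB = 0). */
enum { UART_RX = 0, UART_IER = 1, UART_IIR = 2, UART_LSR = 5, UART_MSR = 6 };

#define UART_IER_RDI    0x01    /* receive-data interrupt enable */
#define UART_IER_MSI    0x08    /* modem-status interrupt enable */
#define UART_IIR_NO_INT 0x01    /* set when no interrupt is pending */
#define UART_IIR_MSI    0x00    /* modem-status interrupt ID */
#define UART_IIR_RDI    0x04    /* receive-data interrupt ID */
#define UART_LSR_DR     0x01    /* receive data ready */

/* Hypothetical stand-in for real MMIO so the sketch can run anywhere. */
struct fake_uart {
    uint8_t ier;
    bool rx_ready;              /* a received byte is waiting */
    bool modem_delta;           /* a modem-status line has changed */
};

static uint8_t fake_read(struct fake_uart *u, unsigned reg)
{
    assert(reg < 8);
    switch (reg) {
    case UART_RX:
        u->rx_ready = false;    /* reading the data register clears DR */
        return 0;
    case UART_IIR:
        if ((u->ier & UART_IER_RDI) && u->rx_ready)
            return UART_IIR_RDI;
        if ((u->ier & UART_IER_MSI) && u->modem_delta)
            return UART_IIR_MSI;
        return UART_IIR_NO_INT;
    case UART_LSR:
        return u->rx_ready ? UART_LSR_DR : 0;
    case UART_MSR:
        u->modem_delta = false; /* reading MSR clears the delta bits */
        return 0;
    default:
        return 0;
    }
}

static void fake_write(struct fake_uart *u, unsigned reg, uint8_t val)
{
    assert(reg < 8);
    if (reg == UART_IER)
        u->ier = val & 0x0f;
}

/*
 * Disable every interrupt source at the UART, then read the registers
 * whose reads clear latched conditions. Returns true if the port
 * reports no pending interrupt afterwards.
 */
static bool uart_quiesce(struct fake_uart *u)
{
    fake_write(u, UART_IER, 0x00);
    (void)fake_read(u, UART_LSR);
    (void)fake_read(u, UART_RX);
    (void)fake_read(u, UART_MSR);
    return (fake_read(u, UART_IIR) & UART_IIR_NO_INT) != 0;
}
```

The point of the ordering is that the enables are cleared first, so nothing new can assert the line while the latched conditions are being drained. On real hardware the accessors would be volatile MMIO or port I/O, and a part that deviates from the spec may need extra steps.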

As I said, this is not going to happen overnight, and is not even
particularly in the economic interest of people who get paid by the
hour to wear bringup wizard hats. That category currently includes
me, but I am intensely bored with this game and aspire to greater
things.

> When a userspace program opens a serial port, which can only happen
> once the kernel boot has completed (ergo, devices have been initialised
> and placed in a safe state) the interrupts are claimed, and enabled
> at the source.

As you can see from the console dump I posted (which begins with
"Freeing init memory: 92K" and ends with do_exit -> init -> sys_open,
which is obviously sys_open("/dev/console")), this happens long before
userspace comes into the picture. Our 8250.c has some nasty hacks in
it but otherwise this call chain is from a very nearly vanilla
2.6.16.recent.

We've already worked around this on our board, and the whole kit and
caboodle will eventually be posted to linux-arm-kernel in tidy patches
when my client lets me spend billable hours on it (immediately after
the damn thing passes its first functional test, long before it
ships). I'm not asking for anyone's help except in the
let's-all-help-one-another spirit. I'm trying to help with root cause
analysis of Frederik's (Jose's?) fandango on core. If it's not
relevant, my apologies; and although it goes without saying, I salute
you for both the serial driver and the ARM port.

Now please take a second look at the backtrace before toasting me
lightly again. Mmm'kay? Oh, and by the way -- is there an Alt-SysRq
equivalent on an ARM serial console?

Cheers,
- Michael