Re: [PATCH 2.6.16] Shared interrupts sometimes lost



Hi!

This is my first real introduction to the IRQ handling code in Linux,
so please forgive any little errors. I'm fairly sure the big picture
is right, partly because the patch helps so much.

To explain what I think is happening, let me start with a very simple
case. A number of PCI devices (this one included) have a number of
events which can trigger an interrupt. The events which are current
are presented as bits in a register, and are ORed together (and
possibly masked by another register) to make the IRQ line.
When 1's are written to any bits in this register, it acknowledges
the event and clears the bit.
A typical code fragment is
events = read_register(INTERRUPTS);
write_register(INTERRUPTS, events);
... handle each 1 bits in events ....

This would normally clear all pending events and cause the interrupt
line to go low (or at least to not be asserted).

However there is room for a race here. If an event occurs between
the read and the write, then this will NOT de-assert the IRQ line.
It will remain asserted throughout.

Now if the IRQ is handled as an edge-triggered line (which I believe
they are in Linux), then losing this race will mean that we don't see
any more interrupts on this line.

I believe that

a) any shared interrupts should be level-triggered. It is not okay to
share edge-triggered interrupt

b) your patch does not fix that issue. It only makes race window
smaller.

if (!(action->flags & SA_INTERRUPT))
local_irq_enable();

do {
ret = action->handler(irq, action->dev_id, regs);
- if (ret == IRQ_HANDLED)
+ if (ret == IRQ_HANDLED) {
status |= action->flags;
+ repeat = 1;
+ }
retval |= ret;
action = action->next;
+ if (!action &&
+ repeat &&
+ safeirq &&
+ (actionlist->flags & SA_SHIRQ)) {
+ /* at least one handler on the list did something,
+ * and the interrupt is sharable, so give
+ * every handler another chance, incase a new event
+ * came in and is holding the irq line asserted.
+ */
+ action = actionlist;
+ repeat = 0;
+ }
} while (action);

I think it is still racy. What if another interrupt comes here?

if (status & SA_SAMPLE_RANDOM)

Pavel
--
Thanks for all the (sleeping) penguins.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: PATCH/RFC: [kdump] fix APIC shutdown sequence
    ... This patch fixes a problem that we have encountered ... the IO-APIC gets stuck if it sends an IRQ ... it will not accept the interrupt and IRR bit of IOAPIC ... EOI is sent. ...
    (Linux-Kernel)
  • Re: 2.6.12 PREEMPT_RT && PPC
    ... > work building and minimally booting for PPC. ... I've applied most of your patch and have released the -51-37 ... > expects to be called in order to terminate the interrupt. ... ->end, then the handling of the IRQ ...
    (Linux-Kernel)
  • Re: [patch] warn on release_region() from irq context
    ... > the rationale for this patch is that the lock validator (which now ... I did the patch below - but that's not enough, ... the timer irq - so floppy.c file needs a serious redesign to fix these ... + * Interrupt, DMA and region freeing must not be done from IRQ ...
    (Linux-Kernel)
  • [git Patch 1/1] avoid IRQ0 ioapic pin collision
    ... that patch introduced a work-around for a VIA chipset ... Our system uses an ACPI Interrupt Source Override to inform the OS that the ... the gsi up to the current value of pci_irq. ... and the IRQ is assigned to another unrelated device. ...
    (Linux-Kernel)
  • [(repost) git Patch 1/1] avoid IRQ0 ioapic pin collision
    ... I have made slight changes to the patch I ... Our system uses an ACPI Interrupt Source Override to inform the OS that the ... the gsi up to the current value of pci_irq. ... and the IRQ is assigned to another unrelated device. ...
    (Linux-Kernel)