Device driver wake-up problem on Ubuntu 7.04 (kernel 2.6.20)



I have a problem with a PCI card character device driver that uses
DMA.

We are using wait_event_interruptible_timeout() to block inside the
driver read() until a DMA transfer completes. We call
wake_up_interruptible() in the interrupt handler to wake up the read
process on DMA completion.
The read process actually uses a number of separate threads but a
semaphore in the driver ensures that there can only ever be one thread
on the wait queue at any time.

Everything works fine when using Ubuntu 6.06 (kernel 2.6.15) running on
an Intel Core Duo processor.

We are now attempting to install the software onto a system that uses
an Intel quad-core processor.
We found that the Ubuntu 6.06 installer would not run on this
processor, so we rebuilt our code for Ubuntu 7.04 (kernel 2.6.20).

We now find that the call to wait_event_interruptible_timeout()
frequently fails to return after wake_up_interruptible() has been
called from the interrupt handler.
Eventually, after a hundred milliseconds or so, the hardware reports a
buffer overrun.

It does not fail every time; usually we get several successful DMA
transfers before it hangs.
This suggests a timing/race problem.
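
As an illustration of the kind of race I have in mind, and not a
diagnosis of our driver, one classic ordering problem is clearing the
wait condition after the hardware has been started. The sketch below is
a single-threaded model of that ordering; struct xfer, irq_fires(),
wait_for_dma(), read_racy() and read_safe() are all hypothetical names,
and the "wait" is reduced to testing the flag once, on the assumption
that no further interrupt arrives.

```c
#include <assert.h>

/* Hypothetical model: one flag stands in for the wait condition. */
struct xfer {
    int dma_done;
};

/* Models the ISR: set the condition (the wake-up itself is implied). */
static void irq_fires(struct xfer *x)
{
    x->dma_done = 1;
}

/* Models the timed wait: 1 = condition seen, 0 = timed out. No further
 * interrupt arrives in this model, so an unset flag means a timeout. */
static int wait_for_dma(const struct xfer *x)
{
    return x->dma_done ? 1 : 0;
}

/* Racy ordering: the flag is cleared AFTER the transfer was started, so
 * a completion that arrives early is erased before the wait begins. */
static int read_racy(struct xfer *x)
{
    /* (start DMA here) */
    irq_fires(x);       /* fast hardware: completion arrives at once */
    x->dma_done = 0;    /* too late: wipes out the completion */
    return wait_for_dma(x);
}

/* Safe ordering: reset the flag before the hardware can complete. */
static int read_safe(struct xfer *x)
{
    x->dma_done = 0;
    /* (start DMA here) */
    irq_fires(x);
    return wait_for_dma(x);
}

/* Self-check of the two orderings. */
static void check_orderings(void)
{
    struct xfer x = { 0 };

    assert(read_racy(&x) == 0);   /* completion lost, wait times out */
    assert(read_safe(&x) == 1);   /* completion seen */
}
```

A window like this would only show up when the interrupt arrives sooner
than it used to, which is why a faster machine or a different kernel
could expose it.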

I have added a few printk() calls in the driver to confirm that the DMA
completion interrupt is being received and that
wake_up_interruptible() has been called from the ISR. printk() calls
immediately before and after the call to
wait_event_interruptible_timeout() confirm that the read process
remains stuck on the wait queue.
I have tried adding a call to waitqueue_active() just before the call
to wake_up_interruptible() in the interrupt handler just to check that
the read process is ready and waiting on the queue.
Thinking that it may be an SMP issue, I tried printing the CPU number
in the ISR and the read process. This turned out to be 0 in all cases.

At the moment we are unsure whether the problem is related to the
kernel change from 2.6.15 to 2.6.20 or to the change from a dual-core
to a quad-core CPU.
It is possible that we have a timing problem in the driver that only
manifests itself with this combination.
We are hoping to try the Ubuntu 7.04 build on a dual-core processor
next week to try and pin the problem down further.

In the meantime, does anyone know of any timing problems or SMP
pitfalls in this area that we should be aware of?

Regards,

Brian.
