Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0



On Wed, 3 Jan 2007, Tejun Heo wrote:
bbee wrote:
Yeap, I have major issues with SDB FISes which contains spurious
completions but most other spurious interrupts shouldn't be dangerous
and I haven't seen spurious completions for quite some time, so I was
thinking either removing the message or printing it only on SDB FIS
containing spurious completions.

But, Andrew Lyon *is* reporting spurious completions. Now I just wanna
update those printks such that more info is reported only on spurious
SDB FISes.

That would certainly help verify that I'm having the exact same problem,
since Andrew didn't say anything about his drive going offline.

Okay.

Sorry, I thought you meant you would need to update it *further*. I applied the patch you gave to Andrew with this result so far:

$ dmesg | grep -A1 "spurious interrupt"
ata1: spurious interrupt (irq_stat 0x8 active_tag 0xfafbfcfd sactive 0x0)
ata1: issue=0x0 SAct=0x0 SDB_FIS=004040a1:00000008
--
ata1: spurious interrupt (irq_stat 0x8 active_tag 0xfafbfcfd sactive 0x0)
ata1: issue=0x0 SAct=0x0 SDB_FIS=004040a1:00000001

No luck yet triggering the exception.

On Wed, 3 Jan 2007, Andrew Lyon wrote:
Alan said he was going to add the drive to a blacklist he was
maintaining for NCQ, perhaps that has been done in kernel 2.6.19, I
dont know as I am still running 2.6.18.

Perhaps the WD Raptor drive that I have does have lousy NCQ and that
explains both the poor performance and the spurious interrupts.

Blacklisting NCQ on the drive(s) for all controllers might be ill advised, since it could be a JMicron-specific issue (or ahci-specific, since the person in the thread I referenced had a different ahci controller..). Either that, or both our drive models have "lousy NCQ"..


Thanks,

bbee
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: Kernel 2.6.23.9 / P35 Chipset + WD 750GB Drives (reset port)
    ... 0x2 (HSM violation) ... I will turn off NCQ and try to make the problem re-occur. ... The problem won't recur with NCQ off, because spurious completions are impossible in that case. ... It was originally thought that these AHCI spurious NCQ completions were busted NCQ implementations on the drives, but I think there theory is that it's some other timing problem or some such, given the number of drives across all makers which are reported to do this. ...
    (Linux-Kernel)
  • Re: ata1: spurious interrupt (irq_stat 0x8 active_tag -84148995 sactive 0x0) r0xj0
    ... completions but most other spurious interrupts shouldn't be dangerous ... and I haven't seen spurious completions for quite some time, ... Andrew Lyon *is* reporting spurious completions. ... Follows at end of message (md init snipped from dmesg for brevity). ...
    (Linux-Kernel)