Re: in 2.6.23-rc3-git7 in do_cciss_intr
- From: Jens Axboe <jens.axboe@xxxxxxxxxx>
- Date: Wed, 19 Nov 2008 18:29:19 +0100
On Wed, Nov 19 2008, Miller, Mike (OS Dev) wrote:
-----Original Message-----
From: Randy Dunlap [mailto:randy.dunlap@xxxxxxxxxx]
Sent: Wednesday, November 19, 2008 11:23 AM
To: Miller, Mike (OS Dev)
Cc: Jens Axboe; scsi; James Bottomley; lkml; akpm
Subject: Re: in 2.6.23-rc3-git7 in do_cciss_intr
Miller, Mike (OS Dev) wrote:
removeQ(). On the
-----Original Message-----
From: Jens Axboe [mailto:jens.axboe@xxxxxxxxxx]
Sent: Wednesday, November 19, 2008 2:52 AM
To: Randy Dunlap
Cc: scsi; Miller, Mike (OS Dev); James Bottomley; lkml; akpm
Subject: Re: in 2.6.23-rc3-git7 in do_cciss_intr
On Tue, Nov 18 2008, Randy Dunlap wrote:
Randy Dunlap wrote:
/home/rdunlap/linsrc/linux-2.6.27-rc3-git7/drivers/block/cciss.h:
Randy Dunlap wrote:
Miller, Mike (OS Dev) wrote:
-----Original Message-----
From: Randy Dunlap [mailto:randy.dunlap@xxxxxxxxxx]
Sent: Thursday, September 25, 2008 3:40 PM
To: scsi
Cc: Jens Axboe; Miller, Mike (OS Dev); James Bottomley; lkml;
akpm
Subject: Re: in 2.6.23-rc3-git7 in do_cciss_intr
On Thu, 25 Sep 2008 13:33:07 -0700 Randy Dunlap wrote:
Jens Axboe wrote:
On Thu, Sep 04 2008, Miller, Mike (OS Dev) wrote:
0x3bb2 <do_cciss_intr+1649>: mov 0x2(%r8),%dx
0x3bb7 <do_cciss_intr+1654>: test %dx,%dx
0x3bba <do_cciss_intr+1657>: je 0x3f0e <do_cciss_intr+2509>
$ addr2line -e cciss.o -f do_cciss_intr+0x627
SA5_fifo_full
OK ...that's confusing. It seems to be saying that ctrlr_info_t
* was NULL. However, I can't see a way of getting into the
fifo_full callback from do_cciss_intr ... especially not with a NULL host.
James
That is weird. Even if we could get there, fifo_full doesn't
do anything but wait for a bit.
Hi,
This just happened again. This time it's on 2.6.27-rc5-git3.
~Randy
Thanks Randy. I think. :)
I'll try to recreate in my lab.
This looks somewhat strange, mostly like 'c' is NULL and it's
oopsing in removeQ (I don't think Randy's analysis is correct in
assuming it's 'h' and it's in fifo_full). Given that 'c' cannot be
NULL, it's c->prev or c->next that are NULL.
This BUG: has happened (now) 5 times today. Higher frequency than
usual for some reason.
I enabled CCISS_DEBUG and added one printk in removeQ(). On the
first call to removeQ(), both c->next and c->prev are NULL.
s/first/second/
I added a printk() in addQ() as well. Here's the new output:
Here's the kernel log output from cciss:
HP CISS Driver (v 3.6.20)
ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 54
cciss 0000:42:08.0: PCI INT A -> Link[LNKA] -> GSI 54 (level, high) -> IRQ 54
command = 147
irq = 36
board_id = 3211103c
cciss 0000:42:08.0: irq 87 for MSI/MSI-X
address 0 = fdf80000
cfg base address = 10
cfg base address index = 0
cfg offset = 400
Controller Configuration information
------------------------------------
Signature = CISS
Spec Number = 1
Transport methods supported = 0x6
Transport methods active = 0x3
Requested transport Method = 0x0
Coalesce Interrupt Delay = 0x0
Coalesce Interrupt Count = 0x1
Max outstanding commands = 0x256
Bus Types = 0x200000
Server Name =
Heartbeat Counter = 0x1672
Trying to put board into Simple mode
I counter got to 1 0
Controller Configuration information
------------------------------------
Signature = CISS
Spec Number = 1
Transport methods supported = 0x6
Transport methods active = 0x3
Requested transport Method = 0x0
Coalesce Interrupt Delay = 0x0
Coalesce Interrupt Count = 0x1
Max outstanding commands = 0x256
Bus Types = 0x200000
Server Name =
Heartbeat Counter = 0x1672
cciss0: <0x3238> at PCI 0000:42:08.0 IRQ 87 using DAC
cciss: intr_pending 8
cciss: addQ: Qptr=ffff88027e0100b8, c=ffff88007f83e000
cciss: removeQ: Qptr=ffff88027e0100b8, c=ffff88007f83e000,
next=ffff88007f83e000, prev=ffff88007f83e000
Sending 7f83e000 - down to controller
cciss: addQ: Qptr=ffff88027e0100c0, c=ffff88007f83e000
cciss: intr_pending 8
cciss: Read 4 back from board
cciss: removeQ: Qptr=ffff88027e0100c0, c=ffff88007f840000,
next=0000000000000000, prev=0000000000000000
BUG: unable to handle kernel NULL pointer dereference at 0000000000000248
Randy, can you post the debug patch you used? The above goes boom
when it attempts to remove a command that isn't on the list, the Qptr
in the last example should be empty, hence the oops. So I'd be
interested in seeing what removeQ() calls this is, I'm assuming it's
this bit in do_cciss_intr() that does:
	...
		while (c->busaddr != a) {
			c = c->next;
			if (c == h->cmpQ)
				break;
		}
	}
	/*
	 * If we've found the command, take it off the
	 * completion Q and free it
	 */
	if (c->busaddr == a) {
		removeQ(&h->cmpQ, c);
		if (c->cmd_type == CMD_RWREQ) {
			complete_command(h, c, 0);
	...
If so, what part of the c lookup are you hitting - the
	c = h->cmd_pool + a2;
or the c->busaddr check that is shown above?
--
Randy,
I still can't reproduce this bug. I have your config file
on a BL465c w/e200i. Just to confirm, you only see this at
init time, correct?
Yes, only at init time.
Please post your debug patch as Jens requested.
Done (separately).
I need to back up a bit. Yesterday these BUGs happened
consistently, so I wondered why. Then I recalled that for
debugging another bug/problem, I had changed the test
system's normal boot kernel from 2.6.25 to 2.6.18-8. The
test system is used to build and then boot the new kernel
*via kexec*, so it's quite possible (or certain) that
something in the kexec world has been fixed since 2.6.18. I
don't recall seeing this problem lately when using 2.6.25 to
kexec/boot the new test kernel, so I'm quite willing to drop
the bug for now and then re-open it if I see the problem again. OK??
Ahhhh, the kexec piece was missing. Now I don't feel quite so
clueless. I'm OK with dropping the bug for now. Jens, James?
Yeah, kexec is definitely a clue. My guess is that we got some sort of
leftover completion. Regardless of the status of this particular bug,
I think it would be a good idea to add some checks for when an attempt
is made to remove a command from a queue it isn't currently on.
--
Jens Axboe
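
For illustration only, here is a minimal userspace sketch of the kind of
check suggested above. It is not the cciss driver code: struct cmd, add_q(),
on_q() and remove_q_checked() are hypothetical names, and the only thing
assumed from the thread is that the queue is a circular doubly linked list
reached through a head pointer. A command whose next/prev are NULL, or that
is not found on the given queue, is rejected instead of being unlinked.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's command struct; only the
 * list links matter for this sketch. */
struct cmd {
	struct cmd *next;
	struct cmd *prev;
};

/* Append c to the circular doubly linked list headed by *qptr. */
static void add_q(struct cmd **qptr, struct cmd *c)
{
	assert(qptr != NULL && c != NULL);
	if (*qptr == NULL) {
		/* Empty queue: c becomes the only element. */
		*qptr = c;
		c->next = c->prev = c;
	} else {
		/* Link c in just before the head, i.e. at the tail. */
		c->prev = (*qptr)->prev;
		c->next = *qptr;
		(*qptr)->prev->next = c;
		(*qptr)->prev = c;
	}
}

/* Return 1 if c is linked on the list headed by *qptr, else 0. */
static int on_q(struct cmd **qptr, const struct cmd *c)
{
	const struct cmd *p = *qptr;

	if (p == NULL)
		return 0;
	do {
		if (p == c)
			return 1;
		p = p->next;
	} while (p != *qptr);
	return 0;
}

/* Unlink c from the list. Returns 0 on success, or -1 (leaving the
 * list untouched) if c is NULL, has NULL links, or is not on this list. */
static int remove_q_checked(struct cmd **qptr, struct cmd *c)
{
	if (c == NULL || c->next == NULL || c->prev == NULL)
		return -1;
	if (!on_q(qptr, c))
		return -1;
	if (c->next != c) {
		/* More than one element: advance the head if needed, then unlink. */
		if (*qptr == c)
			*qptr = c->next;
		c->prev->next = c->next;
		c->next->prev = c->prev;
	} else {
		/* c was the only element. */
		*qptr = NULL;
	}
	/* Clear the links so a second removal is caught by the NULL test. */
	c->next = c->prev = NULL;
	return 0;
}
```

The on_q() walk is O(n) in the queue length, so in a real interrupt path one
would more likely keep only the cheap NULL-link test, or make the full walk a
debug-only option. A caller would log and skip the command when
remove_q_checked() returns -1 rather than dereference its links.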