Re: [PATCH 1/1] megaraid_{mm,mbox}: fix a bug in reset handler



"Ju, Seokmann" <Seokmann.Ju@xxxxxxxx> wrote:

This patch has fix for a bug in the 'megaraid_reset_handler()'.

When abort failed, the driver gets reset handleer called. In the reset
handler, driver calls 'scsi_done()' callback for same SCSI command
packet (struct scsi_cmnd) multiple times if there are multiple SCSI
command packet in the pend_list. More over, if there are entry in the
pend_lsit with IOCTL packet associated, the driver returns it to wrong
free_list so that, in turn, the driver could end up with 'NULL pointer
dereference..' during I/O command building with incorrect resource.

Also, the patch contains several minor/cosmetic changes besides this.

..

@@ -2655,32 +2655,48 @@
// Also, reset all the commands currently owned by the driver
spin_lock_irqsave(PENDING_LIST_LOCK(adapter), flags);
list_for_each_entry_safe(scb, tmp, &adapter->pend_list, list) {
-
list_del_init(&scb->list); // from pending list

- con_log(CL_ANN, (KERN_WARNING
- "megaraid: %ld:%d[%d:%d], reset from pending list\n",
- scp->serial_number, scb->sno,
- scb->dev_channel, scb->dev_target));
+ if (scb->sno >= MBOX_MAX_SCSI_CMDS) {
+ con_log(CL_ANN, (KERN_WARNING
+ "megaraid: IOCTL packet with %d[%d:%d] being reset\n",
+ scb->sno, scb->dev_channel, scb->dev_target));

- scp->result = (DID_RESET << 16);
- scp->scsi_done(scp);
+ scb->status = -EFAULT;

What is the significance of -EFAULT here? Seems inappropriate?

@@ -2918,12 +2933,12 @@
wmb();
WRINDOOR(raid_dev, raid_dev->mbox_dma | 0x1);

- for (i = 0; i < 0xFFFFF; i++) {
+ for (i = 0; i < 0xFFFFFF; i++) {
if (mbox->numstatus != 0xFF) break;
rmb();
}

Oh my. That's an awfully long interrupts-off spin. 1.7e7 operations with
an NMI watchdog timeout of five seconds - I'm surprised it doesn't trigger.

Is that reading from a PCI register there? Or main memory?

I'm somewhat surprised that the compiler never "optimises" this into a
lockup, actually. That's what `volatile' is for.

Is it not possible to do this with an interrupt?

A `cpu_relax()' in that loop would help cool things down a bit.


-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [Fastboot] [RFC] [PATCH 2/2] kdump: cciss driver initialization issue fix
    ... just coming out of reset) and if so issue a reset. ... driver thinks that there is on reasons for me receiving ... a command completion message for a command which I never issued and ... Resetting the devices during driver initialization can be a costly ...
    (Linux-Kernel)
  • [PATCH]: Documentation: Updated PCI Error Recovery
    ... PCI error handling doc. ... PCI Error Recovery ... -errors, and to be notified of, and respond to, a reset sequence. ... +including multiple instances of a device driver on multi-function ...
    (Linux-Kernel)
  • Re: pci error recovery procedure
    ... I am reviewing the error handlers of e1000 driver and got some ideas. ... for a full electical reset of the card, instead, it wants to perform ... error_resume could take care of multi-function card. ...
    (Linux-Kernel)
  • Re: pci error recovery procedure
    ... I am reviewing the error handlers of e1000 driver and got some ideas. ... for a full electical reset of the card, instead, it wants to perform ... error_resume could take care of multi-function card. ...
    (Linux-Kernel)
  • POSTing of video cards (WAS: Solo Xgl..)
    ... > Secondary cards needs reset. ... So this means that we have to run the VBIOS reset before probe ... responsibility of the driver to do what it needs at probe time or later. ... If a driver detects that it's card ...
    (Linux-Kernel)