Re: ahci + software raid (intel E7221, supermicro P8SCT) causes kernel BUG at drivers/scsi/scsi.c:295






I've narrowed this down to a faulty SATA drive. It performed at 1/4th the
speed of my other drives, and eventually would disintegrate and remain
non-responsive for long periods of time.

Still, what happened below is probably not the desired behaviour in
response to a dead drive. :/

On Thu, 15 Dec 2005 11:21:55 -0500
Brad Barnett <lists@xxxxxxx> wrote:

>
> Is there any additional information anyone would like? Any suggestions
> on this bug? I currently have a reproducible problem, on a system I can
> monkey with for the next week or two.
>
> After that, it will have to move into production, and I won't be able to
> do much with it. :/
>
> Thanks
>
>
> -------------------------------
>
>
> Supermicro P8SCT:
>
> http://supermicro.com/products/motherboard/P4/E7221/P8SCT.cfm
>
> (uses intel E7221/ahci for SATA controller). Hot-swap raid cage. Three
> SATA ports in use. Also have two MegaRAID SATA 150-6 onboard, one in
> PCI-X slot.
>
> This box will run stable, but at inconsistent times lock up. After
> hooking up serial console, we discovered the problem.
>
> FYI, we have three different motherboards from Supermicro, and they all
> exhibit the same issue... it is unlikely a fault with our specific
> hardware.
>
> It is easiest to trigger this bug (not certain if this is the only way)
> when running software raid, two drives, in stripe mode. Many times this
> configuration will cause no isses, other times I get this:
>
> ata2: handling error/timeout
> ata2: port reset, p_is 0 is 0 pis 0 cmd 4017 tf d0 ss 113 se 0
> ata2: status=0x50 { DriveReady SeekComplete }
> sdb: Current: sense key=0x0
> ASC=0x0 ASCQ=0x0
> Assertion failed! qc->flags &
> ATA_QCFLAG_ACTIVE,drivers/scsi/libata-core.c,ata_qc_complete,line=351
> 3
> ------------[ cut here ]------------
> kernel BUG at drivers/scsi/scsi.c:295!
> invalid operand: 0000 [#1]
> Modules linked in:
> CPU: 0
> EIP: 0060:[<c02d54c1>] Not tainted VLI
> EFLAGS: 00010046 (2.6.15-rc2)
> EIP is at scsi_put_command+0x8b/0x95
> eax: f7f69d90 ebx: f7f98680 ecx: f7f9868c edx: f7f9868c
> esi: f7f8c000 edi: 00000296 ebp: f7f69c00 esp: f7ff7c0c
> ds: 007b es: 007b ss: 0068
> Process scsi_eh_3 (pid: 858, threadinfo=f7ff6000 task=f7ea9030)
> Stack: f7f69df8 c0225720 f7f98680 f7f69d90 f7f62dfc f7f62dfc c02d9cdf
> f7f98680 efadda7c f7f98680 00000282 c02d9e05 f7f98680 00000001
> 00000000 efadda7c f7f98680 00000000 f7f98680 c02da0df f7f98680
> 00000001 00000000 00000001 Call Trace:
> [<c0225720>] kobject_get+0x1a/0x24
> [<c02d9cdf>] scsi_next_command+0x2f/0x4f
> [<c02d9e05>] scsi_end_request+0xc4/0xd6
> [<c02da0df>] scsi_io_completion+0x19a/0x41e
> [<c036448e>] sd_rw_intr+0xc6/0x26d
> [<c02d5bc7>] scsi_finish_command+0x82/0xa2
> [<c035b708>] ata_scsi_qc_complete+0x47/0x8d
> [<c0358eb0>] ata_qc_complete+0x40/0xc6
> [<c035cfd9>] ahci_interrupt+0x107/0x1e3
> [<c0112089>] activate_task+0x6d/0x80
> [<c01321d1>] handle_IRQ_event+0x2e/0x64
> [<c013225a>] __do_IRQ+0x53/0xa5
> [<c011d606>] update_process_times+0x7b/0x106
> [<c010455c>] do_IRQ+0x19/0x24
> [<c0102fda>] common_interrupt+0x1a/0x20
> [<c0119bc3>] __do_softirq+0x2f/0x8a
> [<c0119c44>] do_softirq+0x26/0x28
> [<c0104561>] do_IRQ+0x1e/0x24
> [<c0102fda>] common_interrupt+0x1a/0x20
> [<c02daa73>] scsi_request_fn+0x1a9/0x2c2
> [<c0218c27>] blk_run_queue+0x3a/0x3c
> [<c02d9ce7>] scsi_next_command+0x37/0x4f
> [<c02d9e05>] scsi_end_request+0xc4/0xd6
> [<c02da0df>] scsi_io_completion+0x19a/0x41e
> [<c036448e>] sd_rw_intr+0xc6/0x26d
> [<c02d5bc7>] scsi_finish_command+0x82/0xa2
> [<c035b708>] ata_scsi_qc_complete+0x47/0x8d
> [<c0358eb0>] ata_qc_complete+0x40/0xc6
> [<c035cea6>] ahci_eng_timeout+0x85/0xb0
> [<c02d91fb>] scsi_error_handler+0x0/0x99
> [<c035af32>] ata_scsi_error+0x1a/0x31
> [<c02d925a>] scsi_error_handler+0x5f/0x99
> [<c012706d>] kthread+0xb1/0xb7
> [<c0126fbc>] kthread+0x0/0xb7
> [<c0101329>] kernel_thread_helper+0x5/0xb
> Code: 5c 24 08 8b 74 24 0c 89 44 24 1c 8b 7c 24 10 8b 6c 24 14 83 c4 18
> e9 cd b1 fc ff 89 43 0c 89 48 04 31 db 89 51 04 89 4e 14 eb b7 <0f> 0b
> 27 01 3f ce 47 c0 eb 95 57
> <0>Kernel panic - not syncing: Fatal exception in interrupt
>
>
>
>
> Put another way, I can setup, prep and create a software raid.. format
> it.. and start to move data onto it. I can even use it for quite some
> time. However, once the above happens once, I can not use that software
> raid again until I wipe out the raid and start fresh. Merely booting
> with the raid in causes the above problem on bootup...
>
> -
> To unsubscribe from this list: send the line "unsubscribe linux-kernel"
> in the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: RAID Controller & Info
    ... The issue appears to be with the SATA Driver responding to an IO Control Request that should only be handled by ... "Returns information about the geometry of floppy drives. ... The setup code actually believes that the RAID volume ... | ATA controllers with or without SW/firmware RAID. ...
    (microsoft.public.windows.server.sbs)
  • Re: CD writing in future Linux (stirring up a hornets nest)
    ... How do you share a raid between two systems? ... I remember systems setup that way with scsi in the ... Intel part #82544 in hard drives: ... Considering that USB 1.0 used a shielded cable, it seems unlikely that SATA ...
    (Linux-Kernel)
  • Re: Possable Solution to CERC SATA Raid 2S STOP error 0x0000007b RAID
    ... will see I built the first SATA drive onto the Second SATA drive to ... RAID1 with 2 disks later on, if you turn on the RAID option in the BIOS ... To skip the process of imaging of drives to the IDE and back to the ...
    (microsoft.public.windowsxp.hardware)
  • Re: Software RAID or File Copy program?
    ... I have a web server with 3 hard drives as laid out below, and basically I would like to know what is the best way to keep a backup copy of the 2nd drive? ... Hardware RAID is out of the budget, and I had figured that Software RAID would be too much overhead for a busy webserver - but now it looks like RAID would be the best way to ensure a constant backup copy of even live files so, any advice? ...
    (microsoft.public.windows.server.general)
  • Re: Cant install xp on my sata drive
    ... I have 2 IDE hard drives and now I want to add a 250GB Seagate SATA drive ... IDE drives as additional storage. ... RAID drivers(I tried all of them in the list, RAID as well as AHCI, ICH7, ...
    (microsoft.public.windowsxp.setup_deployment)