Re: 3ware 9550 using 3w-9xxx driver lockups ?



adam radford wrote:

Looks to me like you have a bad drive (or several bad drives, if
other drives on other machines are reallocating sectors as well).

Many reallocated sectors could be a sign that your drive is about to die...
Are you booting off this disk?

Have you run smartmontools on those bad disks?

You fail to mention your raid configuration.

Are you running the latest 3ware firmware for that controller? There
is a Linux specific userspace firmware update utility.

Also, this topic is better served on the linux-scsi email list.

-Adam

On 8/8/06, Andy Davidson <andy@xxxxxxxxxxxx> wrote:



Hi,

One of our servers locked up (no kernel panic, but no response to
console/network) with the following output on console :

Aug 8 14:10:34 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x4383A7F.
Aug 8 14:10:39 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x438C4D9.
Aug 8 14:10:41 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x438C4DC.
Aug 8 14:10:47 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x438FB34.
Aug 8 14:10:49 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x438FB36.
Aug 8 14:10:52 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x439051D.
Aug 8 14:11:08 how-mail-1 kernel: 3w-9xxx: scsi0: AEN: WARNING
(0x04:0x0023): Sector repair completed:port=1,LBA=0x4378D8A.

I'd put this down to bad hardware until an identically spec'd machine
(with fresh disks) did the same thing.


Anyone else noticed any issues using the newer 3-ware 9550S cards with
the 3w-9xxx driver ?


Other details :
Linux how-mail-1 2.6.16-2-686-smp #1 SMP Sat Jul 15 22:33:00 UTC 2006 i686

from lspci:
03:03.0 RAID bus controller: 3ware Inc 9550SX SATA-RAID


3ware 9000 Storage Controller device driver for Linux v2.26.02.007.
ACPI: PCI Interrupt 0000:03:03.0[A] -> GSI 24 (level, low) -> IRQ 177
scsi0 : 3ware 9000 Storage Controller
3w-9xxx: scsi0: Found a 3ware 9000 Storage Controller at 0xda300000,
IRQ: 177.
3w-9xxx: scsi0: Firmware FE9X 3.01.01.028, BIOS BE9X 3.01.00.024, Ports:
4. Vendor: AMCC Model: 9550SX-4LP DISK
Rev: 3.01 Type: Direct-Access ANSI SCSI revision: 03f


We'd welcome any help ?

Thanks,
Andy
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/

Adam,

Have him check the cables as well unless he using the multi-lane controller. I see this a lot with loose ATA cables
on units, but since we moved over to the multi-lane architecture -- great work on the multi-lane version of the adapters, BTW.
He should also check is he's getting -2 bio errors. I see -2 bio errors in parallel with this if its a bad cable.

Jeff
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: DriveStatusError BadCRC
    ... between the disk and the controller. ... I'd recommend either trying a ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: Any news on a higher performance sata_sil SIL_QUIRK_MOD15WRITE workaround?
    ... > Get a different controller + disk combination. ... I don't want to dump the disks, but I may be ordering a 3ware controller ... I thought that the Seagate ST3160023AS was a fairly new drive. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • Re: [RFC][PATCH 2.4] EDD 4-byte MBR disk signature for the boot disk
    ... >> Can you remind me why you want to retrieve the boot disk? ... One may configure the system to have multiple disk ... > controller on which they want to build their database files. ... send the line "unsubscribe linux-kernel" in ...
    (Linux-Kernel)
  • PROBLEM: HT1000 drops network packets during disk writes
    ... HT1000 drops network packets during disk writes ... _any_ disk writes are done on onboard SATA controller at the ... All tested kernel versions are affected. ...
    (Linux-Kernel)
  • Re: SCO OS 5.0.7 on Qemu
    ... The OSR5 "wd" IDE driver has code in it to ... > controller used in PC/AT class machines), ... > the kernel to look for a SCSI disk on a "wd" HBA. ... Possible UDMA timing mismatch, ...
    (comp.unix.sco.misc)