Barracuda ST31000528AS problem



Hi All,

I have two Seagate Barracuda 7200.12 1 TB (ST31000528AS) drives in a Linux software RAID-1 configuration. Today I got a notification from smartd that one of the drives (sda) is failing:

Device: /dev/sda, ATA error count increased from 0 to 6

Other log messages (such as "ata1.00: cmd ... Emask 0x409 (media error)" and "end_request: I/O error, dev sda, sector 39072000") and the disk's SMART error log seem to confirm that the disk is dying. My problem is that I'm seeing SMART warnings about the other drive too:

smartd[5845]: Device: /dev/sdb, SMART Prefailure Attribute: 1 Raw_Read_Error_Rate changed from 108 to 117

Below is the listing of SMART attributes for the good drive (smartctl -A /dev/sdb):

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   113   099   006    Pre-fail  Always   -           52634145
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           56
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           24
  7 Seek_Error_Rate         0x000f   075   060   030    Pre-fail  Always   -           35530576
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always   -           3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           56
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline  -           0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always   -           0
188 Unknown_Attribute       0x0032   100   099   000    Old_age   Always   -           1
189 High_Fly_Writes         0x003a   099   099   000    Old_age   Always   -           1
190 Airflow_Temperature_Cel 0x0022   067   059   045    Old_age   Always   -           33 (Lifetime Min/Max 32/41)
194 Temperature_Celsius     0x0022   033   041   000    Old_age   Always   -           33 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   036   015   000    Old_age   Always   -           52634145
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           91955249811373
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           1261294398
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           1519044357

And here is the listing for the bad drive (smartctl -A /dev/sda):

=== START OF READ SMART DATA SECTION ===
SMART Attributes Data Structure revision number: 10
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   109   100   006    Pre-fail  Always   -           23028010
  3 Spin_Up_Time            0x0003   095   095   000    Pre-fail  Always   -           0
  4 Start_Stop_Count        0x0032   100   100   020    Old_age   Always   -           59
  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always   -           17
  7 Seek_Error_Rate         0x000f   078   060   030    Pre-fail  Always   -           81078197
  9 Power_On_Hours          0x0032   096   096   000    Old_age   Always   -           3861
 10 Spin_Retry_Count        0x0013   100   100   097    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   020    Old_age   Always   -           59
183 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline  -           0
184 Unknown_Attribute       0x0032   100   100   099    Old_age   Always   -           0
187 Reported_Uncorrect      0x0032   094   094   000    Old_age   Always   -           6
188 Unknown_Attribute       0x0032   100   096   000    Old_age   Always   -           26
189 High_Fly_Writes         0x003a   100   100   000    Old_age   Always   -           0
190 Airflow_Temperature_Cel 0x0022   070   062   045    Old_age   Always   -           30 (Lifetime Min/Max 29/38)
194 Temperature_Celsius     0x0022   030   040   000    Old_age   Always   -           30 (0 19 0 0)
195 Hardware_ECC_Recovered  0x001a   041   022   000    Old_age   Always   -           23028010
197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always   -           0
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x003e   200   200   000    Old_age   Always   -           0
240 Head_Flying_Hours       0x0000   100   253   000    Old_age   Offline  -           82240033787773
241 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           2371531202
242 Unknown_Attribute       0x0000   100   253   000    Old_age   Offline  -           3348144171

Both drives report a nonzero raw Reallocated_Sector_Ct (24 on sdb, 17 on sda) and a nonzero raw Seek_Error_Rate.
I cannot run an extended SMART self-test on the drive: because of some firmware problem, it never gets past 10% completion.
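
To compare the two listings attribute by attribute, here is a minimal Python sketch. It assumes the output of smartctl -A has been saved as text. The names parse_attributes and summarize, and the choice of watched IDs (5, 187, 197, 198), are illustrative assumptions and not part of smartmontools. For each watched attribute it reports the raw value and how far the normalized VALUE sits above THRESH (an attribute is flagged as failed once VALUE drops to THRESH or below).

```python
import re

# One attribute row of `smartctl -A` output:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE [extra text]
ROW = re.compile(
    r"^\s*(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)\s+"
    r"(\S+)\s+(\S+)\s+(\S+)\s+(\d+)"
)

# Assumption: attribute IDs picked for illustration (reallocated, reported
# uncorrectable, pending, offline uncorrectable). Adjust to taste.
WATCH_IDS = (5, 187, 197, 198)

# A few rows copied from the sdb listing in this post.
SAMPLE = """\
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
5 Reallocated_Sector_Ct 0x0033 100 100 036 Pre-fail Always - 24
187 Reported_Uncorrect 0x0032 100 100 000 Old_age Always - 0
190 Airflow_Temperature_Cel 0x0022 067 059 045 Old_age Always - 33 (Lifetime Min/Max 32/41)
197 Current_Pending_Sector 0x0012 100 100 000 Old_age Always - 0
198 Offline_Uncorrectable 0x0010 100 100 000 Old_age Offline - 0
"""


def parse_attributes(text):
    """Return {id: {name, value, worst, thresh, raw}} for every attribute row."""
    attrs = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m is None:
            continue  # header, blank line, or anything that is not a row
        attrs[int(m.group(1))] = {
            "name": m.group(2),
            "value": int(m.group(4)),
            "worst": int(m.group(5)),
            "thresh": int(m.group(6)),
            "raw": int(m.group(10)),  # leading integer only; trailing text ignored
        }
    return attrs


def summarize(attrs, ids=WATCH_IDS):
    """Return (id, name, raw, margin) tuples, where margin = VALUE - THRESH."""
    rows = []
    for attr_id in ids:
        a = attrs.get(attr_id)
        if a is not None:
            rows.append((attr_id, a["name"], a["raw"], a["value"] - a["thresh"]))
    return rows


if __name__ == "__main__":
    for attr_id, name, raw, margin in summarize(parse_attributes(SAMPLE)):
        print(f"{attr_id:>3} {name:<24} raw={raw:<6} margin={margin}")
```

Running the same two functions over both saved listings and printing the results side by side makes it easier to see which counters differ between sda and sdb. The sketch only reads the numbers; it does not interpret vendor-specific raw values such as Raw_Read_Error_Rate.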

Do you think the other drive (sdb) is failing as well?

Thanks!

--
Peter Szymański <szyman(at)magres.net>