Re: [opensuse] Filesystem corrupted or what?



David Brodbeck wrote:
jan kalcic wrote:

In addition, during the wait, the hard disk sounds like it is working so
hard that the system is too busy even to move the mouse pointer. Quite strange.

Could it even be a hardware (hard disk) problem?



That seems very likely. Often when I have that kind of behavior it's
because there are bad sectors on the disk and it's having to try
repeatedly to read data. You may even see I/O errors in your syslog.

I would run the SMART self-test, if it's supported -- see the 'smartctl'
manpage for details. Most likely your disk is dying.
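
For reference, here is a minimal Python sketch of how such a self-test might be driven from a script. The `-t short`, `-t long` and `-l selftest` options are the ones described in the smartctl manpage. The device path `/dev/hda` and the helper names are assumptions for illustration only, and actually running the commands normally requires root.

```python
import subprocess


def selftest_commands(device, kind="short"):
    """Return the smartctl invocations to start a self-test and to read its log."""
    if kind not in ("short", "long"):
        raise ValueError("kind must be 'short' or 'long'")
    start = ["smartctl", "-t", kind, device]
    read_log = ["smartctl", "-l", "selftest", device]
    return start, read_log


def run(cmd):
    """Run one command and return its text output (needs smartctl installed)."""
    return subprocess.run(cmd, capture_output=True, text=True).stdout


if __name__ == "__main__":
    # Hypothetical device path; only print the commands, do not execute them.
    start, read_log = selftest_commands("/dev/hda")
    print(" ".join(start))
    print(" ".join(read_log))
```

The test runs inside the drive in the background, so the log is read with the second command once the test has had time to finish.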

So it's a hardware problem, right? :(

smartctl reports some errors, but it always seems to be the same one, and
I'm not sure they are the kind of errors you had in mind.

smartctl version 5.37 [i686-suse-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/

=== START OF INFORMATION SECTION ===
Model Family: Hitachi Travelstar 5K80 family
Device Model: HTS548060M9AT00
Serial Number: MRLB55L4HZNEHC
Firmware Version: MGBOA5EA
User Capacity: 60,011,642,880 bytes
Device is: In smartctl database [for details use: -P show]
ATA Version is: 6
ATA Standard is: ATA/ATAPI-6 T13 1410D revision 3a
Local Time is: Mon Apr 2 02:30:22 2007 CEST
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x85) Offline data collection activity
                                        was aborted by an interrupting
                                        command from host.
                                        Auto Offline Data Collection: Enabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                 ( 645) seconds.
Offline data collection
capabilities:                    (0x5b) SMART execute Offline immediate.
                                        Auto Offline data collection on/off support.
                                        Suspend Offline collection upon new
                                        command.
                                        Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        Selective Self-test supported.
SMART capabilities:            (0x0003) Saves SMART data before entering
                                        power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  46) minutes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000b   084   084   062    Pre-fail  Always   -           2097461
  2 Throughput_Performance  0x0005   100   100   040    Pre-fail  Offline  -           604
  3 Spin_Up_Time            0x0007   148   148   033    Pre-fail  Always   -           2
  4 Start_Stop_Count        0x0012   100   100   000    Old_age   Always   -           808
  5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always   -           0
  7 Seek_Error_Rate         0x000b   100   100   067    Pre-fail  Always   -           0
  8 Seek_Time_Performance   0x0005   100   100   040    Pre-fail  Offline  -           0
  9 Power_On_Hours          0x0012   096   096   000    Old_age   Always   -           2120
 10 Spin_Retry_Count        0x0013   100   100   060    Pre-fail  Always   -           0
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always   -           800
191 G-Sense_Error_Rate      0x000a   098   098   000    Old_age   Always   -           65540
192 Power-Off_Retract_Count 0x0032   100   100   000    Old_age   Always   -           63
193 Load_Cycle_Count        0x0012   093   093   000    Old_age   Always   -           73182
194 Temperature_Celsius     0x0002   141   141   000    Old_age   Always   -           39 (Lifetime Min/Max 7/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -           376
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always   -           201
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -           0
199 UDMA_CRC_Error_Count    0x000a   200   200   000    Old_age   Always   -           0
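
For a suspected bad-sector problem, the counters in this table that are usually looked at first are Reallocated_Sector_Ct, Reallocated_Event_Count, Current_Pending_Sector and Offline_Uncorrectable. Below is a small Python sketch that pulls out their raw values, assuming each attribute sits on one whitespace-separated line as smartctl originally prints it. The sample rows are copied from the output above. The choice of which attributes to watch is a common rule of thumb, not something smartctl itself decides, and the helper names are made up for illustration.

```python
# Four rows copied from the attribute table above, one attribute per line.
SAMPLE = """\
  5 Reallocated_Sector_Ct   0x0033   095   095   005    Pre-fail  Always   -           0
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always   -           376
197 Current_Pending_Sector  0x0022   100   100   000    Old_age   Always   -           201
198 Offline_Uncorrectable   0x0008   100   100   000    Old_age   Offline  -           0
"""

# Assumption: these are the attributes worth flagging when nonzero.
WATCHED = {
    "Reallocated_Sector_Ct",
    "Reallocated_Event_Count",
    "Current_Pending_Sector",
    "Offline_Uncorrectable",
}


def parse_attributes(text):
    """Map attribute name -> raw value (first integer of the RAW_VALUE column)."""
    raw = {}
    for line in text.splitlines():
        fields = line.split()
        # Columns: ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW...
        if len(fields) >= 10 and fields[0].isdigit():
            raw[fields[1]] = int(fields[9])
    return raw


def nonzero_watched(raw):
    """Keep only the watched attributes whose raw value is above zero."""
    return {name: value for name, value in raw.items()
            if name in WATCHED and value > 0}


if __name__ == "__main__":
    for name, value in sorted(nonzero_watched(parse_attributes(SAMPLE)).items()):
        print(f"{name}: {value}")
```

For the values posted here it flags 201 pending sectors and 376 reallocation events, which fits the bad-sector theory even though the overall health assessment says PASSED.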

SMART Error Log Version: 1
ATA Error Count: 21303 (device log contains only the most recent five errors)
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Powered_Up_Time is measured from power on, and printed as
DDd+hh:mm:SS.sss where DD=days, hh=hours, mm=minutes,
SS=sec, and sss=millisec. It "wraps" after 49.710 days.

Error 21303 occurred at disk power-on lifetime: 2118 hours (88 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 37 62 65 e0 00 07:44:17.200 READ DMA EXT
e7 00 00 00 00 00 a0 00 07:44:11.600 FLUSH CACHE
e7 00 00 00 00 00 a0 00 07:44:11.600 FLUSH CACHE
35 00 08 97 0a 4c e0 00 07:44:11.600 WRITE DMA EXT
e7 00 00 00 00 00 a0 00 07:44:11.600 FLUSH CACHE

Error 21302 occurred at disk power-on lifetime: 2118 hours (88 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 37 62 65 e0 00 07:32:18.000 READ DMA EXT
35 00 10 97 1c 4a e0 00 07:32:18.000 WRITE DMA EXT
35 00 08 f8 3a d3 e0 00 07:32:18.000 WRITE DMA EXT
35 00 10 e0 a4 2c e0 00 07:32:18.000 WRITE DMA EXT
35 00 08 37 1f a2 e0 00 07:32:18.000 WRITE DMA EXT

Error 21301 occurred at disk power-on lifetime: 2118 hours (88 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 37 62 65 e0 00 07:32:14.100 READ DMA EXT
35 00 08 5f b7 a5 e0 00 07:32:12.800 WRITE DMA EXT
35 00 08 cf b6 a5 e0 00 07:32:12.800 WRITE DMA EXT
35 00 20 3f 1f a2 e0 00 07:32:12.800 WRITE DMA EXT
e7 00 00 00 00 00 a0 00 07:32:08.400 FLUSH CACHE

Error 21300 occurred at disk power-on lifetime: 2118 hours (88 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 37 62 65 e0 00 07:24:15.800 READ DMA EXT
25 00 08 37 62 65 e0 00 07:24:11.900 READ DMA EXT
35 00 08 28 a9 ca e0 00 07:24:11.000 WRITE DMA EXT
e7 00 00 00 00 00 a0 00 07:24:06.200 FLUSH CACHE
e7 00 00 00 00 00 a0 00 07:24:06.200 FLUSH CACHE

Error 21299 occurred at disk power-on lifetime: 2118 hours (88 days + 6 hours)
When the command that caused the error occurred, the device was active or idle.

After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716

Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Powered_Up_Time Command/Feature_Name
-- -- -- -- -- -- -- -- ---------------- --------------------
25 00 08 37 62 65 e0 00 07:24:11.900 READ DMA EXT
35 00 08 28 a9 ca e0 00 07:24:11.000 WRITE DMA EXT
e7 00 00 00 00 00 a0 00 07:24:06.200 FLUSH CACHE
e7 00 00 00 00 00 a0 00 07:24:06.200 FLUSH CACHE
35 00 08 d7 eb 4b e0 00 07:24:06.200 WRITE DMA EXT
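
All five logged errors point at the same place on the disk. Here is a short Python sketch that checks this and reproduces the arithmetic: it reads the LBA from the `Error: UNC` lines, rebuilds it from the SN/CL/CH/DH register bytes, and converts it to a byte offset. The 28-bit register layout and the 512-byte sector size are assumptions (both conventional for a drive of this generation), and the helper names are made up for illustration.

```python
import re

# The same register line appears in errors 21299 through 21303.
ERROR_LINES = [
    "40 51 03 3c 62 65 e2 Error: UNC 3 sectors at LBA = 0x0265623c = 40198716",
] * 5

SECTOR_SIZE = 512  # assumed sector size in bytes


def lba_from_line(line):
    """Extract the hexadecimal LBA that smartctl printed on an error line."""
    match = re.search(r"LBA = 0x([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError("no LBA found in line")
    return int(match.group(1), 16)


def lba_from_registers(sn, cl, ch, dh):
    """Assemble a 28-bit LBA: low nibble of DH, then CH, CL, SN."""
    return ((dh & 0x0F) << 24) | (ch << 16) | (cl << 8) | sn


def byte_offset(lba):
    """Byte position of the start of the given sector."""
    return lba * SECTOR_SIZE


if __name__ == "__main__":
    distinct = {lba_from_line(line) for line in ERROR_LINES}
    print("distinct failing LBAs:", sorted(distinct))
    print("LBA from registers:", lba_from_registers(0x3C, 0x62, 0x65, 0xE2))
    print("byte offset:", byte_offset(40198716))
```

Under those assumptions the failing sector starts 20,581,742,592 bytes into the 60,011,642,880-byte disk, about a third of the way in. A single repeatedly failing LBA would explain why the log looks like the same error over and over.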

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%         1         -
# 2  Short offline       Completed without error       00%         0         -

Warning! SMART Selective Self-Test Log Structure error: invalid SMART checksum.
SMART Selective self-test log data structure revision number 1
SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS
1 0 0 Not_testing
2 0 0 Not_testing
3 0 0 Not_testing
4 0 0 Not_testing
5 0 0 Not_testing
Selective self-test flags (0x0):
After scanning selected spans, do NOT read-scan remainder of disk.
If Selective self-test is pending on power-up, resume after 0 minute delay.
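
Since the drive advertises selective self-test support and every logged error sits at LBA 40198716, one possible follow-up is a selective self-test over a small window around that sector. The Python sketch below only builds the command line and does not run it. `-t select,N-M` is the span syntax described in the smartctl manpage. The device path `/dev/hda`, the window size and the helper name are assumptions.

```python
def selective_test_command(device, lba, margin=1000):
    """Build a smartctl selective self-test command covering lba +/- margin."""
    first = max(0, lba - margin)  # never go below sector 0
    last = lba + margin
    return ["smartctl", "-t", f"select,{first}-{last}", device]


if __name__ == "__main__":
    # Hypothetical device path; the LBA is the one from the error log above.
    print(" ".join(selective_test_command("/dev/hda", 40198716)))
```

The short self-tests in the log above completed without error, so a test that is forced to read the suspect region may be more telling than another short run.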



