Is my RAID dying?



My RAID is spitting errors into the log and whatnot. I'm trying to figure out the problem but am not entirely sure how to go about it. Can anyone interpret this info?

__________
output from '/proc/mdstat'
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
39097152 blocks [2/1] [_U]


I thought this meant one disk has been removed. For the last few days I thought it was hdg, since hdg is the one spitting errors into the log. Unfortunately, I now think it means hde was removed and hdg is the last leg left :o
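One way to settle which disk is gone is to decode the mdstat lines mechanically. In `[2/1]` the array has 2 configured members and 1 working; in `[_U]` each character is one slot, `U` for up and `_` for missing. The Python sketch below uses a hypothetical helper (`decode_mdstat` is not part of any RAID tool) and assumes, as is usual for a simple two-disk mirror, that the number in brackets after a device (`hdg1[1]`) is the same slot index used by the `[_U]` string.

```python
import re

def decode_mdstat(device_line: str, status_line: str) -> dict:
    """Map each mirror slot to its member device and up/down state.

    Hypothetical helper. Assumes the number in brackets after a member
    (e.g. hdg1[1]) is the same slot index used by the [_U] string.
    """
    # Members are listed as name[slot], e.g. "hdg1[1]".
    members = {int(slot): name
               for name, slot in re.findall(r"(\w+)\[(\d+)\]", device_line)}
    # "[2/1]" = configured/working members; "[_U]" = one character per slot.
    m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", status_line)
    if m is None:
        raise ValueError("no [n/m] [U_] status found")
    total, working, flags = int(m.group(1)), int(m.group(2)), m.group(3)
    slots = {slot: {"device": members.get(slot, "(none listed)"), "up": flag == "U"}
             for slot, flag in enumerate(flags)}
    return {"total": total, "working": working, "slots": slots}

info = decode_mdstat("md0 : active raid1 hdg1[1]", "39097152 blocks [2/1] [_U]")
for slot, s in info["slots"].items():
    print(slot, s["device"], "up" if s["up"] else "DOWN")
# prints:
# 0 (none listed) DOWN
# 1 hdg1 up
```

With the lines above, slot 1 is hdg1 and up, while slot 0 has no device listed at all, which fits hde1 being the member that dropped out and hdg1 being the only copy left.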



______________

output from 'lsraid -a /dev/md0'
[dev 9, 0] /dev/md0 00000000.00000000.00000000.00000000 online
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing

What does this mean?
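The lsraid lines can be read field by field: the bracketed pair is a block device's major and minor number (major 9 is the Linux md driver, so the first line is the array /dev/md0 itself), followed by a device name, what looks like the array UUID, and a state. `[dev ?, ?] (unknown) ... missing` is a member slot that lsraid could not match to any device, i.e. the absent half of the mirror. A minimal Python sketch that splits those fields; the regular expression is an assumption inferred only from the two lines shown, not from lsraid documentation.

```python
import re

# Pattern inferred from the two lsraid lines in this post (an assumption,
# not taken from lsraid documentation): [dev MAJOR, MINOR] NAME UUID STATE
LSRAID_LINE = re.compile(
    r"\[dev\s+(?P<major>[0-9?]+),\s*(?P<minor>[0-9?]+)\]\s+"
    r"(?P<name>\S+)\s+(?P<uuid>[0-9a-fA-F.]+)\s+(?P<state>\w+)"
)

MD_MAJOR = 9  # block major number of the Linux md (software RAID) driver

def parse_lsraid(text: str) -> list:
    entries = []
    for line in text.splitlines():
        m = LSRAID_LINE.search(line)
        if m is None:
            continue
        d = m.groupdict()
        # "?" means lsraid could not tie this slot to any block device.
        d["major"] = None if d["major"] == "?" else int(d["major"])
        d["minor"] = None if d["minor"] == "?" else int(d["minor"])
        d["is_array"] = d["major"] == MD_MAJOR
        entries.append(d)
    return entries

output = """[dev 9, 0] /dev/md0 00000000.00000000.00000000.00000000 online
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing"""
for e in parse_lsraid(output):
    kind = "array" if e["is_array"] else "member"
    print(kind, e["name"], e["state"])
# prints:
# array /dev/md0 online
# member (unknown) missing
```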





________________

output from 'tail -f /var/log/messages'
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=30408783, sector=30408720
Apr 10 13:40:56 erasmus kernel: end_request: I/O error, dev 22:01 (hdg), sector 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: rescheduling block 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: unrecoverable I/O read error for block 30408720
Apr 10 13:40:56 erasmus kernel: EXT3-fs error (device md(9,0)): ext3_get_inode_loc: unable to read inode block - inode=1896839, block=3801090



This is RH9 on an ABIT NV7-133R board (nForce, with external graphics, I believe). This seems to indicate that hdg is barfing. I thought I had installed smartctl, but I guess I didn't. Or perhaps it's on the part of hdg that is having trouble :(
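The `dma_intr` lines are the IDE driver decoding two ATA registers. Status `0x51` is the DriveReady (0x40), SeekComplete (0x10) and Error (0x01) bits; error `0x40` is the UncorrectableError bit, meaning the drive could not read that sector even with its own ECC. That usually points at a bad spot on hdg's media; a cable problem would more typically show up as a CRC error. A small Python sketch of the decoding; the bit values follow the ATA register layout, and the labels for bits that do not appear in this log are my approximation of the driver's wording.

```python
# ATA status register bits; labels match the kernel log where the bit appears there.
STATUS_BITS = {
    0x80: "Busy",
    0x40: "DriveReady",
    0x20: "DeviceFault",
    0x10: "SeekComplete",
    0x08: "DataRequest",
    0x04: "CorrectedError",
    0x02: "Index",
    0x01: "Error",
}

# ATA error register bits (meaningful only when the status Error bit is set).
ERROR_BITS = {
    0x80: "BadSector/BadCRC",
    0x40: "UncorrectableError",
    0x10: "SectorIdNotFound",
    0x04: "DriveStatusError",   # command aborted
    0x02: "TrackZeroNotFound",
    0x01: "AddrMarkNotFound",
}

def decode(value: int, table: dict) -> list:
    """Return the names of all set bits, highest bit first."""
    return [name for bit, name in sorted(table.items(), reverse=True) if value & bit]

print(decode(0x51, STATUS_BITS))  # ['DriveReady', 'SeekComplete', 'Error']
print(decode(0x40, ERROR_BITS))   # ['UncorrectableError']
```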








___________________
This is from dmesg

---SNIP---
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 00000057]
[events: 00000076]
md: autorun ...
md: considering hdg1 ...
md: adding hdg1 ...
md: adding hde1 ...
md: created md0
md: bind<hde1,1>
md: bind<hdg1,2>
md: running: <hdg1><hde1>
md: hdg1's event counter: 00000076
md: hde1's event counter: 00000057
md: superblock update time inconsistency -- using the most recent one
md: freshest: hdg1
md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdg1,0>
md: export_rdev(hdg1)
md: ... autorun DONE.

--SNIP--

md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdg1 operational as mirror 1
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdg1 [events: 00000077]<6>(write) hdg1's sb offset: 39097152
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: md(9,0): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 2224027
ext3_orphan_cleanup: deleting unreferenced inode 2224011
ext3_orphan_cleanup: deleting unreferenced inode 212772
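The key lines in this boot log are the event counters. At startup md compares each member's superblock event counter, keeps the member with the highest one ("freshest") and kicks any member that lags behind. Here hde1 (00000057) was behind hdg1 (00000076), so md kicked hde1 and ran md0 degraded on hdg1 alone, and "no spare disk to reconstruct array" means nothing was ever resynced. The first autorun attempt failed only because the raid1 personality (number 3) could not be loaded: errno 2 (ENOENT) says /sbin/modprobe itself was not found at that point, probably because the root filesystem was not mounted yet; the later autorun succeeded. A Python sketch of the freshness check, assuming the counters are printed in hexadecimal; `find_stale_members` is a hypothetical name.

```python
import re

def find_stale_members(dmesg_text: str):
    """Return (freshest, stale) based on md's per-member event counters.

    Hypothetical helper; assumes the counters are printed in hexadecimal.
    """
    counters = {
        dev: int(value, 16)
        for dev, value in re.findall(
            r"md: ([^\s']+)'s event counter: ([0-9a-fA-F]+)", dmesg_text)
    }
    if not counters:
        raise ValueError("no event counter lines found")
    freshest = max(counters, key=counters.get)
    # Any member whose counter lags the freshest one is "non-fresh" and gets kicked.
    stale = [dev for dev, ev in counters.items() if ev < counters[freshest]]
    return freshest, stale

log = """md: hdg1's event counter: 00000076
md: hde1's event counter: 00000057"""
print(find_stale_members(log))  # ('hdg1', ['hde1'])
```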


DAMN!! This is from last August, when my computer restarted. I was having trouble getting it to boot, but after a few days it booted and I just assumed all was well. My uptime is 236 days. I know some files are damaged; that's probably why my printer isn't working anymore.

Is there any way to tell which files are damaged?
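The numbers in the kernel log line up in a way that helps here. The end_request line gives sector 30408720 on hdg1 (partition-relative), while LBAsect 30408783 is the absolute disk address; the difference of 63 is most likely where hdg1 starts. An md 0.90 mirror keeps its superblock at the end of the partition, so md0's sector numbers equal hdg1's, and with 4 KiB ext3 blocks (8 sectors each, an assumption) 30408720 / 8 = 3801090, exactly the block the EXT3-fs error names. Since that is an inode-table block, every inode stored in it is unreadable, not just inode 1896839. A Python sketch of the arithmetic; the helper names are hypothetical, and the inode grouping assumes 128-byte inodes and an inodes-per-group count divisible by 32.

```python
SECTOR_SIZE = 512  # bytes per disk sector

def fs_block_from_sector(partition_sector: int, block_size: int = 4096) -> int:
    """ext3 block holding a partition-relative sector.

    Assumes md0's data starts at sector 0 of hdg1 (md 0.90 keeps its
    superblock at the end) and a 4 KiB filesystem block size.
    """
    return partition_sector * SECTOR_SIZE // block_size

def inodes_in_same_block(ino: int, inode_size: int = 128, block_size: int = 4096) -> range:
    """Inode numbers that share an inode-table block with `ino`.

    Assumes 128-byte inodes and inodes-per-group divisible by the number
    of inodes per block; inode numbers start at 1.
    """
    per_block = block_size // inode_size
    first = ino - (ino - 1) % per_block
    return range(first, first + per_block)

lba_absolute, lba_partition = 30408783, 30408720  # from the dma_intr / end_request lines
print(lba_absolute - lba_partition)               # 63, most likely where hdg1 starts
print(fs_block_from_sector(lba_partition))        # 3801090, the block EXT3-fs names
same = inodes_in_same_block(1896839)
print(same.start, same[-1])                       # 1896833 1896864
```

To turn inode numbers into file names, debugfs from e2fsprogs can search the directories, e.g. `debugfs -R "ncheck 1896839" /dev/md0`; its `icheck` command does the reverse mapping from block numbers to inodes. debugfs opens the filesystem read-only by default, but the names it finds are only as good as the directory blocks it can still read.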

I just re-added hde with hotadd. Does anyone have insight into which points in this log are significant? Is there maybe a RAID newsgroup?


I'm upgrading to FC5, probably next weekend if I make it, with 2 new drives. I'll use the new RAID tools as well, and hopefully smartctl. From now on I'm going to need an email or something when the RAID is barfing. DAMN, I try and try to remain ignorant, but you just gotta darn near become a sysadmin to keep your stuff safe.
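For the email part, the check itself is just "does any array's status string contain `_`?". A minimal Python sketch (`degraded_arrays` is a hypothetical helper) that scans the text of /proc/mdstat; it could be run from cron, e.g. on `open("/proc/mdstat").read()`, with any output mailed. mdadm's `--monitor` mode (with `--mail`) and smartd from smartmontools can already send such mail, so this only illustrates what is being checked.

```python
import re

def degraded_arrays(mdstat_text: str) -> list:
    """Names of md arrays whose [UU...] status shows a missing member ('_').

    Hypothetical helper; expects the text of /proc/mdstat.
    """
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = re.match(r"(md\d+)\s*:", line)
        if header:
            current = header.group(1)  # e.g. "md0 : active raid1 hdg1[1]"
            continue
        status = re.search(r"\[([U_]+)\]\s*$", line)
        if current and status and "_" in status.group(1):
            degraded.append(current)
    return degraded

sample = """Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
      39097152 blocks [2/1] [_U]
unused devices: <none>
"""
print(degraded_arrays(sample))  # ['md0']
```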

--
Thank you,



"Then said I, Wisdom [is] better than strength: nevertheless the poor man's wisdom [is] despised, and his words are not heard." Ecclesiastes 9:16