Re: Is my RAID dying?



dnoyeB wrote:
My RAID is spitting errors into the log and whatnot. I am trying to figure out the problem but am not entirely sure how to go about it. Can you interpret this info?

__________
output from '/proc/mdstat'
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
39097152 blocks [2/1] [_U]


I thought this meant one disk has been removed. For the last few days I thought it meant hdg, since hdg is spitting the errors in the log. Unfortunately I now think it means hde is removed and hdg is the last leg :o
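
A minimal sketch of how that status line can be read, assuming the usual md convention that the number in brackets after each member is its slot and that the `[_U]` flags list the slots left to right. The function name `read_mdstat` is made up for illustration; it just parses the text pasted above.

```python
import re

# The /proc/mdstat text quoted above.
MDSTAT = """\
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
39097152 blocks [2/1] [_U]
"""

def read_mdstat(text):
    """Return (members by slot, slots flagged '_') for a single-array mdstat.

    Assumption: 'hdg1[1]' means hdg1 sits in slot 1, and '[_U]' shows
    slot 0 first, with '_' meaning missing and 'U' meaning up.
    """
    members = {}
    flags = ""
    for line in text.splitlines():
        if re.match(r"md\d+ :", line):
            for dev, slot in re.findall(r"(\w+)\[(\d+)\]", line):
                members[int(slot)] = dev
        m = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if m:
            flags = m.group(3)
    missing = [i for i, flag in enumerate(flags) if flag == "_"]
    return members, missing

members, missing = read_mdstat(MDSTAT)
print("active members by slot:", members)
print("missing slots:", missing)
```

Read this way, hdg1 is the one member still in the array (slot 1) and slot 0 is the empty one, which fits hde being the disk that was dropped.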



______________

output from 'lsraid -a /dev/md0'
[dev 9, 0] /dev/md0 00000000.00000000.00000000.00000000 online
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing

I don't know what this means.





________________

output from 'tail -f /var/log/messages'
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=30408783, sector=30408720
Apr 10 13:40:56 erasmus kernel: end_request: I/O error, dev 22:01 (hdg), sector 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: rescheduling block 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: unrecoverable I/O read error for block 30408720
Apr 10 13:40:56 erasmus kernel: EXT3-fs error (device md(9,0)): ext3_get_inode_loc: unable to read inode block - inode=1896839, block=3801090



This is RH9 on an ABIT NV7-133R (nforce) board (I believe with external graphics). This seems to indicate hdg is barfing. I thought I had installed smartctl, but I guess I didn't. Or perhaps it's on the part of hdg that is having trouble :(
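
For picking the log apart, here is a small sketch that pulls the device, error code and sector numbers out of the `dma_intr` error lines. The helper name `parse_dma_errors` is hypothetical, and the regular expression only covers the line format shown in the log above.

```python
import re

# Two of the kernel lines quoted above.
LOG = """\
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=30408783, sector=30408720
Apr 10 13:40:56 erasmus kernel: end_request: I/O error, dev 22:01 (hdg), sector 30408720
"""

DMA_ERROR = re.compile(
    r"kernel: (\w+): dma_intr: error=(0x[0-9a-fA-F]+) .*LBAsect=(\d+), sector=(\d+)"
)

def parse_dma_errors(log_text):
    """Return one dict per dma_intr error line found in log_text."""
    errors = []
    for line in log_text.splitlines():
        m = DMA_ERROR.search(line)
        if m:
            dev, err, lba, sector = m.groups()
            errors.append({
                "dev": dev,
                "error": int(err, 16),
                "lba": int(lba),        # sector number on the whole disk
                "sector": int(sector),  # sector number as reported to the block layer
                "offset": int(lba) - int(sector),
            })
    return errors

for e in parse_dma_errors(LOG):
    print(e)
```

The two sector numbers differ by 63, which would be consistent with hdg1 starting at sector 63 of the disk. That is my inference from the numbers, not something the log states.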








___________________
This is from dmesg

---SNIP---
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 00000057]
[events: 00000076]
md: autorun ...
md: considering hdg1 ...
md: adding hdg1 ...
md: adding hde1 ...
md: created md0
md: bind<hde1,1>
md: bind<hdg1,2>
md: running: <hdg1><hde1>
md: hdg1's event counter: 00000076
md: hde1's event counter: 00000057
md: superblock update time inconsistency -- using the most recent one
md: freshest: hdg1
md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdg1,0>
md: export_rdev(hdg1)
md: ... autorun DONE.

--SNIP--

md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdg1 operational as mirror 1
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdg1 [events: 00000077]<6>(write) hdg1's sb offset: 39097152
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: md(9,0): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 2224027
ext3_orphan_cleanup: deleting unreferenced inode 2224011
ext3_orphan_cleanup: deleting unreferenced inode 212772


DAMN!! This is from last August when my computer restarted. I was having trouble getting it to boot, but after a few days it booted and I just assumed all was well. My uptime is 236 days. I know some files are damaged, which is probably why my printer is not working anymore.

Any way to tell which files are damaged?
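
A rough sketch of one place to start, assuming 512-byte sectors and a 4096-byte ext3 block size. Those sizes are assumptions, but they agree with the log above, where sector 30408720 and block 3801090 show up for the same error. The `debugfs` commands are only built as strings here, not run, and the helper names are made up.

```python
SECTOR_SIZE = 512   # bytes; assumed
BLOCK_SIZE = 4096   # bytes; assumed ext3 block size (tune2fs -l reports the real one)

def sector_to_fs_block(sector, block_size=BLOCK_SIZE, sector_size=SECTOR_SIZE):
    """Convert a sector number within the partition to a filesystem block number."""
    return sector // (block_size // sector_size)

def debugfs_commands(block, inode, device="/dev/md0"):
    """Build debugfs invocations: icheck maps a block to its owning inode,
    ncheck maps an inode number to a path name."""
    return [
        f'debugfs -R "icheck {block}" {device}',
        f'debugfs -R "ncheck {inode}" {device}',
    ]

block = sector_to_fs_block(30408720)
print(block)
for cmd in debugfs_commands(block, 1896839):
    print(cmd)
```

In this particular error the unreadable block is an inode block, so `ncheck` on inode 1896839 is probably the more useful of the two, provided the directory entries pointing at it can still be read.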

I just restarted hde with hotadd. Does anyone have any insight into which points in this log are significant and what they indicate? Is there a RAID newsgroup maybe?
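
One point in that dmesg that does look significant is the pair of event counters. A small sketch of the comparison the md driver appears to make, assuming the counters are printed in hexadecimal (the ordering comes out the same if they are decimal). `freshest_member` is a made-up name.

```python
import re

# The two event counter lines from the dmesg output above.
DMESG = """\
md: hdg1's event counter: 00000076
md: hde1's event counter: 00000057
"""

def freshest_member(dmesg_text):
    """Return (freshest device, stale devices, counters) from event counter lines.

    Assumption: the counters are hexadecimal, and the member with the
    highest counter is treated as the most recent copy.
    """
    counters = {
        dev: int(value, 16)
        for dev, value in re.findall(
            r"md: (\w+)'s event counter: ([0-9a-fA-F]+)", dmesg_text
        )
    }
    freshest = max(counters, key=counters.get)
    stale = sorted(d for d in counters if counters[d] < counters[freshest])
    return freshest, stale, counters

freshest, stale, counters = freshest_member(DMESG)
print("freshest:", freshest)
print("stale:", stale)
```

That matches the "freshest: hdg1" and "kicking non-fresh hde1 from array!" lines: hde1's superblock had stopped being updated at some point before that boot, so it was left out of the array.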


I'm upgrading to FC5, probably next weekend if I make it, with 2 new drives. I'll use the new RAID tools as well, and hopefully smartctl. I'm going to need an email or something when the RAID is barfing from now on. DAMN, I try and try to remain ignorant, but you just gotta darn near become a sysadmin to keep your stuff safe.
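
For the email part, a minimal sketch of the kind of check that could run from cron: it looks for any array in `/proc/mdstat` whose `[expected/active]` counts disagree and builds a mail message if it finds one. The addresses, the function names and the local SMTP server on `localhost` are all assumptions for illustration.

```python
import re
import smtplib
from email.message import EmailMessage

def degraded_arrays(mdstat_text):
    """Return {array: (expected, active)} for arrays with missing members."""
    degraded = {}
    current = None
    for line in mdstat_text.splitlines():
        m = re.match(r"(md\d+) :", line)
        if m:
            current = m.group(1)
        m = re.search(r"\[(\d+)/(\d+)\]", line)
        if m and current:
            expected, active = int(m.group(1)), int(m.group(2))
            if active < expected:
                degraded[current] = (expected, active)
    return degraded

def build_alert(mdstat_text, to_addr, from_addr="raid-check@localhost"):
    """Return an EmailMessage describing degraded arrays, or None if all is well."""
    degraded = degraded_arrays(mdstat_text)
    if not degraded:
        return None
    msg = EmailMessage()
    msg["Subject"] = "RAID degraded: " + ", ".join(sorted(degraded))
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content(mdstat_text)
    return msg

def send_alert(msg, host="localhost"):
    """Hand the message to an SMTP server (assumed to be listening on host)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(msg)
```

From cron this would amount to reading `/proc/mdstat`, calling `build_alert(text, "root@localhost")`, and calling `send_alert` only when the result is not `None`. As far as I know mdadm also has a monitor mode that can mail on failure events, which is probably the less fragile route once I'm on the new tools.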


I successfully got hde1 re-added to the array by 11:30 last night. At 4am this morning hdg1 gave up the ghost :P I'm right back to 1 disk in the array. At least I learned how to read the logs...

--
Thank you,



"Then said I, Wisdom [is] better than strength: nevertheless the poor man's wisdom [is] despised, and his words are not heard." Ecclesiastes 9:16


