Is my RAID dying?
- From: dnoyeB <Fake@xxxxxxxxxxxxxxxxx>
- Date: Tue, 11 Apr 2006 16:10:35 -0400
My RAID is spitting errors into the log. I am trying to figure out the problem but am not entirely sure how to go about it. Can you interpret this info?
__________
output from '/proc/mdstat'
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
39097152 blocks [2/1] [_U]
I thought this meant one disk has been removed. For the last few days I thought it meant hdg, since hdg is spitting the errors in the log. Unfortunately I now think it means hde is removed and hdg is the last leg :o
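For reference, `[2/1]` says the array is configured for two members with one active, and `[_U]` says slot 0 is missing while slot 1 is up. The only member listed on the `md0` line is `hdg1[1]`, which fits the reading that hde1 is the one that dropped out. A minimal sketch of that interpretation follows, assuming the old-style mdstat layout quoted above; the function name `degraded_arrays` and the `SAMPLE` constant are made up for illustration.

```python
import re

# Sample text copied from the /proc/mdstat output quoted above.
SAMPLE = """\
Personalities : [raid1]
read_ahead 1024 sectors
md0 : active raid1 hdg1[1]
39097152 blocks [2/1] [_U]
"""


def degraded_arrays(mdstat_text):
    """Return {array: (expected, active, members)} for arrays missing a member."""
    result = {}
    current = None
    members = []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\d+)\s*:\s*active\s+\S+\s+(.*)$", line)
        if header:
            current = header.group(1)
            # Member entries look like "hdg1[1]": device name, then slot number.
            members = re.findall(r"(\w+)\[\d+\]", header.group(2))
            continue
        status = re.search(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]", line)
        if status and current:
            expected, active = int(status.group(1)), int(status.group(2))
            if active < expected or "_" in status.group(3):
                result[current] = (expected, active, members)
            current = None
    return result


print(degraded_arrays(SAMPLE))
```

On a live system the same function could be fed the contents of `/proc/mdstat` instead of the sample string.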
______________
output from 'lsraid -a /dev/md0'
[dev 9, 0] /dev/md0 00000000.00000000.00000000.00000000 online
[dev ?, ?] (unknown) 00000000.00000000.00000000.00000000 missing
I don't know what this means.
________________
output from 'tail -f /var/log/messages'
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=30408783, sector=30408720
Apr 10 13:40:56 erasmus kernel: end_request: I/O error, dev 22:01 (hdg), sector 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: rescheduling block 30408720
Apr 10 13:40:56 erasmus kernel: raid1: hdg1: unrecoverable I/O read error for block 30408720
Apr 10 13:40:56 erasmus kernel: EXT3-fs error (device md(9,0)): ext3_get_inode_loc: unable to read inode block - inode=1896839, block=3801090
This is RH9 on an ABIT NV7-133R (nForce) board, with external graphics I believe. This seems to indicate hdg is barfing. I thought I had installed smartctl, but I guess I didn't. Or perhaps it's on the part of hdg that is having trouble :(
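One way to gauge how widespread the read errors are is to pull the failing sectors out of the log and count them: one sector failing repeatedly points at a single bad spot, while many distinct sectors would suggest broader trouble. The sketch below assumes the `dma_intr` line format shown above. The pattern and the helper name `bad_sectors` are illustrative, not part of any tool.

```python
import re
from collections import Counter

# Lines copied from the /var/log/messages excerpt above.
LOG = """\
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: status=0x51 { DriveReady SeekComplete Error }
Apr 10 13:40:56 erasmus kernel: hdg: dma_intr: error=0x40 { UncorrectableError }, LBAsect=30408783, sector=30408720
Apr 10 13:40:56 erasmus kernel: end_request: I/O error, dev 22:01 (hdg), sector 30408720
"""

PATTERN = re.compile(
    r"kernel: (\w+): dma_intr: error=0x[0-9a-f]+ \{ (\w+) \}, "
    r"LBAsect=(\d+), sector=(\d+)"
)


def bad_sectors(log_text):
    """Count occurrences of each (drive, error, sector) triple in the log."""
    hits = Counter()
    for line in log_text.splitlines():
        match = PATTERN.search(line)
        if match:
            drive, error, _lba, sector = match.groups()
            hits[(drive, error, int(sector))] += 1
    return hits


for (drive, error, sector), count in sorted(bad_sectors(LOG).items()):
    print(f"{drive}: {error} at sector {sector} ({count} time(s))")
```

In the excerpt, `LBAsect` and `sector` differ by 63, which would be consistent with a partition starting at sector 63, though that is an inference rather than something the log states.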
___________________
This is from dmesg
---SNIP---
md: md driver 0.90.0 MAX_MD_DEVS=256, MD_SB_DISKS=27
md: Autodetecting RAID arrays.
[events: 00000057]
[events: 00000076]
md: autorun ...
md: considering hdg1 ...
md: adding hdg1 ...
md: adding hde1 ...
md: created md0
md: bind<hde1,1>
md: bind<hdg1,2>
md: running: <hdg1><hde1>
md: hdg1's event counter: 00000076
md: hde1's event counter: 00000057
md: superblock update time inconsistency -- using the most recent one
md: freshest: hdg1
md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
kmod: failed to exec /sbin/modprobe -s -k md-personality-3, errno = 2
md: personality 3 is not loaded!
md :do_md_run() returned -22
md: md0 stopped.
md: unbind<hdg1,0>
md: export_rdev(hdg1)
md: ... autorun DONE.
--SNIP--
md: kicking non-fresh hde1 from array!
md: unbind<hde1,1>
md: export_rdev(hde1)
md: md0: raid array is not clean -- starting background reconstruction
md: RAID level 1 does not need chunksize! Continuing anyway.
md0: max total readahead window set to 124k
md0: 1 data-disks, max readahead per data-disk: 124k
raid1: device hdg1 operational as mirror 1
raid1: md0, not all disks are operational -- trying to recover array
raid1: raid set md0 active with 1 out of 2 mirrors
md: updating md0 RAID superblock on device
md: hdg1 [events: 00000077]<6>(write) hdg1's sb offset: 39097152
md: recovery thread got woken up ...
md0: no spare disk to reconstruct array! -- continuing in degraded mode
md: recovery thread finished ...
md: ... autorun DONE.
EXT3-fs: INFO: recovery required on readonly filesystem.
EXT3-fs: write access will be enabled during recovery.
kjournald starting. Commit interval 5 seconds
EXT3-fs: md(9,0): orphan cleanup on readonly fs
ext3_orphan_cleanup: deleting unreferenced inode 2224027
ext3_orphan_cleanup: deleting unreferenced inode 2224011
ext3_orphan_cleanup: deleting unreferenced inode 212772
DAMN!! This is from last August, when my computer restarted. I was having trouble getting it to boot, but after a few days it booted and I just assumed all was well. My uptime is 236 days. I know some files are damaged, which is probably why my printer is not working anymore.
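The key lines in the dmesg excerpt above are the event counters: the md driver keeps the member whose superblock has the highest count and kicks the other as "non-fresh". Since hde1 was already behind at that boot, it had presumably stopped receiving updates some time before the restart. A minimal sketch of that comparison follows, assuming the counters are printed in hexadecimal; the function name `freshest` is hypothetical.

```python
def freshest(event_counters):
    """Pick the member with the highest event counter; report how far others lag."""
    # Assumption: the counters in the log are hexadecimal strings.
    parsed = {dev: int(value, 16) for dev, value in event_counters.items()}
    newest = max(parsed, key=parsed.get)
    stale = {
        dev: parsed[newest] - count
        for dev, count in parsed.items()
        if dev != newest
    }
    return newest, stale


# Values copied from the "event counter" lines in the dmesg output.
newest, stale = freshest({"hdg1": "00000076", "hde1": "00000057"})
print(newest, stale)
```

Under that assumption hde1 is 31 superblock updates behind hdg1, so the mirror on hde1 was stale at that boot and cannot be trusted as a copy of the current data.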
Any way to tell which files are damaged?
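One possible approach is to turn the failing sector into a filesystem block and ask `debugfs` which inode and path it belongs to. The sketch below assumes a 4096-byte ext3 block size, which is worth confirming with `dumpe2fs` or `tune2fs -l`. The assumption does agree with the log, where sector 30408720 and block 3801090 differ by exactly a factor of 8. The helper names are made up, and the script only builds the command lines without running them.

```python
import shlex

SECTOR_SIZE = 512   # bytes per sector as used in the kernel messages
BLOCK_SIZE = 4096   # ASSUMED ext3 block size; check with dumpe2fs or tune2fs -l


def sector_to_fs_block(sector, block_size=BLOCK_SIZE):
    """Convert a 512-byte sector offset into a filesystem block number."""
    return sector * SECTOR_SIZE // block_size


def debugfs_commands(device, block=None, inode=None):
    """Build (but do not run) debugfs invocations for a block and/or an inode."""
    commands = []
    if block is not None:
        # icheck: report which inode, if any, owns the given block.
        commands.append(["debugfs", "-R", f"icheck {block}", device])
    if inode is not None:
        # ncheck: report the path name(s) that refer to the given inode.
        commands.append(["debugfs", "-R", f"ncheck {inode}", device])
    return commands


block = sector_to_fs_block(30408720)
print(block)  # 3801090, matching block= in the EXT3-fs error line
for argv in debugfs_commands("/dev/md0", block=block, inode=1896839):
    print(shlex.join(argv))
```

Because the error here is on an inode table block, `icheck` may not report an owner for it. `ncheck` on the inode number from the EXT3-fs message is the more likely route to a file name, provided the directories that reference it are still readable.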
I just restarted hde with hotadd. Does anyone have any insight into which points in this log are significant and what they indicate? Is there a RAID newsgroup maybe?
I'm upgrading to FC5, probably next weekend if I make it, with 2 new drives... I'll use the new RAID tools as well, and hopefully smartctl. I'm going to need an email or something when the RAID is barfing from now on. DAMN, I try and try to remain ignorant, but you just gotta darn near become a sysadmin to keep your stuff safe.
--
Thank you,
"Then said I, Wisdom [is] better than strength: nevertheless the poor man's wisdom [is] despised, and his words are not heard." Ecclesiastes 9:16