Re: FailSafe Event (md)?
- From: Allen Kistler <ackistler@xxxxxxxxx>
- Date: Thu, 11 Jan 2007 17:12:34 -0600
Mike wrote:
[snip]
On this machine I execute:
-------------------
$ cat /proc/mdstat
Personalities : [raid5] [raid4] [raid1]
md0 : active raid1 sdf1[2](S) sde1[3](S) sdd1[4](S) sdc1[5](S) sdb1[1] sda1[0]
104320 blocks [2/2] [UU]
md1 : active raid1 sdf3[2](S) sde3[3](S) sdd3[4](S) sdc3[5](S) sdb3[1] sda3[0]
3068288 blocks [2/2] [UU]
md2 : active raid5 sdf2[4] sde2[5](F) sdd2[3] sdc2[2] sdb2[1] sda2[0]
560732160 blocks level 5, 256k chunk, algorithm 2 [5/5] [UUUUU]
unused devices: <none>
-------------------
Does the email message mean drive sde2[5] has failed? I know the sde2 refers
to the second partition of /dev/sde. Here is the partition table
Yes. That's why there's (F) by it.
I have partition 2 of drive sde as one of the raid devices for md. Does the (S)
on sde3[3](S) mean the device is a spare for md1, and likewise for md0?
Yes.
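To pull the flagged members out of that listing without reading it by eye, something like the following might do. It is only a sketch: list_flagged is a made-up helper name, and it assumes array lines look like the "mdN : active <level> <members...>" lines shown above (inactive arrays are formatted differently and are skipped).

```shell
# list_flagged FLAG [FILE]
# Print "<array> <member>" for every member carrying the given flag,
# e.g. F (faulty) or S (spare). FILE defaults to /proc/mdstat.
# Hypothetical helper; assumes "mdN : active <level> <members...>" lines.
list_flagged() {
    flag=$1
    file=${2:-/proc/mdstat}
    awk -v flag="($flag)" '
        $2 == ":" && $3 == "active" {
            for (i = 5; i <= NF; i++) {
                if (index($i, flag)) {
                    name = $i
                    sub(/\[.*/, "", name)   # drop the "[5](F)" suffix
                    print $1, name
                }
            }
        }
    ' "$file"
}

# Usage (not run here):
#   list_flagged F    # faulty members
#   list_flagged S    # spares
```

Against the output quoted above, the F query should name sde2 under md2, and the S query should name the four spare partitions under each of md0 and md1.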
Mike wrote:
On 2007-01-11, stan@xxxxxxxxxxxxxxxxxx <stan@xxxxxxxxxxxxxxxxxx> wrote:
Sounds like one of your hard drives died and was replaced
by a hot spare within the raid.
In which case you'll want to yank the bad disk and replace it
as you likely no longer have a hot spare.
Raid 5 can get along without a spare, but with degraded performance.
I've never gotten the email you got, but FailSpare does sound like a
failover to a spare, as opposed to failover to parity.
mdadm -QD /dev/md2
(query with details) may help interpret the data for you.
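The detail output is fairly long. If you only want the health summary, a filter along these lines may help. It is a sketch that assumes your mdadm version labels the summary lines "State", "Active Devices", "Working Devices", "Failed Devices" and "Spare Devices"; md_summary is a made-up name.

```shell
# md_summary: keep only the health-related lines of "mdadm --detail" output.
# Reads from stdin. The field labels are assumed from typical mdadm output
# and may differ between versions.
md_summary() {
    grep -E '^ *(State|Active Devices|Working Devices|Failed Devices|Spare Devices) :'
}

# Usage (needs root, not run here):
#   mdadm --detail /dev/md2 | md_summary
```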
With AIX boxes I can blink a drive. Is there a way to blink the LEDs
of a drive in Linux? This is a Dell box running Fedora Core 5 with
recent patches.
If you mean hot swapping, it depends on the controller HW. Hot swapping
might be supported, but probably not. If you remove the drive, you must
first manually fail the other partitions on the drive, too.
mdadm /dev/md0 -f /dev/sde1
mdadm /dev/md1 -f /dev/sde3
Then remove the partitions.
mdadm /dev/md0 -r /dev/sde1
mdadm /dev/md1 -r /dev/sde3
mdadm /dev/md2 -r /dev/sde2
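If you'd rather script the fail and remove steps than type them one by one, here is a rough sketch. The function names (run, md_manage, retire_sde) are made up for illustration, the array:partition pairs come from the /proc/mdstat output above, and --fail/--remove are the long forms of -f/-r. It only prints the commands unless you set DRY_RUN=0.

```shell
# run: print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# md_manage ACTION PAIR...
# Apply one mdadm manage action to each "array:partition" pair.
md_manage() {
    action=$1
    shift
    for pair in "$@"; do
        run mdadm "${pair%%:*}" "$action" "${pair#*:}" || return 1
    done
}

# Fail the still-healthy sde partitions, then remove all three.
# sde2 is already marked (F) in md2, so it only needs removing.
retire_sde() {
    md_manage --fail /dev/md0:/dev/sde1 /dev/md1:/dev/sde3 &&
    md_manage --remove /dev/md0:/dev/sde1 /dev/md1:/dev/sde3 /dev/md2:/dev/sde2
}

# Usage (not run here):
#   retire_sde              # show what would be done
#   DRY_RUN=0 retire_sde    # actually do it (as root)
```

Check the dry-run output against your own /proc/mdstat before running it for real.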
Then swap out the hardware and partition the new drive. I would also
recommend reconsidering your raid scheme. To have multiple arrays on
the same drives guarantees that one drive failure/replacement affects
multiple arrays, like now. Depending on how much downtime (rebooting
and rebooting and rebooting) you can endure, you might be able to make
some changes.
Once the drive is replaced, will md automatically detect the replacement
and rebuild as necessary?
After partitioning, you would add the new partitions back into the
array(s) with
mdadm <array device> -a <partition device>
mdadm handles the rest.
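For this particular box, the partition-and-add sequence might look like the sketch below. It makes several assumptions: the replacement shows up as /dev/sde again, /dev/sda has the same partition layout and can be used as a template via sfdisk, and the partition-to-array mapping is the one from the /proc/mdstat output above. The function names are made up. It only prints the commands unless DRY_RUN=0 is set.

```shell
# run: print the command in dry-run mode (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# clone_layout SRC DST
# Copy the partition table from a healthy disk to the replacement disk.
# Assumes both disks are the same size and use a table sfdisk understands.
clone_layout() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $1 | sfdisk $2"
    else
        sfdisk -d "$1" | sfdisk "$2"
    fi
}

# Partition the new drive, then hand each partition back to its array.
readd_sde() {
    clone_layout /dev/sda /dev/sde &&
    run mdadm /dev/md0 --add /dev/sde1 &&
    run mdadm /dev/md1 --add /dev/sde3 &&
    run mdadm /dev/md2 --add /dev/sde2
}

# Usage (not run here):
#   readd_sde              # show what would be done
#   DRY_RUN=0 readd_sde    # actually do it (as root)
```

Afterwards, cat /proc/mdstat should show the new partitions in their arrays, either as spares or rebuilding.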