Re: software raid: replace failing HD

From: bombadil (me_at_privacy.net)
Date: 02/06/04


Date: Fri, 06 Feb 2004 11:43:41 +0100

John-Paul Stewart wrote:

> Are they 80-pin SCA drives in a hot swap chassis? If not, then I
> wouldn't recommend you try hot swapping normal SCSI drives (50- or
> 68-pin SCSI connector with separate power connector).

Here are the specifications of the disk in question:
<http://www.support.gateway.com/s/Servers/COMPO/HARDDRV/Seagate/5502564/5502564sp2.shtml>

I am not sure whether it is hot swappable, so it will probably be better
to shut down the server anyway.

> If the hardware is indeed hot swappable, then make sure the RAID
> software has marked the failed drive as failed ('man mdadm' for info, in
> particular the "--fail" and "--remove" parameters), pull it out and put
> in the new one. Then partition the drive as needed and add it to the
> array ('mdadm --add /dev/md0 /dev/sda1' for example) and let the
> software RAID resync itself.

Here are the kernel messages about the failing drive:

Feb 1 11:28:02 myservername kernel: SCSI disk error : host 0 channel 0
id 1 lun 0 return code = 8000002
Feb 1 11:28:02 myservername kernel: Info fld=0x6f7454, Current sd08:15:
sense key Hardware Error
Feb 1 11:28:02 myservername kernel: Additional sense indicates
Mechanical positioning error
Feb 1 11:28:02 myservername kernel: I/O error: dev 08:15, sector 26768
Feb 1 11:28:02 myservername kernel: raid1: Disk failure on sdb5,
disabling device.
Feb 1 11:28:02 myservername kernel: ^IOperation continuing on 1 devices
Feb 1 11:28:02 myservername kernel: md: updating md3 RAID superblock on
device
Feb 1 11:28:02 myservername kernel: md: (skipping faulty sdb5 )
Feb 1 11:28:02 myservername kernel: md: sda5 [events:
0000007e]<6>(write) sda5's sb offset: 14137088
Feb 1 11:28:02 myservername kernel: md: recovery thread got woken up ...
Feb 1 11:28:02 myservername kernel: md3: no spare disk to reconstruct
array! -- continuing in degraded mode
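As a cross-check on the log above, the "dev 08:15" pair can be decoded by
hand. This is only a sketch: it assumes the kernel prints major:minor in
hexadecimal, that SCSI disks use major 8 with 16 minors per disk, and the
function name decode_sd_dev is made up for illustration.

```shell
# Decode a "MAJOR:MINOR" pair (both hexadecimal) into an sd device name.
# Assumes SCSI disk major 8 with 16 minors per disk, so only sda..sdp.
decode_sd_dev() {
    major=$(printf '%d' "0x${1%%:*}")
    minor=$(printf '%d' "0x${1##*:}")
    if [ "$major" -ne 8 ]; then
        echo "not an sd device (major $major)" >&2
        return 1
    fi
    disk=$((minor / 16))    # 0 = sda, 1 = sdb, ...
    part=$((minor % 16))    # 0 = whole disk, otherwise partition number
    letter=$(printf 'abcdefghijklmnop' | cut -c $((disk + 1)))
    if [ "$part" -eq 0 ]; then
        echo "sd$letter"
    else
        echo "sd$letter$part"
    fi
}

decode_sd_dev 08:15    # prints sdb5
```

Minor 0x15 is 21, which is disk 1 (sdb), partition 5. That agrees with
the "Disk failure on sdb5" line from the raid1 driver.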

Here is the current raid status:

Personalities : [raid1]
read_ahead 1024 sectors
md2 : active raid1 sdb1[1] sda1[0]
       48064 blocks [2/2] [UU]

md0 : active raid1 sdb2[1] sda2[0]
       3068288 blocks [2/2] [UU]

md1 : active raid1 sdb3[1] sda3[0]
       522048 blocks [2/2] [UU]

md3 : active raid1 sdb5[1](F) sda5[0]
       14137088 blocks [2/1] [U_]

unused devices: <none>
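
The same information can be pulled out of the status text with a small
filter. This is a rough sketch, assuming the two-line-per-array layout
shown above; the function name degraded_arrays is hypothetical.

```shell
# List degraded md arrays and their faulty members from mdstat-style text.
# Reads /proc/mdstat by default; pass a file name to check a saved copy.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/ {
            name = $1
            faulty = ""
            # Members start at field 5, e.g. "sdb5[1](F)".
            for (i = 5; i <= NF; i++)
                if ($i ~ /\(F\)/) {
                    member = $i
                    sub(/\[.*/, "", member)
                    faulty = faulty " " member
                }
            next
        }
        # The status line ends in [UU], [U_], ...; "_" marks a missing member.
        name != "" && /\[[U_]+\]/ {
            if ($NF ~ /_/)
                print name ":" (faulty == "" ? " (no member flagged faulty)" : faulty)
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}
```

Run against the status above, it should report only "md3: sdb5".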

Only one partition (sdb5) has been marked as failed so far; it still
shows up in its raid array (md3) with the (F) flag.

If I got it right, I should mark the other partitions (sdb1, sdb2 and
sdb3) on the failing disk (sdb) as failed and remove them from the raid
arrays (md2, md0 and md1).
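
Spelled out with the "--fail" and "--remove" parameters mentioned in the
quoted advice, that step might look like the sketch below. The
array/partition pairs come from the status output above; the function
name retire_sdb and the RUN switch are only illustrative, and by default
nothing is executed, the commands are just printed.

```shell
# Print (or, with RUN=1, execute) the mdadm commands that mark each sdb
# partition as failed and then remove it from its array.
retire_sdb() {
    for pair in md2:sdb1 md0:sdb2 md1:sdb3 md3:sdb5; do
        md=/dev/${pair%%:*}
        part=/dev/${pair##*:}
        for action in --fail --remove; do
            if [ "${RUN:-0}" = 1 ]; then
                mdadm "$md" "$action" "$part"
            else
                echo "mdadm $md $action $part"
            fi
        done
    done
}

retire_sdb
```

sdb5 is already flagged faulty, so for md3 only the --remove step should
actually change anything.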

Then I shut down the server, replace the disk, start up the server,
partition the new disk exactly like the other one (what is the best way
to do that?), add the partitions to the raid arrays and let the arrays
resync themselves.

Is that right?
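
One common way to do the partitioning step is to dump the surviving
disk's partition table with sfdisk and feed it to the new disk. The
sketch below assumes the replacement also shows up as /dev/sdb and that
sfdisk is installed; the names run_step and rebuild_sdb are hypothetical,
and by default the commands are only printed.

```shell
# Print a command line, or execute it when RUN=1 is set.
run_step() {
    if [ "${RUN:-0}" = 1 ]; then
        sh -c "$1"
    else
        echo "$1"
    fi
}

# Clone sda's partition table onto the new sdb, then add each new
# partition back to the array it belongs to.
rebuild_sdb() {
    run_step "sfdisk -d /dev/sda | sfdisk /dev/sdb"
    for pair in md2:sdb1 md0:sdb2 md1:sdb3 md3:sdb5; do
        run_step "mdadm /dev/${pair%%:*} --add /dev/${pair##*:}"
    done
}

rebuild_sdb
```

After the --add steps, /proc/mdstat should show the resync progress for
each array.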

What is the advantage of using mdadm over raidtools?

BTW, the new disk will probably be bigger than the current one (32GB
instead of 18GB), though with the same RPM speed and the same SCSI
technology. Is there any problem with that?

Thanks in advance.


