[SLE] OOps, never mind: Re: [SLE] Help with disk integrity and RAID-1 please
From: Simon Roberts (thorpflyer_at_yahoo.com)
Date: 10/10/05
- Previous message: Steve Graegert: "Re: [SLE] 64 bit Java"
- In reply to: Simon Roberts: "Re: [SLE] Help with disk integrity and RAID-1 please"
- Next in thread: Carlos E. R.: "Re: [SLE] OOps, never mind: Re: [SLE] Help with disk integrity and RAID-1 please"
- Reply: Carlos E. R.: "Re: [SLE] OOps, never mind: Re: [SLE] Help with disk integrity and RAID-1 please"
- Maybe reply: Simon Roberts: "Re: [SLE] OOps, never mind: Re: [SLE] Help with disk integrity and RAID-1 please"
- Maybe reply: Stephen Carter: "Re: [SLE] OOps, never mind: Re: [SLE] Help with disk integrity and RAID-1 please"
Date: Sun, 9 Oct 2005 22:55:25 -0700 (PDT)
To: Michael W Cocke <cocke@catherders.com>, suse-linux-e@suse.com
Silly me: once I rubbed the sleep out of my eyes and ran a long test, it
turned out the disk is indeed dying. It reported healthy before I told it
to do any explicit tests, and again after a short test, but partway
through a long test it started complaining of seek errors, and it now
says it has only a day to live.
Pretty cool utility, the SMART stuff, though! Ideal for managing an array
and preemptively replacing drives before it's too late.
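For anyone retracing this, here is a rough sketch of the smartctl calls involved. The device name /dev/hde is the suspect drive from this thread, and the helper names (run, smart_health, smart_test, smart_log) are made up for illustration. As written it only prints the commands; set DRY_RUN=0 to actually run them as root.

```shell
# Sketch only. /dev/hde is the suspect drive from this thread.
# DRY_RUN=1 (the default here) prints each command instead of running it.
DRY_RUN=${DRY_RUN:-1}

run() {
    if [ "$DRY_RUN" = 1 ]; then echo "$*"; else "$@"; fi
}

smart_health() { run smartctl -H "$1"; }            # overall verdict only
smart_test()   { run smartctl -t "$1" "$2"; }       # $1 is short or long
smart_log()    { run smartctl -l selftest "$1"; }   # results of finished tests
```

Typical order: smart_health /dev/hde, then smart_test short /dev/hde, then smart_test long /dev/hde, reading smart_log /dev/hde after each test has had time to finish. The overall verdict alone can look fine on a failing disk, which is what happened here.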
Thanks,
Simon
--- Simon Roberts <thorpflyer@yahoo.com> wrote:
> Following another post pointing out the existence of the smartctl test
> interface, it looks as if this drive of mine might actually be ok. Is
> there any possibility that I screwed up the configuration and, in
> effect, switched off the other drive from the RAID array, rather than
> it being taken down for errors? If I did, how might I get it back? Can
> I just zero its contents and add it to the array again? And any pointer
> as to the command(s) to re-add it? (I know how to use dd to zero it.)
>
> TIA,
> Simon
>
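On the re-add question, a minimal sketch, assuming the kicked partition is /dev/hde1, the array is /dev/md0, and the disk itself is healthy. The function name readd_member and the APPLY switch are made up for illustration. Zeroing the whole disk with dd should not be needed: clearing the old md superblock is enough, because adding a member to a RAID 1 array triggers a full resync from the surviving mirror.

```shell
# Sketch only: prints the commands unless APPLY=yes is set (run as root).
readd_member() {
    array=$1
    part=$2
    for cmd in \
        "mdadm --zero-superblock $part" \
        "mdadm $array --add $part" \
        "cat /proc/mdstat"
    do
        if [ "${APPLY:-no}" = yes ]; then
            sh -c "$cmd"      # really run it
        else
            echo "$cmd"       # just show what would be run
        fi
    done
}
```

Calling readd_member /dev/md0 /dev/hde1 lists the three steps: wipe the stale superblock, add the partition back, then look at /proc/mdstat to follow the rebuild.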
>
> --- Michael W Cocke <cocke@catherders.com> wrote:
>
> > On Sat, 8 Oct 2005 09:26:28 -0700 (PDT), you wrote:
> >
> > >Please forgive me if this shows up twice. I tried to send it once, but
> > >it has taken an improbable time and still not shown up, so it's time
> > >to try again.
> > >
> > >Following a premature (3 months) disk failure, I created a RAID 1
> > >array. I understand the basic idea of RAID, but have never used the
> > >tools to do it before (not on Linux, not on anything).
> > >
> > >As I built it, I knew there were many things I didn't know about, but
> > >hoped I could learn slowly in "spare" time. For example: does RAID move
> > >bad blocks on its elements, or does it just dump the doubtful device?
> > >If RAID finds a disk problem, does it tell me about it, and if so how?
> > >If RAID rejects a device, particularly if it's for "transient" reasons
> > >like a single bad sector, can I re-prepare the disk manually and get it
> > >back into service? If I have to replace a failed disk, how do I do
> > >that?
> > >
> > >Anyway, these questions are still unanswered (after about 3 months...)
> > >and guess what: I'm pretty sure I have a drive failure. It makes odd
> > >noises, like the other one did :( I poked around, managed to work
> > >out the existence of the mdadm command, and found this:
> > >
> > ># mdadm --detail /dev/md0
> > >/dev/md0:
> > > Version : 00.90.01
> > > Creation Time : Thu Sep 1 05:49:50 2005
> > > Raid Level : raid1
> > > Array Size : 156280192 (149.04 GiB 160.03 GB)
> > > Device Size : 156280192 (149.04 GiB 160.03 GB)
> > > Raid Devices : 2
> > > Total Devices : 1
> > >Preferred Minor : 0
> > > Persistence : Superblock is persistent
> > >
> > > Update Time : Sat Oct 8 09:38:25 2005
> > > State : clean, degraded
> > > Active Devices : 1
> > >Working Devices : 1
> > > Failed Devices : 0
> > > Spare Devices : 0
> > >
> > > UUID : b829bc95:3f42a40e:5a8be8f6:4fadb25c
> > > Events : 0.1345011
> > >
> > >    Number   Major   Minor   RaidDevice State
> > >       0       0        0        -      removed
> > >       1      34        1        1      active sync   /dev/hdg1
> > >
> > >I don't really know what I'm looking at, but the output looks bad,
> > >right?
> > >
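To read that output: the array expects two members (Raid Devices) but only one is active (Active Devices), State includes "degraded", and slot 0 is listed as removed. A small filter along these lines can boil it down to one line. This is a sketch only: md_summary is a made-up name, and it assumes the field labels shown in the output above.

```shell
# Sketch: summarise `mdadm --detail` output read from stdin.
md_summary() {
    awk -F' : ' '
        /Raid Devices/   { want = $2 }     # members the array should have
        /Active Devices/ { have = $2 }     # members currently in sync
        /State :/        { state = $2 }    # e.g. "clean, degraded"
        END {
            if (have + 0 < want + 0)
                printf "DEGRADED: %d of %d members active (%s)\n", have, want, state
            else
                printf "OK: %d of %d members active (%s)\n", have, want, state
        }'
}
```

Usage would be something like: mdadm --detail /dev/md0 | md_summary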
> > >I also found this in dmesg's output:
> > >
> > >
> > >md: Autodetecting RAID arrays.
> > >md: autorun ...
> > >md: considering hdg1 ...
> > >md: adding hdg1 ...
> > >md: adding hde1 ...
> > >md: created md0
> > >md: bind<hde1>
> > >md: bind<hdg1>
> > >md: running: <hdg1><hde1>
> > >md: kicking non-fresh hde1 from array!
> > >md: unbind<hde1>
> > >md: export_rdev(hde1)
> > >raid1: raid set md0 active with 1 out of 2 mirrors
> > >md: ... autorun DONE.
> > >
> > >Which also looks bad, don't you think?
> > >
> > >So, can anyone please tell me in the short term:
> > >
> > >1) Is hde indeed out of the array as it appears?
> >
> > Yes.
> >
> > >2) How can I determine what the failure is? (Is it "a few" bad sectors,
> > >too many to want to reuse the drive, or a more complete failure?)
> >
> > There is no such thing as a 'partial drive failure' on an IDE drive.
> > Bad sector marking/remapping is handled by the on-board electronics -
> > if the alternate sector map is full, the drive is a short time away
> > from complete failure. Since you describe odd noises, you don't even
> > need to worry about that - it's junk.
> >
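The remapping described there is something the drive reports through its SMART attribute table, so it can be looked at directly. A sketch, assuming the drive exposes the commonly used attribute names Reallocated_Sector_Ct and Current_Pending_Sector (names vary by vendor) and the usual ten-column layout of smartctl -A, with the raw value in the last column. The function name remap_counts is made up.

```shell
# Sketch: pull the remapping-related raw counts out of `smartctl -A` output
# read from stdin. Attribute names are an assumption and differ between drives.
remap_counts() {
    awk '$2 == "Reallocated_Sector_Ct" || $2 == "Current_Pending_Sector" {
             print $2, $10      # attribute name and its raw value
         }'
}
```

Usage would be something like: smartctl -A /dev/hde | remap_counts
Counts that are non-zero and climbing are generally taken as a sign the drive is using up its spare sectors.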
> > >3) Can I reformat, move bad sectors, clean up the drive (if it's a
> > >minor failure) and get it back into service, and if so how?
> >
> > See #2 above.
> >
> > >4) If I elect/have to replace the drive, what do I do to make it take
> > >up its ordained place in the md array?
> >
> > Power down the system, replace the drive, power up the system. The
> > only real recovery headache with a RAID is if the boot drive is the
> > one that failed... In that case, you need to have made certain that
> > ALL the disks are bootable (lilo can do that, I don't know about
> > grub), or else have an alternate boot method.
> >
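One step that reply leaves implicit: as far as I know, a blank replacement disk does not join the array by itself after power-up. It needs a matching partition and an explicit add. A sketch that only prints the commands, using the device names from this thread (hdg as the surviving disk, hde as the replacement). The function name replacement_plan is made up, and copying the partition table with sfdisk is my assumption about how the new disk would be prepared.

```shell
# Sketch: print (not run) the steps to bring a fresh disk into the mirror.
# Arguments: surviving disk, new disk, array device, partition number.
replacement_plan() {
    good=$1
    new=$2
    array=$3
    part=$4
    cat <<EOF
sfdisk -d $good | sfdisk $new
mdadm $array --add $new$part
cat /proc/mdstat
EOF
}
```

Calling replacement_plan /dev/hdg /dev/hde /dev/md0 1 lists three steps: copy the partition layout from the good disk, add the new partition to the array, and watch the resync in /proc/mdstat. Double-check which disk is which before running the sfdisk line for real, since it overwrites the target's partition table.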
> > >Then in the longer term, where should I be looking for the docs so I
> > >can know this for myself in future?
> >
> > All of the docs on the Linux software RAID system that I've seen are
> > lousy... The code is still evolving, and it seems to be written
> > by people who aren't into docs. O'Reilly has 'Managing RAID on Linux'
> > which isn't too bad but IS inaccurate in places. The way I did it was
> > to put together a junk system and try things, meanwhile reading
> > everything google found on 'linux raid'. A real pain, but it's your
> > data...
> >
> > Mike-
> >
> > --
> > Mornings: Evolution in action. Only the grumpy will survive.
> > --
> >
> >
> > --
> > Check the headers for your unsubscription address
> > For additional commands send e-mail to suse-linux-e-help@suse.com
> > Also check the archives at http://lists.suse.com
> > Please read the FAQs: suse-linux-e-faq@suse.com
> >
> >
> >
>
>
> "You can tell whether a man is clever by his answers. You can tell
> whether a man is wise by his questions." — Naguib Mahfouz
>