[SLE] Oops, never mind: Re: [SLE] Help with disk integrity and RAID-1 please

From: Simon Roberts (thorpflyer_at_yahoo.com)
Date: 10/10/05

  • Next message: Basil Chupin: "[SLE] TVTIME 1.0.1"
    Date: Sun, 9 Oct 2005 22:55:25 -0700 (PDT)
    To: Michael W Cocke <cocke@catherders.com>, suse-linux-e@suse.com
    
    

    Silly me, when I rub the sleep out of my eyes, and do a long test, no,
    the disk is indeed dying. It reported happy before I told it to do any
    explicit tests, then again after a short test, but part way through a
    long test, it's complaining of seek errors, and says it has only a day
    to live.

    Pretty cool utility the SMART stuff though! Ideal for managing an array
    and preemptively replacing stuff before it's too late.
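
For anyone following along, the sequence described above (an overall health check, then a long self-test, then reading the self-test log) could be sketched roughly as follows. This is only a minimal sketch, assuming smartmontools is installed and that the suspect drive is /dev/hde as in this thread; the helper names run and smart_long_test are made up for illustration.

```shell
#!/bin/sh
# Sketch: SMART health check plus a long self-test on one drive.
# /dev/hde is the drive named in this thread; adjust to suit.
# DRY_RUN=1 (the default here) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

smart_long_test() {
    dev="${1:-/dev/hde}"
    run smartctl -H "$dev"            # overall health verdict
    run smartctl -t long "$dev"       # start the extended self-test
    run smartctl -l selftest "$dev"   # self-test log (read it after the test ends)
}
```

Calling smart_long_test /dev/hde with DRY_RUN=0 (as root) would actually run the commands. The long test runs on the drive itself and smartctl returns straight away, so the self-test log only shows the result once the test has finished.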

    Thanks,
    Simon

    --- Simon Roberts <thorpflyer@yahoo.com> wrote:

    > Following another post pointing out the existence of the smartctl test
    > interface, it looks as if this drive of mine might actually be ok. Is
    > there any possibility that I screwed up the configuration and, in
    > effect, switched off the other drive from the RAID array, rather than
    > it being taken down for errors? If I did, how might I get it back? Can
    > I just zero its contents and add it to the array again? And any
    > pointer as to the command(s) to re-add it? (I know how to use dd to
    > zero it.)
    >
    > TIA,
    > Simon
    >
    >
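
On the question above about re-adding a member that was dropped but is otherwise healthy, one possible sequence is sketched below. It is a sketch under assumptions, not a tested recipe: it takes /dev/md0 and /dev/hde1 from this thread, the helper names run and readd_member are hypothetical, and it wipes only the md superblock rather than zeroing the whole disk with dd, which should be enough for md to treat the partition as a fresh member.

```shell
#!/bin/sh
# Sketch: put a dropped (but healthy) member back into a RAID-1 array.
# /dev/md0 and /dev/hde1 are the names from this thread.
# DRY_RUN=1 (the default here) only prints the commands.

run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

readd_member() {
    md="${1:-/dev/md0}"
    part="${2:-/dev/hde1}"
    run mdadm --zero-superblock "$part"      # forget the stale md metadata
    run mdadm --manage "$md" --add "$part"   # add it back; md resyncs the mirror
    run cat /proc/mdstat                     # shows rebuild progress
}
```

With DRY_RUN=0 and root privileges, readd_member /dev/md0 /dev/hde1 would run the real commands. None of this makes sense on a drive that SMART reports as failing.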
    > --- Michael W Cocke <cocke@catherders.com> wrote:
    >
    > > On Sat, 8 Oct 2005 09:26:28 -0700 (PDT), you wrote:
    > >
    > > >Please forgive me if this shows up twice, I tried to send once but it
    > > >has taken an improbable time and still not shown up, so it's time to
    > > >try again.
    > > >
    > > >Following a premature (3 months) disk failure, I created a RAID 1
    > > >array. I understand the basic idea of RAID, but have never used the
    > > >tools to do it before (not on Linux, not on anything).
    > > >
    > > >As I built it, I knew there were many things I didn't know about, but
    > > >hoped I could learn slowly in "spare" time. For example: does RAID move
    > > >bad blocks on its elements, or does it just dump the doubtful device?
    > > >If RAID finds a disk problem, does it tell me about it, and if so how?
    > > >If RAID rejects a device, particularly if it's for "transient" reasons
    > > >like a single bad sector, can I re-prepare the disk manually and get it
    > > >back into service? If I have to replace a failed disk, how do I do
    > > >that?
    > > >
    > > >Anyway, these questions are still unanswered (after about 3 months...)
    > > >and guess what: I'm pretty sure I have a drive failure. It makes odd
    > > >noises, like the other one did :( I poked around, and managed to work
    > > >out the existence of the mdadm command, and found this:
    > > >
    > > ># mdadm --detail /dev/md0
    > > >/dev/md0:
    > > >        Version : 00.90.01
    > > >  Creation Time : Thu Sep 1 05:49:50 2005
    > > >     Raid Level : raid1
    > > >     Array Size : 156280192 (149.04 GiB 160.03 GB)
    > > >    Device Size : 156280192 (149.04 GiB 160.03 GB)
    > > >   Raid Devices : 2
    > > >  Total Devices : 1
    > > >Preferred Minor : 0
    > > >    Persistence : Superblock is persistent
    > > >
    > > >    Update Time : Sat Oct 8 09:38:25 2005
    > > >          State : clean, degraded
    > > > Active Devices : 1
    > > >Working Devices : 1
    > > > Failed Devices : 0
    > > >  Spare Devices : 0
    > > >
    > > >           UUID : b829bc95:3f42a40e:5a8be8f6:4fadb25c
    > > >         Events : 0.1345011
    > > >
    > > >    Number   Major   Minor   RaidDevice State
    > > >       0       0        0        -      removed
    > > >       1      34        1        1      active sync   /dev/hdg1
    > > >
    > > >I don't really know what I'm looking at, but the output looks bad,
    > > >right?
    > > >
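
The line that matters most in the mdadm --detail output above is "State : clean, degraded", which together with one active device out of two raid devices means the mirror is running on a single disk. A minimal sketch of checking for that from a script follows; the function names md_state and md_is_degraded are made up for illustration, and the parsing assumes the "State :" layout shown in this thread.

```shell
#!/bin/sh
# Sketch: pull the array-level State line out of `mdadm --detail` output
# (read from stdin) and test whether it mentions "degraded".

md_state() {
    # Print the text after "State : " from the first matching line.
    # The per-device table header ("... RaidDevice State") has no colon,
    # so it does not match.
    sed -n 's/^[[:space:]]*State : *//p' | head -n 1
}

md_is_degraded() {
    # Exit status 0 if the array state contains "degraded".
    md_state | grep -q 'degraded'
}
```

A possible use would be: mdadm --detail /dev/md0 | md_is_degraded && echo "md0 is degraded"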
    > > >I also found this in dmesg's output:
    > > >
    > > >
    > > >md: Autodetecting RAID arrays.
    > > >md: autorun ...
    > > >md: considering hdg1 ...
    > > >md: adding hdg1 ...
    > > >md: adding hde1 ...
    > > >md: created md0
    > > >md: bind<hde1>
    > > >md: bind<hdg1>
    > > >md: running: <hdg1><hde1>
    > > >md: kicking non-fresh hde1 from array!
    > > >md: unbind<hde1>
    > > >md: export_rdev(hde1)
    > > >raid1: raid set md0 active with 1 out of 2 mirrors
    > > >md: ... autorun DONE.
    > > >
    > > >Which also looks bad, don't you think?
    > > >
    > > >So, can anyone please tell me in the short term:
    > > >
    > > >1) Is hde indeed out of the array as it appears?
    > >
    > > Yes.
    > >
    > > >2) How can I determine what the failure is? (is it "a few" bad sectors,
    > > >too many to want to reuse the drive, or a more complete failure)
    > >
    > > There is no such thing as a 'partial drive failure' on an IDE drive.
    > > Bad sector marking/remapping is handled via the on-board electronics -
    > > if the alternate sector map is full, the drive is a short time away
    > > from complete failure. Since you describe odd noises, you don't even
    > > need to worry about that - it's junk.
    > >
    > > >3) Can I reformat, move bad sectors, clean up the drive (if it's a
    > > >minor failure) and get it back into service, and if so how?
    > >
    > > See #2 above.
    > >
    > > >4) If I elect/have to replace the drive, what do I do to make it take
    > > >up its ordained place in the md array?
    > >
    > > Power down the system, replace the drive, power up the system. The
    > > only real recovery headache with a RAID is if the boot drive is the
    > > one that failed... In that case, you need to have made certain that
    > > ALL the disks are bootable (lilo can do that, I don't know about
    > > grub), or else have an alternate boot method.
    > >
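
With Linux md software RAID, a blank replacement disk generally is not picked up on its own after the power cycle described above: it needs a matching partition and has to be added to the array by hand. A rough sketch of those extra steps follows, assuming the surviving disk is /dev/hdg, the new disk shows up as /dev/hde and is at least as large, and the array is /dev/md0 (all names taken from this thread); replace_member is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: prepare a replacement disk and add it to a degraded RAID-1.
# Arguments: array, surviving disk, new disk, new member partition.
# DRY_RUN=1 (the default here) only prints the commands.

replace_member() {
    md="$1"; good_disk="$2"; new_disk="$3"; new_part="$4"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "sfdisk -d $good_disk | sfdisk $new_disk"
        echo "mdadm --manage $md --add $new_part"
    else
        # Copy the partition layout from the good disk to the new one.
        sfdisk -d "$good_disk" | sfdisk "$new_disk"
        # Add the new partition; md then rebuilds the mirror onto it.
        mdadm --manage "$md" --add "$new_part"
    fi
}
```

For the setup in this thread the call would be replace_member /dev/md0 /dev/hdg /dev/hde /dev/hde1, and cat /proc/mdstat shows the rebuild progress afterwards. Getting the source and target disks the wrong way round in the sfdisk step would overwrite the good disk's partition table, so a dry run first is worthwhile.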
    > > >Then in the longer term, where should I be looking for the docs so I
    > > >can know this for myself in future?
    > >
    > > All of the docs on the linux software raid system that I've seen are
    > > lousy... The code is still evolving, and it seems to be being written
    > > by people who aren't into docs. O'Reilly has 'Managing RAID on Linux'
    > > which isn't too bad but IS inaccurate in places. The way I did it was
    > > to put together a junk system and try things, meanwhile reading
    > > everything google found on 'linux raid'. A real pain, but it's your
    > > data...
    > >
    > > Mike-
    > >
    > > --
    > > Mornings: Evolution in action. Only the grumpy will survive.
    > > --
    > >
    > >
    > >
    > > --
    > > Check the headers for your unsubscription address
    > > For additional commands send e-mail to suse-linux-e-help@suse.com
    > > Also check the archives at http://lists.suse.com
    > > Please read the FAQs: suse-linux-e-faq@suse.com
    > >
    > >
    > >
    >
    >
    > "You can tell whether a man is clever by his answers. You can tell
    > whether a man is wise by his questions." — Naguib Mahfouz
    >
    >
    >
    >

    "You can tell whether a man is clever by his answers. You can tell whether a man is wise by his questions." — Naguib Mahfouz

                    

    
