RE: bad blocks... random death: solution?

From: Thierry ITTY (thierry.itty_at_besancon.org)
Date: 08/25/04

    Date: Wed, 25 Aug 2004 19:01:08
    To: General Red Hat Linux discussion list <redhat-list@redhat.com>
    
    

    Well, after carefully reading Kenneth's answer, I finally decided to
    fight EMI, ESD, RFI and all that kind of trouble.

    My disks were mounted on rubber cylinders and thus had NO connection to
    chassis ground. I installed a "ground wire" on every disk (a wire connected
    at one end to the disk casing and at the other end to the chassis).
    I got fewer errors.

    The 2 machines were connected to each other through a small gigabit switch,
    and I noticed that this switch had NO grounding either (just a small AC/DC
    adapter as its power input), so I installed a ground wire from the switch
    to one of the chassis (I could also have connected this ground wire to the
    ground of an AC outlet...).
    Remember that the 2 machines have a continuous 200 Mbps flow for hours to
    copy all the data from one to the other!
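    To put that sustained load in numbers, here is a small Python sketch that
    converts the 200 Mbps figure into the volume of data crossing the switch.
    It assumes "Mbps" means megabits per second and uses decimal units
    (1 MB = 10^6 bytes); the function names are purely illustrative.

```python
# Sketch: data volume moved by a sustained link.
# Assumptions: "Mbps" = megabits per second; decimal units (1 MB = 10**6 bytes).
BITS_PER_BYTE = 8
SECONDS_PER_HOUR = 3600


def mbps_to_mb_per_s(mbps: float) -> float:
    """Convert a rate in megabits/s to megabytes/s."""
    return mbps / BITS_PER_BYTE


def volume_gb(mb_per_s: float, hours: float) -> float:
    """Data moved in `hours` at a constant rate, in decimal gigabytes."""
    return mb_per_s * hours * SECONDS_PER_HOUR / 1000


rate = mbps_to_mb_per_s(200)  # 25.0 MB/s
print(f"{rate} MB/s, {volume_gb(rate, 1):.0f} GB per hour")
```

    At that rate roughly 90 GB crosses the switch every hour, so even a rare
    corrupted transfer has plenty of opportunity to show up during a
    multi-hour copy.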

    Now I have NO MORE ERRORS.

    I'd say it's very likely that all the trouble I had (disk errors, bad
    blocks, file system corruption, etc.) came from the ungrounded switch
    (and possibly the disks). So now I "ground" everything I can!
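    One way to confirm that a long copy really arrived intact, rather than
    relying only on the absence of kernel errors, is to compare a checksum of
    every file on both sides. The Python sketch below illustrates that idea; it
    is not something from this thread, and the helper names (`sha256_of`,
    `tree_digests`, `diff_digests`) are hypothetical.

```python
import hashlib
import os
import sys


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large files need not fit in RAM."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def tree_digests(root: str) -> dict:
    """Map each regular file's path (relative to root) to its SHA-256 digest."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip FIFOs, sockets and broken symlinks; only hash regular files.
            if os.path.isfile(full):
                digests[os.path.relpath(full, root)] = sha256_of(full)
    return digests


def diff_digests(src: dict, dst: dict) -> list:
    """Relative paths missing from dst or whose contents differ from src."""
    return sorted(p for p, digest in src.items() if dst.get(p) != digest)


if __name__ == "__main__" and len(sys.argv) == 3:
    bad = diff_digests(tree_digests(sys.argv[1]), tree_digests(sys.argv[2]))
    print("\n".join(bad) if bad else "all files match")
```

    Hypothetical usage, with both trees visible from one machine (for instance
    the source mounted over the network): `python3 verify_copy.py /data
    /mnt/copy`. A file read back right after being written may come from the
    page cache rather than from the disk, so re-checking after a reboot or
    remount exercises the disk path more thoroughly.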

    Many thanks to Kenneth for this clue!
    HTH

    At 12:44 13/08/2004 -0400, you wrote:
    >Two cents' worth, without having followed the previous
    >discussions in this thread.
    >See my comments in-line below.
    >
    >> -----Original Message-----
    >> From: redhat-list-bounces@redhat.com
    >> [mailto:redhat-list-bounces@redhat.com] On Behalf Of Thierry ITTY
    >> Sent: Friday, August 13, 2004 11:33 AM
    >> To: redhat-list@redhat.com
    >> Subject: bad blocks... random death
    >>
    >>
    >> This continues the discussion about disk blocks reported as bad
    >> that are not really bad, and Red Hat 9 dying randomly.
    >>
    >> A few of us on this list are now experiencing various symptoms
    >> (DMA errors, bad blocks on disks, system freezes or crashes) that
    >> look like hardware problems.
    >> After comparing notes, we can now say that these problems are
    >> pure OS problems.
    >
    >If all are SMP systems, then perhaps there is a spinlock conflict
    >(multi-CPU contention) problem with the disk driver.
    >But I doubt that the disk drivers in the kernel have changed in years.
    >I am running RH9 on several heavily used SCSI-based Compaq
    >multi-CPU machines with no problems.
    >So based on my experience, I don't believe this is a software issue.
    >
    >>
    >> The disks with bad blocks actually work fine elsewhere (in my case
    >> I ran the manufacturer's low-level diagnostics and no disk had any
    >> problem). And isn't it very strange that 10 disks develop the same
    >> problems at the same time?!
    >
    >Not if you have an EMI (electromagnetic interference) shielding
    >issue. The drives are fine. They might be cross-polluting each other,
    >the cables and/or the controllers with EMI. That will corrupt the bit
    >stream between the drives and the controller and give you errors.
    >
    >The more heavily you use the drives, the more the magnetic coils that
    >move the heads are used. Those coils put out an EMI field.
    >The more you use the drives, the more consistent that EMI field is,
    >and without good grounding it "leaks" into whatever copper ground
    >path is available, including your drive cables, power cables, etc.
    >Normally EMI is drained off through the drive's grounds to the
    >chassis: the drive is grounded to the chassis, and through the chassis
    >to the ground line on the power supply and on to earth.
    >
    >Check the following if you haven't already, as it applies to your
    >system:
    >
    >1) Get an electrical outlet tester at your local Home Depot, Lowe's,
    >et al.
    >
    >2) Check the outlets your systems are plugged into. (If you use
    >outlets other than NEMA 5-15R/5-20R (household type), get a tester
    >or an electrical testing service in to check your grounds.)
    >
    >3) Make sure you have a good, reliable earth ground at the outlet.
    >If you don't, get it fixed. You would be surprised at how many
    >outlets don't have valid earth grounds.
    >If you are in a commercial building, your data center outlets should
    >have been installed with ISOLATED grounds, that is, a separate ground
    >wire between the power panel and the receptacle.
    >Most commercial electrical wiring uses the metal jacket as the ground
    >path, and that tends to come apart over time (i.e. NO MORE GROUND).
    >
    >4) Check the power supply: make sure you are not overloading it past
    >its rated maximum output. Make sure that it is grounded to the chassis
    >and to earth ground. Normally it grounds the chassis through its case,
    >but some have separate ground connections; look for ground screw
    >connections.
    >
    >5) If your drives have ground screws or tabs on them, connect them to
    >a reliable chassis ground point. Don't assume they have a good ground
    >through the drive mounting screws.
    >
    >6) Use round shielded cables and watch the grounds on them. If the
    >shields are grounded at a single end, make sure that the connected
    >end goes to a valid ground source.
    >
    >7) Grounds are normally connected at a single end to prevent ground
    >loops; that is, you don't want more than one ground path here if you
    >can help it. Multiple ground paths won't help and can hurt under the
    >wrong circumstances. Drives with ground tabs don't generally ground
    >through the mounting screws, but check the drive specs. A cable with
    >the shield connected at both ends is also expected to ground the
    >drive; the cable should connect to a ground pin on the drive's
    >interface.
    >
    >8) If you have these drives "dense packed" in your chassis, you might
    >want to consider putting grounded shields between them if all else
    >fails, grounded copper plates for example.
    >
    >9) Make sure that you route the power cables away from the
    >drive controller cables within the chassis.
    >
    >10) Look for other ways that EMI could be crossing between components.
    >
    >11) You might just have one really EMI-noisy drive. There are EMI
    >meters that can be used to measure EMI levels.
    >
    >12) You can also be subject to interference at a different wavelength,
    >known as RFI, or radio frequency interference.
    >
    >
    >
    >
    >>
    >> The problem happens on various machines (Gigabyte, ASUS, Athlon,
    >> Pentium, Maxtor, Western Digital...).
    >>
    >> It seems to be related to high-load periods (in my case a heavily
    >> used file server).
    >>
    >> We've been advised to change the disks' DMA settings. I tried
    >> various things (no DMA at all, forcing mdma0 or udma2). The system
    >> behaves differently (either no errors, or other errors such as DMA
    >> timeouts), but it's not working quite well (for example,
    >> deactivating DMA on the disks lowers the average network throughput
    >> from 50 MB/s to 1.5 MB/s, more than 30 times slower!!!)
    >>
    >> We really need help to investigate this problem, which causes I/O
    >> errors and fs corruption!
    >>
    >> tia
    >>
    >>
    >> --
    >> redhat-list mailing list
    >> unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
    >> https://www.redhat.com/mailman/listinfo/redhat-list
    >>
                            - * - * - * - * - * - * -
    Of course I'm a perfectionist!
    But couldn't I be better at it?
            Thierry ITTY
    eMail : Thierry.Itty@Besancon.org FRANCE

    

