RE: bad blocks... random death

From: Kenneth Goodwin (kgoodwin_at_datamarktech.com)
Date: 08/13/04

    To: "'General Red Hat Linux discussion list'" <redhat-list@redhat.com>
    Date: Fri, 13 Aug 2004 12:44:38 -0400
    
    

    My two cents' worth, without having followed the earlier
    discussion in this thread.
    See my comments inline below.

    > -----Original Message-----
    > From: redhat-list-bounces@redhat.com
    > [mailto:redhat-list-bounces@redhat.com]On Behalf Of Thierry ITTY
    > Sent: Friday, August 13, 2004 11:33 AM
    > To: redhat-list@redhat.com
    > Subject: bad blocks... random death
    >
    >
    > this continues discussions about bad disk blocks not really bad
    > and redhat 9 dying randomly
    >
    > we're now a few on this list experiencing various symptoms (dma
    > errors, bad blocks on disks, system freeze or death) that look
    > like hardware problems. after talking together we can now say
    > that those problems are pure OS problems.

    If all of these are SMP systems, then perhaps there is a spinlock
    conflict (multi-CPU contention) problem in the disk driver.
    But I doubt that the disk drivers in the kernel have changed
    in years.
    I am running RH9 on several heavily used SCSI-based Compaq
    multi-CPU machines with no problems.
    So, based on my experience, I don't believe this is a software
    issue.
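
To confirm whether the affected machines really are SMP, one quick check is to count the processor entries the kernel reports. This is a minimal sketch, assuming an x86 Linux box where /proc/cpuinfo has one "processor : N" line per logical CPU. The function name and the way it is called are illustrative only, and hyperthreaded CPUs would also show up as extra entries.

```python
import re

def count_logical_cpus(cpuinfo_text):
    """Count 'processor : N' entries in /proc/cpuinfo-style text.

    Assumes the x86 layout, where each logical CPU gets its own
    'processor' line; other architectures may format this differently.
    """
    pattern = r"^processor[ \t]*:[ \t]*\d+[ \t]*$"
    return len(re.findall(pattern, cpuinfo_text, flags=re.MULTILINE))
```

On each box, something like count_logical_cpus(open("/proc/cpuinfo").read()) returning more than 1 would suggest the kernel sees multiple CPUs. If the failing machines turn out to be a mix of single-CPU and multi-CPU systems, the spinlock theory looks less likely.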

    >
    > the disks with bad blocks work actually fine elsewhere (in my
    > case I ran the manufacturer low-level diags and no disk had any
    > problem. and, ain't it very strange that 10 disks get the same
    > problems at the same time ?!!!)

    Not if you have an EMI (electromagnetic interference)
    shielding issue. The drives are fine.
    They might be cross-polluting each other, the cables, and/or
    the controllers with EMI.
    That will corrupt the bit stream between the drives and the
    controller and give you errors.

    The heavier you use the drives, the more the
    magnetic coils that move the heads are used. Those coils
    put out an EMI field.
    The more you use the drives, the more consistent that EMI
    field is, and without good grounding
    it "leaks" into whatever copper ground path is available,
    including your drive cables,
    power cables, etc.
    Normally EMI is drained off through the drive's grounds to
    the chassis. The drive is
    grounded to the chassis, and through the chassis to the
    ground line on the power supply, and from there to earth.

    Check the following, if you haven't already, as it applies to
    your system:

    1) Get an electrical outlet tester at your local Home
    Depot, Lowe's, et al.

    2) Check the outlets your systems are plugged into. (If you
    use outlets other than the household-type NEMA 5-15R/5-20R,
    then get a suitable tester or an electrical testing service
    in to check your grounds.)

    3) Make sure you have a good, reliable earth ground at the
    outlet. If you don't, get it fixed.
    You would be surprised at how many outlets don't have valid
    earth grounds.
    If you are in a commercial building, your data center
    outlets should have been installed with
    ISOLATED grounds, that is, a separate ground wire between
    the power panel and the receptacle.
    Most commercial electrical work uses the metal jacket as the
    ground path, and that tends to come apart over time
    (i.e. NO MORE GROUND).

    4) Check the power supply. Make sure you are not
    overloading it past its rated maximum output. Make sure
    that it is grounded to the chassis and to the earth ground.
    Normally it grounds the chassis through its case,
    but some have separate ground connections, so look for ground
    screw connections.

    5) If your drives have ground screws or tabs on them,
    connect them to a reliable chassis ground point.
    Don't assume they have a good ground through the drive
    mounting screws.

    6) Use round shielded cables and watch the grounds on them.
    If the shields are grounded at one end only,
    make sure that the connected end is connected to a valid
    ground source.

    7) Grounds are normally connected at one end only to prevent
    ground fault loops; that is, you don't want more than one
    ground path here if you can help it. Multiple ground paths
    won't help and can hurt under the wrong circumstances. Drives
    with ground tabs don't generally ground through the mounting
    screws, but check the drive specs. A cable with the shield
    connected at both ends is also expected to ground the
    drive; such a cable
    should connect to a ground pin on the drive's
    interface.

    8) If you have these drives "dense packed" in your chassis,
    you might want to consider putting
    grounded shields between them if all else fails, grounded
    copper plates for example.

    9) Make sure that you route the power cables away from the
    drive controller cables within the chassis.

    10) Look for any other paths by which EMI could be crossing
    between the drives, the cables, and the controllers.

    11) You might just have one really EMI noisy drive. There
    are EMI meters that can be used
    to measure EMI levels.

    12) You can also be subject to radiation at a different
    wavelength, known as RFI, or Radio Frequency
    Interference.
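
For item 4, a rough power budget is easy to work out by hand or with a few lines of Python. This is only a sketch: the function name is made up, and the wattage figures below are invented placeholders, not measurements. Take real numbers from the labels or spec sheets of your own parts.

```python
def psu_load(rated_watts, component_watts):
    """Return (total_draw, fraction_of_rating) for a dict of
    component name -> estimated draw in watts."""
    total = sum(component_watts.values())
    return total, total / rated_watts

# Hypothetical figures for illustration only.
components = {
    "motherboard + CPU": 120.0,
    "10 disks at 12 W each": 10 * 12.0,
    "fans and cards": 40.0,
}

total, fraction = psu_load(300.0, components)
print(f"{total:.0f} W of 300 W rated ({fraction:.0%})")
if fraction > 1.0:
    print("Estimated draw exceeds the supply's rated maximum.")
```

A result close to or above 100% of the rating would be a reason to look at the power supply before blaming the drives or the OS.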

    >
    > the problem happens on various machines (gigabyte, asus, athlon,
    > pentium, maxtor, western...).
    >
    > it seems it is related to high load periods (in my case a heavily
    > used file server).
    >
    > we've been advised to change dma disks settings. I tried various
    > things (no dma at all, forcing mdma0 or udma2). the system behave
    > differently (either no errors or other errors as dma timeouts),
    > but it's not working quite well (for example deactivating dma on
    > disks lowers the average network throughput from 50 MB/s to 1.5
    > !!! almost 40 times slower !!!
    >
    > we really need help to investigate this problem which causes io
    > errors and fs corruption !
    >
    > tia
    >
    >
    > --
    > redhat-list mailing list
    > unsubscribe mailto:redhat-list-request@redhat.com?subject=unsubscribe
    > https://www.redhat.com/mailman/listinfo/redhat-list
    >

    
