Re: ZFS with Linux: An Open Plea



On Wed, Apr 18, 2007 at 01:25:19PM -0400, Lennart Sorensen wrote:

> Does it matter that google's recent report on disk failures indicated
> that SMART never predicted anything useful as far as they could tell?
> Certainly none of my drive failures ever had SMART make any kind of
> indication that anything was wrong.

I saw that talk, and that's not what I got out of it. They found that
SMART error reports _did_ correlate with drive failure. See page 8
of:

http://www.usenix.org/events/fast07/tech/full_papers/pinheiro/pinheiro.pdf

(If you're not a USENIX member, you may be able to find a free
download copy elsewhere.)

However, they found that the correlation was not strong enough to make
it economically feasible to replace disks reporting SMART failures,
since something like 70% of disks were still working a year after the
first failure report. Also, they found that some disks failed without
any SMART error reports.

Now, Google keeps multiple copies (3 in GoogleFS, last I heard) of
data, so for them, "economically feasible" means something different
than for my personal laptop hard drive. I have twice had my laptop
hard drive start spitting SMART errors and then die within a week. It
is economically quite sensible for me to replace my laptop drive once
it has an error, since I don't carry around 3 laptops everywhere I go.
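
To make that trade-off concrete, here is a rough sketch of the
comparison. The 30% failure figure is just the flip side of the ~70%
survival number above; the cost figures and function names are made up
for illustration and are not from the paper.

```python
def expected_loss_if_kept(p_fail, drive_cost, data_loss_cost):
    """Expected cost of leaving a SMART-flagged drive in service for a year."""
    # If the drive fails we buy a replacement anyway and also pay for lost data.
    return p_fail * (drive_cost + data_loss_cost)


def should_replace_now(p_fail, drive_cost, data_loss_cost):
    """Replace early only if that is cheaper than the expected loss from waiting."""
    return drive_cost < expected_loss_if_kept(p_fail, drive_cost, data_loss_cost)


# Roughly 70% of flagged disks were still working a year later.
P_FAIL = 0.30

# Hypothetical costs in arbitrary units (assumptions, not measured values).
DRIVE_COST = 100

# Replicated storage: losing one disk loses no data.
replicated = should_replace_now(P_FAIL, DRIVE_COST, data_loss_cost=0)

# Single laptop drive: a failure also costs the data and the downtime.
laptop = should_replace_now(P_FAIL, DRIVE_COST, data_loss_cost=1000)

print(replicated, laptop)
```

With these made-up numbers, early replacement loses money when the data
is replicated and pays off when the drive holds the only copy.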

-VAL


