Re: 2.4.23+atalib2 sil3112A write errors

From: Bryan Andersen (bryan_at_bogonomicon.net)
Date: 01/09/04

  • Next message: Ramon Rey Vicente: "[2.6.1-rc2-mm1][BUG] bad: scheduling while atomic!"
    Date:	Thu, 08 Jan 2004 17:45:24 -0600
    To: Bryan Andersen <bryan@nerdvest.com>
    
    

    Sorry for the false alarm. I now suspect hardware errors. I took a
    look at the actual errors introduced into the files. When I octal-dump
    both the source file and the destination file and diff them, I see
    only a short string of changed bytes. That tells me multibit errors
    are slipping through. Time to start swapping cables and hardware. Is
    there a way to get the driver to output messages only when errors are
    seen, rather than messages for all transfers?

    This is an example of the differences seen.

    72922,72923c72922,72923
    < 4357500 064047 120335 134421 006545 137622 023477 043620 164561
    < 4357520 106514 023757 026270 043466 051034 117400 174451 127107
    ---
    > 4357500 064047 120335 134421 006545 136566 035207 015425 133770
    > 4357520 012031 012055 171154 045114 015406 000160 017147 035214
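
The octal-dump-and-diff comparison above can also be done directly on the bytes. The following is a minimal Python sketch, not the procedure actually used in this thread: it reads a source and destination file in parallel and reports each contiguous run of differing bytes as an (offset, length) pair. The function name `diff_runs` and the 1 MiB chunk size are assumptions made for illustration. If the dump above is in od's default two-byte octal format, the excerpt shows a single run of twelve changed words, roughly 24 bytes.

```python
from typing import Iterator, Tuple

CHUNK = 1 << 20  # read 1 MiB at a time (arbitrary choice)


def diff_runs(src_path: str, dst_path: str) -> Iterator[Tuple[int, int]]:
    """Yield (offset, length) for each contiguous run of differing bytes.

    If one file is shorter than the other, the missing tail is
    reported as one differing run.
    """
    offset = 0          # file offset of the start of the current chunk
    run_start = None    # offset where the current differing run began
    with open(src_path, "rb") as src, open(dst_path, "rb") as dst:
        while True:
            a = src.read(CHUNK)
            b = dst.read(CHUNK)
            if not a and not b:
                break
            n = max(len(a), len(b))
            if a == b:
                # Fast path: identical chunk closes any open run.
                if run_start is not None:
                    yield run_start, offset - run_start
                    run_start = None
            else:
                for i in range(n):
                    same = i < len(a) and i < len(b) and a[i] == b[i]
                    if not same and run_start is None:
                        run_start = offset + i
                    elif same and run_start is not None:
                        yield run_start, offset + i - run_start
                        run_start = None
            offset += n
    if run_start is not None:
        yield run_start, offset - run_start
```

Calling `list(diff_runs("source.bin", "copy.bin"))` (both file names are placeholders) gives the position and size of every corrupted region, which makes it easy to see whether the damage is a few isolated bits or whole runs of bytes.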
    - Bryan

    Bryan Andersen wrote:
    > I'm seeing silent write errors with two Seagate 160GB drives on a
    > sil3112A SATA controller on an ASUS A7N8X-Deluxe motherboard, but I'm
    > not seeing any read errors.  Each drive is on its own cable.  The kernel
    > is the 2.4.23 release with the 2.4.23-libata2 patch and some patches for
    > MythTV applied.  (I also see the same problem under 2.4.24 with only the
    > libata patch applied.)  I'm not seeing any error messages in any log
    > files or in the kernel dmesg output.  The error rate looks to be around
    > 1 in a million blocks.  My current guess is that the block or blocks
    > simply didn't get written by the drive, since I see no read error
    > messages when the system reads back the blocks with the bad data.
    > 
    > I am now running the tests again using jfs as the filesystem rather than
    > ext3 to rule out the filesystem as the cause.  As part of this test I'm
    > also zeroing the disks before the test and then checking for nonzero
    > data and how large the nonzero data blocks are.  Given enough time I
    > may run this test a couple of times to see if the positions of the bad
    > data move about or stay put.
    > 
    > How the first tests were run:
    > 
    > I used "cp -a * /data10" as root to copy the data (150GB worth) to the
    > disk under test.  Then I created md5sum lists for all files in both the
    > source and destination, sorted them, and compared the lists.  Differences
    > were found between the lists: some files and directories were missing
    > or corrupted.  On fscking the disks, some inodes were found corrupted.  I
    > then copied only the corrupted files, and after a few iterations I
    > finally got a copy that was the same as the source.  I then ran a
    > program to repeatedly md5sum all the files and check them against
    > the source.  It did not see any differences in 10 cycles.
    > 
    > I've attached my .config file, but don't have dmesg output to include. I 
    > will grab that when I reboot after the current tests are finished.
    > 
    > - Bryan
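
The copy-then-checksum procedure described in the quoted message can be sketched in Python as follows. This is an illustrative reconstruction under stated assumptions, not the program or md5sum invocation actually used: the helper names `md5_manifest` and `compare_manifests` are hypothetical, and only regular files are hashed (symlinks and special files are skipped).

```python
import hashlib
import os
from typing import Dict, List, Tuple


def md5_manifest(root: str) -> Dict[str, str]:
    """Map each regular file under root (by relative path) to its MD5 hex digest."""
    manifest: Dict[str, str] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(full) or not os.path.isfile(full):
                continue
            digest = hashlib.md5()
            with open(full, "rb") as fh:
                # Hash in 1 MiB blocks so large files are not loaded whole.
                for block in iter(lambda: fh.read(1 << 20), b""):
                    digest.update(block)
            manifest[os.path.relpath(full, root)] = digest.hexdigest()
    return manifest


def compare_manifests(
    src: Dict[str, str], dst: Dict[str, str]
) -> Tuple[List[str], List[str]]:
    """Return (missing, corrupted): paths absent from dst, and paths whose digests differ."""
    missing = sorted(p for p in src if p not in dst)
    corrupted = sorted(p for p in src if p in dst and src[p] != dst[p])
    return missing, corrupted
```

Running `compare_manifests(md5_manifest(source_dir), md5_manifest("/data10"))`, where `source_dir` is a placeholder for the source tree, yields the list of files to recopy. Repeating the call in a loop corresponds to the repeated verification cycles described above.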
    
