Re: Linux does not care for data integrity

From: Helge Hafting (helge.hafting_at_aitel.hist.no)
Date: 06/02/05

    Date:	Thu, 02 Jun 2005 10:53:46 +0200
    To: Bill Davidsen <davidsen@tmr.com>
    
    

    Bill Davidsen wrote:

    > Matthias Andree wrote:
    >
    >> On Sun, 29 May 2005, Greg Stark wrote:
    >>
    >>
    >>> Oracle, Sybase, Postgres, other databases have hard requirements. They
    >>> guarantee that when they acknowledge a transaction commit the data has
    >>> been written to non-volatile media and will be recoverable even in the
    >>> face of a routine power loss.
    >>>
    >>> They meet this requirement just fine on SCSI drives (where write
    >>> caching generally ships disabled) and on any OS where fsync issues a
    >>> cache flush. If
    >>
    >>
    >>
    >> I don't know what facts "generally ships disabled" is based on. All of
    >> the more recent SCSI drives (non SCA type though) I acquired came with
    >> write cache enabled, and some also with queue algorithm modifier set
    >> to 1.
    >>
    >>
    >>> Worse, if the disk flushes the data to disk out of order it's quite
    >>> likely the entire database will be corrupted on any simple power
    >>> outage. I'm not clear whether that's the case for any common drives.
    >>
    >>
    >>
    >> It's a matter of enforcing write order. How far such ordering
    >> constraints are propagated by file systems and the VFS layer down to
    >> the hardware is the grand question.
    >>
    > The problem is that many of the options required to make that happen
    > in the o/s, hardware, and application are going to kill performance.
    > And even if you can control the order of writes, unless you can control
    > when a write reaches the final non-volatile media you can get a sane
    > database but still lose transactions.
    >
    > If there were a way for the o/s to know when a physical write was done
    > other than using flushes to force completion, then overall performance
    > could be higher, but individual transactions might have greater
    > latency. And the app could use fsync to force order of writes as
    > needed. In many cases groups of writes can be done in any order as
    > long as they are all done before the next logical step takes place.

    There is a workaround. Get a UPS just for the disks. It doesn't have to
    be big, just big enough to keep the disks running long enough to commit
    their caches after the rest of the machine has died from a power loss.
    Such a small unit could possibly fit inside the cabinet, avoiding the
    trouble with people stepping on the power cord.

    With this in place, any write that makes it from the controller to the
    disk is safely stored for all practical purposes.
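
    On the software side, the ordering idea quoted above (a group of writes
    may land in any order as long as all of them are done before the next
    logical step) could be sketched roughly as below. This is only an
    illustration: the names commit_batch, write_all and the "COMMIT" marker
    are made up for the example, and os.fsync only helps as far as the OS
    and the drive actually flush their caches, which is the very question
    in this thread.

```python
import os
import tempfile


def write_all(fd, data):
    """Write every byte of data to fd, retrying after short writes."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def commit_batch(path, records, marker=b"COMMIT\n"):
    """Append records, flush them, then append a commit marker.

    Hypothetical helper: the records may reach the disk in any order,
    but the first fsync is requested before the marker is written, so
    the marker should never be durable without the records (assuming
    fsync is honoured all the way down to the media).
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
    try:
        for rec in records:
            write_all(fd, rec)
        os.fsync(fd)           # step boundary: all records flushed first
        write_all(fd, marker)  # next logical step: the commit record
        os.fsync(fd)           # acknowledge the commit only after this
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "journal.log")
        commit_batch(log, [b"row 1\n", b"row 2\n"])
        with open(log, "rb") as fh:
            print(fh.read())
```

    With a drive that acknowledges writes from a volatile cache, the two
    fsync calls above give ordering only on paper; the small UPS is what
    would make the acknowledgement trustworthy in that case.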

    Helge Hafting

