Temporary 'lock-up' under heavy write, MegaRAID RAID-5

From: Dave Ewart (davee_at_ceu.ox.ac.uk)
Date: 11/09/05

    Date: Wed, 9 Nov 2005 10:46:24 +0000
    To: debian-amd64@lists.debian.org, debian-user@lists.debian.org

    System: 4-way Opteron, generic Debian Sarge AMD64
    RAID controller: LSI Logic MegaRAID 320-1, 64MB cache
    RAID config: Three 146GB 15K SCSI/320 disks, RAID-5
    Kernel: 2.6.14 SMP, includes megaraid driver

    The above system is incredibly fast under almost all conditions, except
    when writing very large files (say, hundreds of MB, or even GB). When
    writing such files, the system effectively locks up for many seconds,
    typically for as long as it takes to finish writing/flushing the file
    to disk. This lock-up affects all other processes: local text editor
    sessions, workstations with /home NFS-mounted, and the web server,
    which stops serving. (I guess the affected processes are actually
    those contending for disk write access.) In particular, workstations
    with /home NFS-mounted experience a *workstation* hang (if trying to
    write) during the *server* disk flush, which is very frustrating. Given
    that a 'write' may simply involve updating a web browser history stored
    in /home, this is an extremely serious problem.

    Example while system is idle, out of work hours: while creating a 1GB
    file (copying an existing file, already cached in RAM), 'vmstat 3' shows
    the following:

    procs ----------memory---------- --swap- ---io---- -system-- ----cpu-----
     r  b swpd    free  buff   cache  si  so  bi    bo   in   cs us sy  id wa
     0  0    0 4280268 55888 3366484   0   0   0     0  260   49  0  0 100  0
     1  1    0 3777432 56364 3847652   0   0   0  6419  319   55  0 23  77  0
     1  4    0 3258408 56852 4342476   0   0   0 10243  407   45  0 25  75  0
     0  3    0 3070296 57028 4520868   0   0   0  8856  403   99  0  9  75 16
     0  4    0 3068152 57044 4520852   0   0   5  9561  417  153  0  1  72 27
     0  3    0 3069316 57044 4520852   0   0   0 10240  429  144  0  0  75 25
     0  3    0 3069356 57044 4520852   0   0   0 10219  411   85  0  0  75 25
     0  3    0 3069368 57044 4520852   0   0   0  8876  391   78  0  0  75 25
    [...]
     0  2    0 3077856 57044 4520852   0   0   0  9557  409   44  0  0  75 25
     0  2    0 3077856 57044 4520852   0   0   0  8875  384   41  0  0  75 25
     0  1    0 3097748 57044 4520852   0   0   0  7704  421   42  0  2  73 25
     0  0    0 3100096 57048 4520848   0   0   0    56  259   20  0  0  99  1
     0  0    0 3100112 57052 4520844   0   0   0   552  362   32  0  0  97  3
     0  0    0 3100112 57052 4520844   0   0   0     0  270   63  0  0 100  0
     0  0    0 3100384 57052 4520844   0   0   0     5  260   39  0  0 100  0

    The 'bo' column ("blocks written to block device") rises as soon as the
    copy starts, and it takes approximately two minutes to finish flushing
    the file to disk. That makes a disk write rate of less than 10MB/sec,
    which strikes me as very slow. The CPU IO-wait column ('wa') shows 25%
    while this is happening: this corresponds to one of our four CPUs, so
    presumably that CPU is waiting for the file to flush to disk. Once the
    flush finishes, the disk and CPU state return to idle.
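
    As a rough cross-check of the figures quoted above, the short Python
    sketch below averages the 'bo' samples visible in the vmstat output
    during the flush and converts them to a write rate. It assumes that
    vmstat reports 'bo' in 1024-byte blocks per second; the rows elided by
    "[...]" are not known and are simply left out. The function names are
    illustrative only.

```python
# 'bo' values copied from the visible vmstat rows while the flush was running.
# The rows hidden behind "[...]" in the listing are unknown and not included.
BO_SAMPLES = [6419, 10243, 8856, 9561, 10240, 10219, 8876, 9557, 8875, 7704]

# Assumption: vmstat reports 'bo' in 1024-byte blocks per second.
BLOCK_BYTES = 1024


def mean_write_rate_mb(bo_samples, block_bytes=BLOCK_BYTES):
    """Return the mean write rate in MB/s (1 MB = 10**6 bytes)."""
    mean_blocks = sum(bo_samples) / len(bo_samples)
    return mean_blocks * block_bytes / 1e6


def seconds_to_flush(file_bytes, rate_mb):
    """Return the seconds needed to flush file_bytes at rate_mb MB/s."""
    return file_bytes / (rate_mb * 1e6)


rate = mean_write_rate_mb(BO_SAMPLES)
print(f"mean write rate: {rate:.1f} MB/s")
print(f"time to flush 1 GiB: {seconds_to_flush(2**30, rate):.0f} s")
```

    Under that block-size assumption the result comes out a little below
    10 MB/s and a little under two minutes for a 1 GiB file, which agrees
    with the estimate above.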

    I have already tried:

    - A couple of different kernels: the stock Sarge kernel
      2.6.8-11-amd64-k8-smp and a custom-compiled 2.6.14 kernel. I
      configured the custom kernel to use the pre-emptible features designed
      for desktop use, in the hope that the other interactive processes
      would benefit from this. The choice of kernel doesn't seem to affect
      the behaviour I describe above.

    Should I expect this kind of performance when writing large files?

    If not, then what can be done to improve this kind of write performance?

    The RAID controller is currently set to "write-through". I understand
    that, in theory, better write performance may be obtained by using
    "write-back", although I don't see how that would help for files that
    are many times larger than the RAID controller cache (64MB vs. files of
    100s of MB). I understand the potential data-loss implications of using
    write-back. Thoughts/comments on changing to "write-back" in these
    circumstances?
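
    One way to compare "write-through" against "write-back" would be to
    time the same large write under each controller setting. The Python
    sketch below is only an illustration of such a measurement, not a
    recommended benchmark: it writes a file of zeros, calls fsync, and
    reports the elapsed time and rate. The function name, chunk size and
    test size are hypothetical choices, and the target path would need to
    be on the RAID volume for the numbers to mean anything.

```python
import os
import tempfile
import time


def time_write_and_flush(path, total_bytes, chunk_bytes=1 << 20):
    """Write total_bytes of zeros to path, fsync, return (seconds, MB/s)."""
    chunk = b"\0" * chunk_bytes
    start = time.perf_counter()
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        remaining = total_bytes
        while remaining > 0:
            # os.write may write fewer bytes than requested, so count them.
            written = os.write(fd, chunk[:min(chunk_bytes, remaining)])
            remaining -= written
        os.fsync(fd)  # wait until the kernel has pushed the data to the device
    finally:
        os.close(fd)
    elapsed = time.perf_counter() - start
    return elapsed, total_bytes / 1e6 / elapsed


if __name__ == "__main__":
    # Hypothetical demo size (16 MiB) and location (a temporary directory);
    # use a path on the RAID volume and a size near 1 GiB to mimic the report.
    with tempfile.TemporaryDirectory() as tmp:
        secs, mb_per_s = time_write_and_flush(os.path.join(tmp, "testfile"), 16 << 20)
        print(f"{secs:.2f} s, {mb_per_s:.1f} MB/s")
```

    Running it once per controller setting, with the same size and path,
    gives two directly comparable timings.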

    Any other suggestions or reports of similar experiences?

    Cheers,

    Dave.

    -- 
    Dave Ewart
    davee@ceu.ox.ac.uk
    Computing Manager, Cancer Epidemiology Unit
    Cancer Research UK / Oxford University
    PGP: CC70 1883 BD92 E665 B840 118B 6E94 2CFD 694D E370
    Get key from http://www.ceu.ox.ac.uk/~davee/davee-ceu-ox-ac-uk.asc
    N 51.7518, W 1.2016