Random Filesystem Lag / Delay

From: James Sella (alles_at_digital-genesis.com)
Date: 10/06/03


Date: Mon, 06 Oct 2003 02:25:28 GMT

Looking for some ideas on how to troubleshoot and/or identify the cause
of the issues I've been experiencing.

The system will randomly (every 20 seconds to 15 minutes) pause or delay
when running commands, such as a simple 'ls' or even 'uptime'. The
command will generally delay for 1 to 15 seconds. Every shell
experiences the lag at the same time. When I notice the first command
lag, I can trigger a different command in another window and it will lag
as well. Both will complete at the exact same time, as if they were both
waiting for something. The server's 1-minute load average will jump up by
1-2 (e.g., 0.21 -> 2.10) when the lag occurs.
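
A load average that jumps while no CPU time is being used is consistent with
tasks sitting in uninterruptible sleep (state 'D'), since Linux counts those
tasks in the load average alongside runnable ones. As a rough sketch, assuming
a Linux /proc filesystem, the following lists which processes are in that
state at a given moment. The function names are made up for illustration and
are not part of any existing tool.

```python
import os


def parse_stat_line(line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or parentheses, so split on the LAST ')' instead of on spaces.
    """
    head, _, tail = line.rpartition(")")
    pid_text, _, comm = head.partition("(")
    state = tail.split()[0]
    return int(pid_text), comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    blocked = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_stat_line(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited, or the line was not parseable
        if state == "D":
            blocked.append((pid, comm))
    return blocked
```

Calling uninterruptible_tasks() repeatedly from an already-running
interpreter while a stall is in progress would show whether the same process
names turn up each time. The script has to be started before the stall,
because starting a new command is exactly what hangs.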

Network I/O isn't affected. Sessions don't lock up: you can continue to
type, just not run commands, and the server continues to respond to ICMP.

The server is:

   Dual Athlon MP 2400+
   2G RAM
   (2) 80G IDE Drives (Soft RAID1)
      Drives are installed as hda and hdc, both are UDMA(100).
      Partitions are all ext3 in the default mode (ordered).

So far, I've lowered the elevator's read and write latency from 2048/8096:

# /sbin/elvtune /dev/hda
/dev/hda elevator ID 1
         read_latency: 1024
         write_latency: 4096
         max_bomb_segments: 0

# /sbin/elvtune /dev/hdc
/dev/hdc elevator ID 2
         read_latency: 1024
         write_latency: 4096
         max_bomb_segments: 0

The server generally has up to 20 users logged in at the same time, and
around 200 processes running on it. The RAID1 set is fully functional as
reported in /proc/mdstat. I can see no task that pegs the system and
causes the sudden lag when watching in top (even at 0.1 sec updates). I
don't see any noticeable change in I/O activity when watching vmstat.

Output from 'time' is strange. No time is spent in user or sys, but real
time passes:

# time w
real 0m32.005s
user 0m0.000s
sys 0m0.010s
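
To log this pattern over time instead of catching it by hand, the same
measurement can be scripted. The sketch below is only an illustration: it
runs a command, collects real/user/sys the way 'time' does, and reports what
fraction of the wall-clock time was spent off the CPU. The names time_command
and waited_fraction are hypothetical helpers.

```python
import os
import subprocess
import time


def time_command(argv):
    """Run argv and return (real, user, sys) in seconds, like the shell's time."""
    before = os.times()
    start = time.monotonic()
    subprocess.run(argv, stdout=subprocess.DEVNULL,
                   stderr=subprocess.DEVNULL, check=False)
    real = time.monotonic() - start
    after = os.times()
    # CPU time charged to child processes that have been waited for.
    user = after.children_user - before.children_user
    system = after.children_system - before.children_system
    return real, user, system


def waited_fraction(real, user, system):
    """Fraction of wall-clock time the command spent NOT running on a CPU."""
    if real <= 0:
        return 0.0
    return max(0.0, 1.0 - (user + system) / real)
```

With the numbers above, waited_fraction(32.005, 0.000, 0.010) is above 0.999,
meaning 'w' spent essentially the whole 32 seconds blocked on something other
than its own computation.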

I'm not receiving any IDE errors from the kernel, so it doesn't appear
to be the drive itself hanging.

Any ideas??

-Jim


