Random Filesystem Lag / Delay
From: James Sella (alles_at_digital-genesis.com)
Date: Mon, 06 Oct 2003 02:25:28 GMT
Looking for some ideas on how to troubleshoot and/or identify the cause
of the issues I've been experiencing.
The system will randomly (every 20 seconds to 15 minutes) pause or delay
when running commands, such as a simple 'ls' or even 'uptime'. The
command will generally delay for 1 to 15 seconds. Every shell
experiences the lag at the same time. When I notice the first command
lag, I can trigger a different command in another window and it will lag
as well. Both will complete at the exact same time, as if they were both
waiting for something. The server's 1-minute load average will jump up by
1-2 (e.g., 0.21 -> 2.10) when the lag occurs.
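To pin down how often the stalls happen and how long they last, the lag could be logged with a small probe loop. This is only a minimal sketch: probe_lag is a hypothetical helper name, the threshold and interval values are arbitrary, and it assumes a Linux /proc/loadavg and a date that supports +%s.

```shell
# Hypothetical helper (not part of any package): run a cheap command
# repeatedly and log every run whose wall-clock time reaches a threshold.
# Usage: probe_lag THRESHOLD_SECONDS COUNT INTERVAL_SECONDS COMMAND [ARGS...]
probe_lag() (
    threshold=$1; count=$2; interval=$3
    shift 3
    i=0
    while [ "$i" -lt "$count" ]; do
        start=$(date +%s)
        "$@" > /dev/null 2>&1
        end=$(date +%s)
        elapsed=$((end - start))
        if [ "$elapsed" -ge "$threshold" ]; then
            # /proc/loadavg is Linux-specific; its first field is the
            # 1-minute load average.
            load=$(cut -d' ' -f1 /proc/loadavg 2>/dev/null)
            echo "$(date '+%H:%M:%S') lag=${elapsed}s load=${load:-unknown}"
        fi
        sleep "$interval"
        i=$((i + 1))
    done
)
```

For example, "probe_lag 2 3600 1 ls /" would run ls about once a second for roughly an hour and print a line for each run that took 2 seconds or more. Timing is in whole seconds, so a threshold of 1 can fire on a run that merely crosses a second boundary.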
Network I/O isn't affected. Sessions don't lock up: you can continue to
type, just not run commands, and the server continues to respond to ICMP.
The server is:
Dual Athlon MP 2400+
(2) 80G IDE Drives (Soft RAID1)
Drives are installed as hda and hdc, both are UDMA(100).
Partitions are all ext3 in the default mode (ordered).
So far, I've lowered the elevator's read and write latency from 2048/8096:
# /sbin/elvtune /dev/hda
/dev/hda elevator ID 1
/dev/hdc elevator ID 2
The server generally has up to 20 users logged in at the same time, and
around 200 processes running on it. The RAID1 set is fully functional as
reported in /proc/mdstat. I can see no task that pegs the system and
causes the sudden lag when watching top (even at 0.1 sec updates). I
don't see any noticeable change in I/O activity when watching vmstat.
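Since the Linux load average counts processes in uninterruptible sleep (state D) as well as runnable ones, a load jump with no visible CPU hog may mean processes are blocked rather than busy. A minimal sketch for catching them during a stall follows. blocked_only is a hypothetical helper name, and it assumes the process state is the first column of the input.

```shell
# Hypothetical filter: keep only rows whose first column (process state)
# starts with D, i.e. uninterruptible sleep. The header row is dropped
# unless its first field happens to start with D.
blocked_only() {
    awk '$1 ~ /^D/'
}
```

With procps, something like "ps -eo stat,pid,wchan,comm | blocked_only" run while a command is hanging would list any processes in that state, along with the kernel function they are waiting in.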
Output from 'time' is strange. No time is spent in user or sys, but real
# time w
I'm not receiving any IDE errors from the kernel, so it doesn't appear
to be the drive itself hanging.