Unknown kernel mode CPU thief in SMP system.

From: Gil Hamilton (gil_hamilton_at_hotmail.com)
Date: 03/26/04


Date: 26 Mar 2004 09:35:48 -0800

I'm trying to diagnose a strange problem and would greatly appreciate
some insight into how to figure out what's going on...

I'm working on a dual processor system (two 1GHz genuine Intel Pentium
IIIs according to /proc/cpuinfo) running a stock Red Hat Enterprise
Linux 3 SMP kernel (2.4.21-4.ELsmp). I have lots of free memory.

I was investigating various ways to signal between multiple threads
running on different CPUs without using system calls, for a
performance-critical application (for example, doing bit_test_and_set
versus just writing entire words). I have a very small, very simple
test app that creates an additional thread with pthread_create; both
the original thread and the new one then continuously slam values
into (different) shared variables. They aren't actually synchronizing
with each other.
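
For reference, a test of the kind described above might look roughly
like the following sketch. The names slam, run_writers, word_a and
word_b, and the ITERATIONS count, are placeholders and are not taken
from the actual test program.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical iteration count; the post does not give one. */
#define ITERATIONS 50000000UL

/* Each thread stores only to its own word. volatile keeps the
   compiler from collapsing the loop into a single store. */
volatile unsigned long word_a;
volatile unsigned long word_b;

/* Thread body: repeatedly write increasing values into one word. */
static void *slam(void *arg)
{
    volatile unsigned long *target = arg;
    assert(target != NULL);
    for (unsigned long i = 1; i <= ITERATIONS; i++)
        *target = i;
    return NULL;
}

/* Runs both writers to completion.
   Returns 0 on success, or a pthread error number. */
int run_writers(void)
{
    pthread_t worker;
    int err = pthread_create(&worker, NULL, slam, (void *)&word_b);
    if (err != 0)
        return err;
    slam((void *)&word_a);   /* the original thread does the same work */
    return pthread_join(worker, NULL);
}
```

Calling run_writers() from main() and running the program under
time(1) gives the real/user/sys comparison discussed in this post.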

In order to maximize the concurrency, I put both threads into the
"real time" scheduling class (SCHED_FIFO) with sched_setscheduler(2)
and raised their priority to the maximum value. I then used
sched_setaffinity to assign one to CPU 0 and the other to CPU 1.
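
The setup just described could be sketched as below. This assumes the
three-argument sched_setaffinity and the cpu_set_t macros from current
glibc; the interface shipped with a 2.4.21-based kernel may differ.
The function name make_realtime_on_cpu is made up for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <sched.h>

/* Pin the calling thread to `cpu`, then move it to SCHED_FIFO at the
   maximum priority. Returns 0 on success or an errno value. If the
   scheduler change fails, the affinity change is left in place. */
int make_realtime_on_cpu(int cpu)
{
    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return EINVAL;

    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);
    if (sched_setaffinity(0, sizeof(mask), &mask) != 0)
        return errno;

    int max_prio = sched_get_priority_max(SCHED_FIFO);
    if (max_prio == -1)
        return errno;

    struct sched_param param = { .sched_priority = max_prio };
    if (sched_setscheduler(0, SCHED_FIFO, &param) != 0)
        return errno;   /* typically EPERM without root privileges */

    assert(sched_getscheduler(0) == SCHED_FIFO);
    return 0;
}
```

Each thread would call this at the top of its thread function with
its own CPU number (0 or 1 here). Switching to SCHED_FIFO normally
requires root or an equivalent capability.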

When I first started working on this yesterday, I was seeing what I
expected: when I run "time ./mytest", the "real" value is about half
of the "user" value and the "sys" value is near zero. That is, I have
two CPUs running different threads of a very simple CPU-bound program
and nothing else of much import is happening on the system. I have
changed some details of the test since then but as best I can tell,
I've changed nothing that would make any difference to this situation.
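
The same real/user/sys comparison can be made from inside the program,
which allows timing only the busy loops. A rough sketch, assuming
getrusage(RUSAGE_SELF) accounts for all threads of the process (true
with NPTL, not necessarily with older thread libraries); the struct
and function names are invented for this example.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/resource.h>
#include <sys/time.h>
#include <time.h>

/* One snapshot of wall-clock time and process CPU times, in seconds. */
struct cpu_sample { double real, user, sys; };

static double tv_seconds(struct timeval tv)
{
    return (double)tv.tv_sec + (double)tv.tv_usec / 1e6;
}

/* Fills *out with the current times. Returns 0 on success, -1 on error. */
int take_sample(struct cpu_sample *out)
{
    struct timespec now;
    struct rusage ru;

    assert(out != NULL);
    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    out->real = (double)now.tv_sec + (double)now.tv_nsec / 1e9;
    out->user = tv_seconds(ru.ru_utime);
    out->sys  = tv_seconds(ru.ru_stime);
    return 0;
}

/* User CPU seconds consumed per wall-clock second between two samples.
   Two CPU-bound threads running concurrently should give close to 2.0;
   a value near 1.0 means they are effectively taking turns. */
double user_per_real(const struct cpu_sample *start,
                     const struct cpu_sample *end)
{
    double real = end->real - start->real;
    return real > 0.0 ? (end->user - start->user) / real : 0.0;
}
```

Taking one sample before starting the threads and one after joining
them gives the ratio for just the measured region.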

Today, however, I instead see "real" and "user" being close to the
same. There's very little else going on on the system, so
I don't understand why my two threads wouldn't be getting virtually
all of the available CPU time. I then ran top(1) to try to get an
idea of what was happening (after using chrt to put my shell and top
into SCHED_FIFO as well). It shows that one or the other CPU (usually
CPU 1) is spending a considerable amount of time in "system" mode. To
me, this indicates that there is a considerable amount of context
switching, paging or interrupt activity going on, but I can't figure
out what's happening. (I'm pretty sure it's not top itself, because
the behavior of the timed app is pretty much identical with and
without top.)

If I remove the sched_setaffinity call, it works much better than
with it (presumably the scheduler is then able to run each thread on
whichever CPU is free at the moment). But I'm still not seeing nearly
the concurrency that I was seeing yesterday. top shows "irq",
"softirq" and "iowait" times all near 0%.

I have looked at every running process with taskset(1) and only
[migration/1] and [ksoftirqd/1] have their CPU affinity set to CPU 1.
All of the /proc/irq/*/smp_affinity entries show 0xffffffff. And top
shows that - apart from my programs and top itself - only
[migration/0] and [migration/1] have a "realtime" priority. And, BTW,
if I run top when my test isn't running, it shows the usual expected
99.9% idle state for both CPUs.

It looks to me as though there are a boatload of context switches
going on, but I can't figure out why. Could this just be context
switching between my app and the [migration/?] twins? How do I find
out where all that kernel mode CPU time is going?
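
One way to check the context-switch suspicion is to read the
system-wide "ctxt" counter from /proc/stat before and after a test
run and compare the difference with an idle period of the same
length. A minimal sketch; the function name read_ctxt is made up
here, and the counter covers the whole system, not just the test
program.

```c
#include <assert.h>
#include <stdio.h>

/* Returns the context-switch count found on the "ctxt" line of the
   given stat file (normally "/proc/stat"), or -1 if the file cannot
   be read or has no such line. */
long long read_ctxt(const char *path)
{
    assert(path != NULL);

    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    char line[256];
    long long count = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        /* Only a line that starts with "ctxt " assigns count. */
        if (sscanf(line, "ctxt %lld", &count) == 1)
            break;
    }
    fclose(f);
    return count;
}
```

Dividing the difference between two readings by the elapsed seconds
gives a switches-per-second figure, comparable to the "cs" column
printed by vmstat(8).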

Thanks a lot for any help you can give.

 - GH


