Re: Hyperthreaded processor utilization?



On 2007-10-30, Michael Heiming <michael+USENET@xxxxxxxxxxxxxx> wrote:
In comp.os.linux.misc Jean-David Beyer <jdbeyer@xxxxxxxxxxxxxxxxxxxx>:
I have a system with two hyperthreaded Xeon processors running
Red Hat Enterprise Linux 5 with 8GBytes RAM.

It runs four BOINC applications in the background (nice 19) that
take almost 100% of each processor when nothing else is running.

Now if I run a PostgreSQL client that does heavy IO, what happens
is more like this:

top - 10:06:01 up 4 days, 2:24, 3 users, load average: 5.15, 5.32, 5.26
Tasks: 170 total, 5 running, 164 sleeping, 1 stopped, 0 zombie
Cpu0 : 51.2%us, 7.0%sy, 2.4%ni, 39.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 51.4%us, 6.4%sy, 0.8%ni, 41.2%id, 0.0%wa, 0.2%hi, 0.0%si, 0.0%st
Cpu2 : 10.2%us, 1.0%sy, 83.3%ni, 5.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 10.4%us, 1.0%sy, 82.5%ni, 6.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8185340k total, 7915132k used, 270208k free, 156460k buffers
Swap: 4096496k total, 412k used, 4096084k free, 6990392k cached

In other words, the BOINC clients take up most of two processors, but
the other processors are idle a lot.

Is this because the Linux kernel is missing something and not

Hard to tell remotely, perhaps you could gain something with
tuning the I/O elevator? Though your problem doesn't seem to be
I/O related, at least from top. Or even try out 2.6.23 which has
afaik a complete new CFS scheduler which might improve things for
you?

The problem has something to do with IO (I think), because it shows up
only in two cases. The first is when running heavy IO with PostgreSQL. The
other is when I try to read a newsgroup that my ISP carries but actually
offloads to someone else. In both cases, the process in question hogs one
CPU (which is tolerable, since I have 4), but when CPU0 is hogged, CPU1 is
idle (usually); when CPU2 is hogged, CPU3 is idle.
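
The pairing described above (CPU0 with CPU1, CPU2 with CPU3) is what one
would expect if each pair were the two hyperthreads of one physical
processor, but that numbering is an assumption and not something top
shows. A minimal sketch for checking it, assuming the kernel exposes
thread_siblings_list under /sys/devices/system/cpu (newer kernels do; an
older kernel such as 2.6.18 may not, in which case the function returns
None). The helper names are hypothetical:

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a kernel CPU list such as "0-1" or "0,2" into a list of ints."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def thread_siblings(cpu, sysfs_root="/sys/devices/system/cpu"):
    """Return the logical CPUs sharing a core with `cpu`, or None if unknown."""
    path = Path(sysfs_root) / f"cpu{cpu}" / "topology" / "thread_siblings_list"
    try:
        return parse_cpu_list(path.read_text())
    except OSError:
        # Assumption: the file is simply absent on kernels that predate it.
        return None


if __name__ == "__main__":
    for cpu in range(4):
        print(f"Cpu{cpu}: siblings = {thread_siblings(cpu)}")
```

If the siblings come back as 0 with 1 and 2 with 3, the "one hogged, its
partner idle" pattern lines up with hyperthread pairs; if they come back
as 0 with 2 and 1 with 3, it does not.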

But when all 4 CPUs are being hogged by BOINC processes (nice level 19 --
lowest priority), they all run at about 99%. So it either has something
to do with the IO, or something to do with hogging of a CPU with a nice 0
process but not with a nice 19 process.

The latest RHEL5 kernel is 2.6.18-8.1.15.el5PAE and I do not propose to
change it until Red Hat do. I do not wish to deal with product support
problems.

Personally I haven't had problems like this. The following is
from a ht box and it looks like load would be distributed quite
well over CPUs:

Yes, yours seem to distribute nicely.

top - 16:09:36 up 78 days, 1:51, 30 users, load average: 7.00, 4.92, 3.92
Tasks: 451 total, 23 running, 427 sleeping, 0 stopped, 1 zombie
Cpu0 : 2.6% us, 9.7% sy, 66.8% ni, 20.9% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu1 : 0.3% us, 4.9% sy, 74.2% ni, 20.6% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu2 : 1.3% us, 6.9% sy, 65.4% ni, 25.6% id, 0.7% wa, 0.0% hi, 0.0% si
Cpu3 : 1.0% us, 9.4% sy, 60.5% ni, 11.6% id, 17.4% wa, 0.0% hi, 0.0% si
Cpu4 : 0.3% us, 1.6% sy, 80.5% ni, 15.0% id, 2.6% wa, 0.0% hi, 0.0% si
Cpu5 : 0.3% us, 5.1% sy, 73.7% ni, 20.2% id, 0.7% wa, 0.0% hi, 0.0% si
Cpu6 : 0.3% us, 3.9% sy, 71.9% ni, 23.9% id, 0.0% wa, 0.0% hi, 0.0% si
Cpu7 : 0.0% us, 4.0% sy, 72.3% ni, 17.3% id, 6.3% wa, 0.0% hi, 0.0% si

[..]

Right now, with the database not running, I get:

top - 22:43:35 up 7 days, 15:02, 4 users, load average: 4.00, 4.03, 4.08
Tasks: 167 total, 6 running, 161 sleeping, 0 stopped, 0 zombie
Cpu0 : 0.8%us, 0.2%sy, 98.1%ni, 0.8%id, 0.0%wa, 0.1%hi, 0.0%si, 0.0%st
Cpu1 : 0.8%us, 0.3%sy, 98.1%ni, 0.8%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 2.6%us, 0.6%sy, 93.4%ni, 3.3%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 3.5%us, 0.6%sy, 93.3%ni, 2.5%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8185340k total, 6659048k used, 1526292k free, 162112k buffers
Swap: 4096496k total, 274988k used, 3821508k free, 5863552k cached

which is not too bad; the system is essentially idle except for the BOINC
clients. But if I start a database bulk load, I get more like this:

top - 22:46:22 up 7 days, 15:05, 5 users, load average: 4.93, 4.29, 4.16
Tasks: 173 total, 7 running, 166 sleeping, 0 stopped, 0 zombie
Cpu0 : 8.0%us, 3.4%sy, 85.6%ni, 3.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 7.6%us, 2.7%sy, 85.5%ni, 4.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 51.3%us, 5.3%sy, 24.0%ni, 19.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 28.7%us, 3.8%sy, 51.3%ni, 15.8%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 8185340k total, 6672524k used, 1512816k free, 162836k buffers
Swap: 4096496k total, 274988k used, 3821508k free, 5873472k cached

which distributes the work around sort-of evenly, but there is a lot of
CPU time wasted in idle state that should, IMAO, be used by the BOINC
client applications.
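
To put a number on "a lot": the four %id figures in the bulk-load
snapshot above add up to 42.4 out of a possible 400, or about 10.6% per
logical CPU. A rough sketch of how those figures could be pulled out of
saved top output, assuming the "Cpu0 : 8.0%us, ..." line layout shown in
this thread (other top versions may format the line differently); the
function names are hypothetical:

```python
import re

# The four Cpu lines from the bulk-load snapshot, copied from the post.
SAMPLE = """\
Cpu0 : 8.0%us, 3.4%sy, 85.6%ni, 3.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 7.6%us, 2.7%sy, 85.5%ni, 4.2%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu2 : 51.3%us, 5.3%sy, 24.0%ni, 19.4%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu3 : 28.7%us, 3.8%sy, 51.3%ni, 15.8%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
"""

CPU_LINE = re.compile(r"^Cpu(\d+)\s*:\s*(.*)$")
# Accepts both "8.0%us" and "2.6% us" spellings of a field.
FIELD = re.compile(r"([\d.]+)%\s*([a-z]+)")


def parse_top_cpus(text):
    """Map CPU number -> {field name: percentage} for every Cpu line found."""
    cpus = {}
    for line in text.splitlines():
        match = CPU_LINE.match(line.strip())
        if match:
            fields = FIELD.findall(match.group(2))
            cpus[int(match.group(1))] = {name: float(value) for value, name in fields}
    return cpus


def total_idle(cpus):
    """Sum the %id column over all CPUs (maximum is 100 per CPU)."""
    return sum(fields["id"] for fields in cpus.values())


if __name__ == "__main__":
    cpus = parse_top_cpus(SAMPLE)
    idle = total_idle(cpus)
    print(f"total idle: {idle:.1f} of {100 * len(cpus)}")
    print(f"mean idle per CPU: {idle / len(cpus):.1f}%")
```

A single top sample is noisy, so averaging several snapshots taken during
the load would give a fairer figure than any one of them.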

One possibility comes to mind: the DBMS stuff uses a lot (25%) of real
memory. While it is not absolutely necessary to have a producer and
consumer on the same chip (e.g., CPU0 and CPU1, or CPU2 and CPU3), it
would, at least in theory, give a higher L3 cache hit ratio than if the
processes were on different chips. Now, since those processes are sharing
about 2 GBytes of RAM, is that keeping the BOINC processes out?

--
.~. Jean-David Beyer Registered Linux User 85642.
/V\ PGP-Key: 9A2FC99A Registered Machine 241939.
/( )\ Shrewsbury, New Jersey http://counter.li.org
^^-^^ 22:30:01 up 7 days, 14:48, 3 users, load average: 4.11, 4.06, 4.19