Re: CPU temp -- LONG

From: Jean-David Beyer (jdbeyer_at_exit109.com)
Date: 10/29/04


Date: Fri, 29 Oct 2004 16:58:59 -0400

Floyd L. Davidson wrote:
> Jean-David Beyer <jdbeyer@exit109.com> wrote:

[snip]

>>The box it is in has a lot of fans. In the front is an air
>>filter and there are three intake fans, the 25mm thick
>>ones. These are all constant speed fans. The main one is 120mm,
>>the second one is 80mm, and I added a 40mm (only size that would
>>fit) that blows right at the top CPU fan intake. None of these
>>have tachometers that the lm-sensors package can read (though
>>the BIOS can read some of them). lm-sensors reads the CPU fans
>>and temperatures, and "System" temperature.
>
> If the BIOS can read them, so can lm_sensors. Of course the
> trick to that is figuring out how! (Below is first a generic
> discussion, followed by comments specific to your described
> hardware.)
>
> First, look in /sys/bus/{i2c | isa}/drivers, to find essentially
> a list of drivers which have been correctly loaded to supply
> information that lm_sensors can read. (Assuming a 2.6 kernel
> and the sysfs pseudo filesystem is mounted.)

Both assumptions incorrect.

$ uname -r
2.4.21-20.ELsmp

$ locate /sys/bus
$

I cannot, as you can see.

Do you mean in here?
$ ls -l /proc/bus
total 0
-r--r--r-- 1 root root 0 Oct 29 16:01 i2c
-r--r--r-- 1 root root 0 Oct 29 16:01 i2c-0
dr-xr-xr-x 10 root root 0 Oct 29 16:01 pci
dr-xr-xr-x 1 root root 0 Oct 22 22:07 usb
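
A quick way to see which of these locations a given kernel actually provides is sketched below in Python. The /sys/bus/i2c/drivers path is the one suggested above for 2.6 kernels and /proc/bus/i2c is the one listed here. /proc/sys/dev/sensors is an assumption about where a 2.4 kernel with i2c-proc loaded exposes chip data. The name existing_sensor_paths is made up for illustration.

```python
import os

# Candidate locations, relative to the filesystem root.
# The /proc/sys/dev/sensors entry is an assumption for 2.4 kernels with i2c-proc.
CANDIDATES = (
    "sys/bus/i2c/drivers",   # 2.6 kernels with sysfs mounted
    "proc/sys/dev/sensors",  # assumed location on 2.4 kernels
    "proc/bus/i2c",          # i2c bus list seen in the listing above
)


def existing_sensor_paths(root="/"):
    """Return the candidate paths that exist under root, in CANDIDATES order."""
    found = []
    for rel in CANDIDATES:
        path = os.path.join(root, rel)
        if os.path.exists(path):
            found.append(path)
    return found


if __name__ == "__main__":
    for path in existing_sensor_paths():
        print(path)
```

The root argument is there only so the check can be pointed at a test directory; on a real machine the default is what you want.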
>
> There must be at least one or more drivers loaded to provide
> connections to at least one or more hardware monitor chips on
> the motherboard. Then /etc/sensors.conf is configured to use
> the data made available by the driver(s). So two questions come
> up:
>
> 1) What hardware monitoring chips are available, and
> 2) How to configure /etc/sensors.conf to see them.
>
> Neither of those questions is necessarily as straightforward as
> it may seem!

My motherboard is SuperMicro X5DP8-G2.

http://www.supermicro.com/products/motherboard/Xeon/E7501/X5DP8-G2.cfm

$ cat /proc/modules
w83781d 22672 0
i2c-proc 9232 0 [w83781d]
i2c-isa 1836 0 (unused)
i2c-core 19300 0 [w83781d i2c-proc i2c-isa]

The modules seem to be loaded (boring ones omitted).

$ locate w83781d
/home/jdbeyer/Computer/w83781d.pdf
/usr/src/linux-2.4.21-20.EL/drivers/sensors/w83781d.c
/usr/src/linux-2.4.21-20.EL/drivers/sensors/w83781d.o
/usr/src/linux-2.4.21-20.EL/drivers/sensors/.w83781d.o.flags
/usr/src/linux-2.4.21-20.EL/include/config/sensors/w83781d.h
/usr/src/linux-2.4.21-20.EL/include/config/sensors/w83781d
/usr/src/linux-2.4.21-20.EL/include/config/sensors/w83781d/module.h
/usr/share/doc/lm_sensors-2.6.5/doc/chips/w83781d
/lib/modules/2.4.21-20.ELsmp/kernel/drivers/sensors/w83781d.o

>
> For example, if that motherboard uses any of the Winbond chips,

it does.

> you might have the w83781d module loaded, and there will be a
> w83781d directory in /sys/bus/i2c/drivers. And in that
> directory (if and only if the module is correctly configured
> when it is loaded) will be symlinks to one or more other
> directories. These will be named according to the i2c bus
> address, as different functions on the same chip may use
> different addresses. There might be multiple different chips
> too!
>
> The problem then is to correlate the data files under each
> directory with a specific measurement on the motherboard. The
> *only* accurate way that can be done is with information from
> the board manufacturer. But of course a few educated guesses
> can be useful too!
>
> Hence, if a given setup for lm_sensors is only measuring two
> fans and two temperatures, but there are data files for four of
> each and the BIOS is reporting four... the right configuration
> for modules and for /etc/sensors.conf will provide more
> information.

Two fans, three temperatures.
>
> That of course assumes the process the lm_sensors package
> recommends for discovery has worked, and that the correct
> drivers are loaded and able to detect all of the hardware
> monitoring chips. That is *not* guaranteed to happen though! I
> have a couple of Tyan S2462 SMP boards using AMD Athlon chips.
> The documentation clearly states that it has a W83782D hardware
> monitor chip *and* a W83627HF Super I/O chip that can also do
> hardware monitoring. The BIOS reads all sorts of things, yet
> lm_sensors does not normally detect the W83627HF chip! (When
> the board first came out there was no information available
> indicating which data was what, and everyone using lm_sensors
> had the temperature probes labeled wrong. Later Tyan released
> the information and made life easier.)
>
> It turns out that if the BIOS is used to look at the hardware
> monitor during the boot process the W83627HF hardware monitor
> sections of the chip are enabled and initialized, and then
> lm_sensors can see it. But a power cycle will make it disappear
> again.
>
> I had to write a utility to initialize the W83627HF chip
> at boot time.
>
> http://web.newsguy.com/floyd_davidson/code/sensors/w83627hf/
>
> That allows lm_sensors to report everything the BIOS does,
> (and more) on a Tyan S2462 motherboard.
>
> Of course I've also discovered that there are differences
> between an early version of the S2462 and a later version.
> A couple of the voltage probes provide different raw data
> for the same voltage, and the ability to control fan speed
> is not there for some fans on the early board.
>
>
>>I have had two "temperature events". At one point, the
>>temperature of the "System" slowly went up about 5 degrees
>>C. Since the room temperature had gotten up to about 90F, I
>
>
> [interesting descriptions snipped]
>
> I found that notable because of the 5 degrees. It seems the
> normal range of variation can be nearly that much apparently
> just from the week to week variations of the probes and the
> chip/circuit used to monitor it.

I have a crontab entry that runs "sensors" every 15 minutes and appends the
output to a file, from which I trim the old entries from time to time. The
"System" temperature does not seem to vary much for a constant room
temperature. I did note that the power supply fans, and the processor fans
did speed up considerably (temperature controlled fans) when the main
input fan seized up. I do not think the sensors are drifting all that much.
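
One way to check for drift in such a log is to pull out the temperature lines and look at the spread per sensor. The following is only a sketch in Python, assuming the log holds "sensors" output in the format quoted elsewhere in this post. The names temperature_spread and SAMPLE_LOG are made up for illustration, and the sample lines are copied from the two sets of readings in this post, not from the full log.

```python
import re
from collections import defaultdict

# Matches lines such as "CPU0: +52.0C (limit = ...)"; the degree sign is optional.
TEMP_LINE = re.compile(r"^\s*([^:\n]+):\s*([+-]?\d+(?:\.\d+)?)\s*°?C\b")


def temperature_spread(log_text):
    """Return {label: max - min} for every temperature label found in the log."""
    readings = defaultdict(list)
    for line in log_text.splitlines():
        match = TEMP_LINE.match(line)
        if match:
            readings[match.group(1).strip()].append(float(match.group(2)))
    return {label: max(vals) - min(vals) for label, vals in readings.items()}


# Two samples taken from this post; voltage and fan lines are ignored by the pattern.
SAMPLE_LOG = """\
Fri Oct 29 11:00:00 EDT 2004
VBat: +3.23 V (min = +2.40 V, max = +3.60 V)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
System: +40C (limit = +45C, hysteresis = +42C) sensor = thermistor
CPU0: +52.0C (limit = +60C, hysteresis = +58C) sensor = thermistor
CPU1: +51.0C (limit = +60C, hysteresis = +58C) sensor = thermistor
System: +40C (limit = +45C, hysteresis = +42C) sensor = thermistor
CPU0: +44.5C (limit = +60C, hysteresis = +58C) sensor = thermistor
CPU1: +43.0C (limit = +60C, hysteresis = +58C) sensor = thermistor
"""

if __name__ == "__main__":
    for label, spread in temperature_spread(SAMPLE_LOG).items():
        print(f"{label}: spread {spread:.1f} C")
```

A spread that stays near zero for "System" at a constant room temperature would support the view that the sensor is not drifting much.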
>
> What I'm doing is plotting graphs of all voltages and
> temperatures. Under the directory "tellerstats" in the
> lm_sensors distribution is an example using gnuplot to generate
> a web page. I've modified that and enhanced it significantly,
> and the results, plus a link to the scripts that produced these
> graphs, are shown at this URL,
>
> http://web.newsguy.com/floyd_davidson/sensors/

I am curious about the 1-hour "spikes" in all the temperatures...
>
> I'm generating 20 some graphs every 10 minutes using 2 minute
> sampling. Each graph shows 48 hours of history.
>
>
>>Fri Oct 29 11:00:00 EDT 2004
>>w83627hf-isa-0290
>>VCore 1: +1.45 V (min = +1.36 V, max = +1.47 V)
>>VCore 2: +3.31 V (min = +3.13 V, max = +3.45 V)
>>+3.3V: +3.23 V (min = +3.20 V, max = +3.45 V)
>>+5V: +4.97 V (min = +4.84 V, max = +5.24 V)
>>+12V: +11.97 V (min = +11.48 V, max = +12.58 V)
>>-12V: -11.83 V (min = -13.06 V, max = -11.41 V)
>>V5SB: +5.43 V (min = +4.84 V, max = +5.24 V)
>>VBat: +3.23 V (min = +2.40 V, max = +3.60 V)
>>CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
>>CPU1 fan: 2250 RPM (min = 1500 RPM, div = 2)
>>System: +40C (limit = +45C, hysteresis = +42C) sensor = thermistor
>>CPU0: +52.0C (limit = +60C, hysteresis = +58C) sensor = thermistor
>>CPU1: +51.0C (limit = +60C, hysteresis = +58C) sensor = thermistor
>>vid: +1.400 V
>
>
> That looks pretty good. I'd recommend changing the fan divisor
> to 8 though. You'll get better resolution, particularly at
> lower speeds.

I changed it to 4 for a while. My log has not yet picked it up, but here
is what it normally looks like, for one fan:

CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2934 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2960 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2934 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2934 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2934 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2860 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2884 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2960 RPM (min = 1500 RPM, div = 2)
CPU0 fan: 2960 RPM (min = 1500 RPM, div = 2)
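
As a rough way to judge whether more resolution would matter, the readings above can be reduced to the distinct values reported and the gaps between them. This is only a sketch in Python. The names rpm_steps, READINGS and SAMPLE_LOG are made up for illustration, and the sample is just the fifteen CPU0 lines above.

```python
import re

# Matches lines such as "CPU0 fan: 2909 RPM (min = 1500 RPM, div = 2)".
FAN_LINE = re.compile(r"^\s*([^:\n]+ fan):\s*(\d+)\s*RPM\b")


def rpm_steps(log_text, label="CPU0 fan"):
    """Return the distinct RPM values seen for one fan and the gaps between them."""
    values = set()
    for line in log_text.splitlines():
        match = FAN_LINE.match(line)
        if match and match.group(1).strip() == label:
            values.add(int(match.group(2)))
    distinct = sorted(values)
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    return distinct, gaps


# The fifteen CPU0 readings listed above, all taken with div = 2.
READINGS = [2909, 2934, 2909, 2960, 2934, 2909, 2934, 2909,
            2934, 2860, 2884, 2909, 2909, 2960, 2960]
SAMPLE_LOG = "\n".join(
    f"CPU0 fan: {rpm} RPM (min = 1500 RPM, div = 2)" for rpm in READINGS
)

if __name__ == "__main__":
    distinct, gaps = rpm_steps(SAMPLE_LOG)
    print("distinct values:", distinct)
    print("gaps:", gaps)
```

On those fifteen lines the reading moves in steps of about 25 RPM, and the whole spread in the log covers only four such steps.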

I am not sure if more resolution is any use, though. OK: here are the
latest readings:

CPU0 fan: 2722 RPM (min = 750 RPM, div = 4)
CPU1 fan: 2192 RPM (min = 750 RPM, div = 4)
System: +40C (limit = +45C, hysteresis = +42C) sensor = thermistor
CPU0: +44.5C (limit = +60C, hysteresis = +58C) sensor = thermistor
CPU1: +43.0C (limit = +60C, hysteresis = +58C) sensor = thermistor

N.B.: I had shut down the 4 setiathome processes, below, during this time
interval, so things may have been slightly cooler.
>
> It appears that you are loading the w83627hf module, and thus
> using the LPC (ISA) bus to access the W83627HF chip. Is that
> necessary, or is it also connected to the i2c bus? The w83781d
> module will provide W83627HF data via the i2c bus. I would
> suspect that if your BIOS is seeing more than lm_sensors you
> might find that there is either a second W83627HF Super I/O chip
> or more likely an additional dedicated hardware monitor chip
> like the W83782D (I.e., the same arrangement Tyan boards use.)
>
> In any case, accessing via the i2c bus is probably preferred,
> even though accessing it can be more complicated (especially
> if there are multiple chips to access).
>
I could not get it to work that way. It was a fight to get them to work at
all. A needed module would not load, and I could not figure out what
parameters it would need to get it to load, so I gave up on that.
>
>>The processors do not seem to go over 55C even when the fan
>>speeds double. What I think interesting is that the design of
>>the fans is "wrong"; the temperature sensing thermistor is at
>>the fan intake. It seems to me it would be smarter, though
>>technically difficult, to measure the air temperature at the
>>output of the processor wind tunnel, giving the fan some idea
>>how hot the processors were. But they seem to do the right
>>thing, and do it quite well, as they are.
>
>
> I assume you are talking about the "System" temperature, as
> opposed to the CPU temps. The fan speed should be controlled by
> the cpu temperature. The exhaust temperature is of little
> value...
>
The fans are 60mm x 38mm fans supplied by Intel. The sensor is in the
usual place for such fans, so they measure the air temperature going into
the fan (and from there to the heat sink), which is probably closely
related to system temperature (I do not know where the sensor is). I agree
that those sensors should measure the CPU temperature, but there is no way
for them to do that. If the sensor were remotely located, the next best
thing to measuring CPU temperature would be, IMAO, the air coming out of
the heat sinks. But the measured CPU temperature does not seem to vary
much as the System temperature goes up. But the CPU fans sure do speed up.
>
>>Another thing I notice is that the power required by modern
>>chips (and by "modern" I mean Pentium IIIs and later) varies
>>depending on the processing load. If I run 4 instances of
>>setiathome on the two hyperthreaded Xeon processors, the power
>>required is noticeably more (say 30% or more) than when I do not
>>and the machine is essentially idling. And the temperature and
>>processor fan speeds do go up.
>
>
> Interesting! I'm not set up to monitor power, and maybe that
> would be useful to add. Right now I have the onboard hardware
> monitoring chips, plus a Crystalfontz CFA-633 temperature monitor
> and fan controller device (a *great* toy!).

The way I measure power consumption is that my APC Smart-UPS 2200 will
give the percent capacity it is running at, and the computer runs at about
23.4% of capacity when four setiathome processes are running (that is what
it is doing now). If I stop them, it drops to 14.5%. So the power changes
A LOT depending on CPU load. Also, the air coming out of the exhaust
fans of the box is noticeably warmer with the processors fully loaded.
I was wrong when I thought it was 30% more; it seems it is
about 60% more. Those two 90 watt processors sure take a lot of power. I
think I was right to put a fan in every fan location in the box. The box
is AddTronics 7896A.

http://www.addtronics.com/7896A.htm
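
The "about 60% more" figure follows directly from the two UPS load readings above. A minimal sketch of the arithmetic in Python, with relative_increase as a made-up helper name; note that the result is the change in load seen by the UPS, which may include anything else plugged into it:

```python
def relative_increase(idle_pct, loaded_pct):
    """Fractional increase in UPS load going from idle to loaded."""
    if idle_pct <= 0:
        raise ValueError("idle load must be positive")
    return (loaded_pct - idle_pct) / idle_pct


IDLE_PCT = 14.5    # UPS load with setiathome stopped
LOADED_PCT = 23.4  # UPS load with four setiathome processes running

if __name__ == "__main__":
    increase = relative_increase(IDLE_PCT, LOADED_PCT)
    print(f"{increase:.1%} more load when busy")
```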

[snip]

-- 
   .~.  Jean-David Beyer           Registered Linux User 85642.
   /V\                             Registered Machine   241939.
  /( )\ Shrewsbury, New Jersey     http://counter.li.org
  ^^-^^ 15:55:00 up 6 days, 17:47, 3 users, load average: 4.08, 4.26, 4.21

