Re: lm_sensors inaccurately reporting sudden changes?

From: Floyd L. Davidson (floyd_at_barrow.com)
Date: 03/01/05


Date: Tue, 01 Mar 2005 05:02:08 -0900


"squishymedia" <squishymedia@gmail.com> wrote:
>John-Paul, thanks for the pointers on OpenManage. I've cobbled my own
>monitoring system together on this Slackware box (that's how I found
>out about the lm_sensors alert) but if I can get OpenManage running
>that'll probably be much more robust.
>
>On the chassis intrusion alarm: all the ALARM status messages continued

The alarms will remain until they have been cleared by actually
reading the sensor chip, and it may take more than one read to
clear them all.

>to be reported by 'sensors' until rebooting; about 48 hours. The colo
>facility reviewed access logs and cameras and didn't see any physical
>access to the box, which makes the chassis intrusion sound like a bit
>of a fluke.

Given the numbers in your original post, there are a couple of
obvious things. First, the configuration isn't correct to begin
with, but it is also clear that something rewrote the registers
in the sensor chip.

It probably could have been cleared by reconfiguring the sensor
chip with "sensors -s", and then reading the sensor chip a
couple of times with just "sensors".

I can't begin to guess what might have overwritten the sensor
chip's registers. If it happens again you might want to change
your data collection script to reconfigure the chip first, and
then read it at least a couple times to clear alarms before
assuming you have valid data.
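
A minimal sketch of that reconfigure-then-read sequence, in Python.
The names run_command and read_sensors and the throwaway_reads count
are assumptions made for illustration; only the "sensors -s" and
"sensors" commands come from the discussion above. Note that
"sensors -s" normally needs root privileges.

```python
import subprocess


def run_command(args):
    """Run a command and return its standard output as text."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout


def read_sensors(run=run_command, throwaway_reads=2):
    """Reconfigure the chip, discard a few reads, return one final read.

    throwaway_reads is a hypothetical tuning knob: how many reads to
    discard so that latched alarms have a chance to clear.
    """
    run(["sensors", "-s"])            # reapply limits from the config file
    for _ in range(throwaway_reads):
        run(["sensors"])              # reading the chip clears latched alarms
    return run(["sensors"])           # treat only this read as data
```

The run parameter exists only so the sequence can be exercised
without a sensor chip present; a collection script would call
read_sensors() with the defaults and parse the returned text.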

A couple of things that I've noticed about using lm_sensors:

  1) There are parsing bugs when reading the configuration file,
     which apparently corrupt the stack used by the parser.

          set in0_min 0.95 * vid
          set in0_min (0.95 * vid)

     Both of the above are valid, and both will work. But what
     I found was that with too many expressions lacking
     parentheses, strange things started happening... :-)

  2) Some/Many (All??) motherboards apparently do not have
     adequate noise filtering at the voltage inputs to the
     sensor chip. My experience has been that *every* system
     I've tried reports voltage alarms, but never shows a
     voltage reading that is out of specs. It appears that
     noise spikes trigger the alarm, which is latched, but the
     chance that a spike will occur at the exact moment the
     registers are read is small if not zero.

     Unfortunately lm_sensors is not robust enough to deal with
     bogus transient alarms, and therefore I gave up on ever
     using it as a real-time monitor with the potential to
     shut down the system.
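
One way a monitoring script could tolerate such latched transient
alarms is to act only when the alarm flag is set and the measured
value itself is out of range for several consecutive samples. This
is a sketch under that assumption, not something lm_sensors does;
the AlarmFilter class, the limits, and the count of 3 are all
hypothetical.

```python
from collections import deque


class AlarmFilter:
    """Report an alarm only after `required` consecutive bad samples."""

    def __init__(self, low, high, required=3):
        self.low = low                # lower limit for the reading
        self.high = high              # upper limit for the reading
        self.required = required      # consecutive bad samples needed
        self.recent = deque(maxlen=required)

    def update(self, value, alarm_flag):
        """Record one sample and return True if the alarm looks real."""
        out_of_range = value < self.low or value > self.high
        # A latched alarm with an in-range reading counts as a transient.
        self.recent.append(bool(alarm_flag) and out_of_range)
        return len(self.recent) == self.required and all(self.recent)
```

For example, with illustrative limits of 3.14 and 3.46 volts on a
3.3 volt input, a set alarm flag with a reading of 3.30 is ignored,
while three successive readings of 3.00 with the flag set would be
reported.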

Due to item 2 above, I use lm_sensors to generate a history,
which is graphed and automatically updated on my intranet web
page. At any time I can view a page full of graphs showing
voltages and temperatures over the past 48 hours. It is
definitely entertaining, though I'm not sure yet if it is
actually useful...
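
The 48 hour history can be pictured as a rolling window of
timestamped samples. The sketch below is an illustration only, not
the scripts linked further down; the add_sample function and the
sample layout are assumptions.

```python
from datetime import datetime, timedelta

WINDOW = timedelta(hours=48)          # span of history kept for graphing


def add_sample(history, timestamp, readings, window=WINDOW):
    """Append one sample and drop samples older than the window.

    history is a list of (timestamp, readings) tuples, oldest first.
    """
    history.append((timestamp, readings))
    cutoff = timestamp - window
    while history and history[0][0] < cutoff:
        history.pop(0)                # discard samples that have aged out
    return history
```

Each run of the collection script would call add_sample() once and
then hand the list to whatever graphing tool is in use.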

Here is a web page showing an example set of graphs, with a link
to the scripts that generate the web page.

  http://web.newsguy.com/floyd_davidson/sensors/

-- 
Floyd L. Davidson           <http://web.newsguy.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska)                         floyd@barrow.com