Re: How can Linux damage a motherboard?
- From: floyd@xxxxxxxxxx (Floyd L. Davidson)
- Date: Fri, 20 Apr 2007 14:38:52 -0800
"Tom Szabo" <tom@xxxxxxxxxxxxxxxx> wrote:
....
work on either. But when it loaded cpuspeed the power driver
is hard-coded in the /etc/cpuspeed.conf file on most Linux distributions,
so it was likely trying to load an inappropriate power
module. Also, /etc/modprobe.conf might be suggesting modules
to the kernel for loading that are really for Intel
What you are saying regarding the power management is in line with my
thoughts; I suspected something along these lines.
That is almost certainly the general idea of what is happening.
Try installing Linux on the motherboard from scratch, or use only an image
on a hard disk from another server that has exactly the same hardware.
Another easy test would be to try a Live Linux CD like Knoppix and see
if it reproduces the weird fan problem, I bet it won't.
On the other hand, your suggestion overlooks the fact that the server
now doesn't run for more than five minutes on the first start after a couple
of hours of rest, and the second and subsequent startups last only a few
seconds.....
He is correct, although your attempts at running a system
configured for another motherboard might prevent you from easily
doing that now.
Here is my real dilemma:
The OS loaded some driver, module, etc and changed some behaviour.
Almost certainly it has to do with controlling the fans, which
makes it almost guaranteed to be lm_sensors related. The
potential difficulty is that you have now written a
configuration to the chip which monitors temperatures, and if so,
that has to be cleared in order to prevent the faulty
temperature shutdown.
The exact nature of the problem appears to be that you have
either selected the wrong type of temperature transducer, or given
it a very low temperature as the alarm point, and the monitor
chip thinks it is overheating when in fact it is not even at
normal temperature. When it cools for a couple hours, it
actually gets down to room temp and takes a significant amount
of time to heat up. In the process the fans go through each
stage of control from barely on to hitting it full blast. Then
it shuts down. Of course if it is immediately restarted it
takes much less time to hit the alarm temp.
Have you tried booting to single user mode? I'm not sure how that is
done on your particular machines, but with the LILO boot loader
you would give it a boot name, such as "linux", and add the word
"single" after it.
If the configuration is not stored permanently in the monitor chip,
but is instead applied by the boot scripts in the usual way, it will
not be applied when booting to single user mode (simply because an
error in such a script could otherwise prevent booting).
If you try that and it still shuts down, try booting from any
other kernel, such as a live CD (for any system), and see if that
will continue to run.
If you get it to run without shutting down, use whatever system
you've booted as a "rescue system" and edit the boot scripts of
your improper configuration to remove anything that initializes
sensors.
If it shuts down no matter what kernel you boot, you've got
yourself a *major* problem! You will have to figure out the
right configuration for your server, set it up on another
system, and then boot it with a cold box that will run long
enough for that part of the boot process to be executed and
reconfigure the sensors. It won't be an easy thing to figure
out.
That is
understandable, but that should only affect the server while the OS is
loading.
Here, on the other hand, once the wrong image is booted up the first time the
problem becomes permanent and no longer depends on the OS, so the
changes are written somewhere into/onto the motherboard... that is fine, I
can live with that too.
But where does it get written, so that neither a BIOS reset nor the BIOS
patch can reverse it?
It does appear to be the configuration of the sensor monitoring
chip.
Another trick you might try is a reset, and instead of letting
it boot the OS, go to the BIOS setup. In the BIOS setup do
whatever is available for monitoring (temperatures, voltages,
etc.). As an example, on some older Tyan dual processor boards
for AMD processors the sensor monitoring chip would be about
half initialized in a normal boot, but would be fully
initialized only when the BIOS setup option to show voltages and
temperatures was entered. If the reset button was pressed after
that, then lm_sensors could monitor all of the temps and
voltages. But if the box was powered down it would reset the
chip completely, and only half of it was initialized when powered
up. That meant half the temperatures and voltages could not be
read by lm_sensors. (I wrote a C program to fully initialize
the monitor chip, and ran that as part of the boot process to
correct the problem.)
That monitor chip did not retain its configuration when powered
down, as yours seems to.
Considering all the symptoms, it seems like some sensor threshold gets adjusted
to just below the normal operating temperature.
For example, the CPU normally operates at 50 degrees and the normal maximum
operating temperature is 80 degrees.
Using this example, my stuffed image somehow managed to adjust the maximum
operating temperature to 49.8 degrees. When I turn on the machine the first
time, it takes a little while to get up to 50 degrees, so I have, say, 5
minutes... but the next round only 10 seconds, and after that it is down to 1 or
2 seconds.
Exactly.
Any more clues?
I'm glad I'm not you... ;-)
--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) floyd@xxxxxxxxxx