Re: How can Linux damage a motherboard?
- From: "Tom Szabo" <tom@xxxxxxxxxxxxxxxx>
- Date: Sat, 21 Apr 2007 13:38:45 +1000
Hi Floyd,
Thanks for your reply. I am glad to see that I am not alone :-) Almost all
of your thoughts make sense.
The only problem I still have is that I have another Linux install that
was done on these machines, and it worked fine. This is the first "image" I
used in the 3rd machine.
It worked fine in the machines for hours, until I put the second
image in, and then the problem started. As far as I remember, I tried
to use the first image and booted it, but I could be a little mixed up, so I will
try again and let you know.
As far as I can see, some code in image #2 has changed something in the
machine. If it was part of a normal adjustment, it should be reversed by a
correct image. I hope you are right and I remember wrong, and in fact I never
got to boot properly with the first image after the second image ran on the
same machine... that is my hope...
Otherwise it is a bit of trouble, as then it means that there was some kind
of weird code that went further than it should have and changed
something that a normal install is not meant to...
Anyway, I will do the test and see.
Thanks for your thoughts,
Regards,
Tom
"Floyd L. Davidson" <floyd@xxxxxxxxxx> wrote in message
news:87647qekn7.fld@xxxxxxxxxxxxx
"Tom Szabo" <tom@xxxxxxxxxxxxxxxx> wrote:
...
work on either. But when it loaded cpuspeed, the power driver
is hard coded in the /etc/cpuspeed.conf file on most Linux distributions,
so it was likely trying to load an inappropriate power
module. Also, /etc/modprobe.conf might be suggesting modules
to the kernel for loading that are really for Intel
What you are saying regarding the power management is in line with my
thoughts; I suspected something along these lines.
That is almost certainly the general idea of what is happening.
Try installing Linux on the motherboard from scratch, or only use an
image on a hard disk from another server that's got exactly the same
hardware.
Another easy test would be to try a Live Linux CD like Knoppix and see
if it reproduces the weird fan problem. I bet it won't.
On the other hand, in your suggestion you have forgotten the fact that
the server doesn't run for more than five minutes in the first go after a
couple of hours of resting, and the second and subsequent startups last only a
few seconds.....
He is correct, although your attempts at running a system
configured for another motherboard might prevent you from easily
doing that now.
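One quick way to test the cpuspeed theory is to compare the CPU vendor reported in /proc/cpuinfo with any driver hard-coded in /etc/cpuspeed.conf. Below is a minimal Python sketch, assuming the config file forces the driver with a DRIVER= line as on Red Hat-style systems of this era. The vendor-to-driver table is a hypothetical, incomplete illustration, not an authoritative list.

```python
import re
import sys

# Hypothetical, incomplete mapping from /proc/cpuinfo vendor_id to cpufreq
# drivers that suit that vendor. acpi-cpufreq is listed for both.
VENDOR_DRIVERS = {
    "AuthenticAMD": {"powernow-k7", "powernow-k8", "acpi-cpufreq"},
    "GenuineIntel": {"speedstep-centrino", "p4-clockmod", "acpi-cpufreq"},
}


def read_text(path):
    """Return the file's contents, or '' if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


def cpu_vendor(cpuinfo_text):
    """Return the first vendor_id found in /proc/cpuinfo text, or None."""
    match = re.search(r"^vendor_id\s*:\s*(\S+)", cpuinfo_text, re.MULTILINE)
    return match.group(1) if match else None


def configured_driver(conf_text):
    """Return the value of the first uncommented DRIVER= line, or None."""
    for line in conf_text.splitlines():
        line = line.strip()
        if line.startswith("DRIVER="):  # "#DRIVER=" lines are skipped
            value = line.split("=", 1)[1].strip().strip('"')
            return value or None
    return None


def driver_mismatch(cpuinfo_text, conf_text):
    """True if a hard-coded driver is not one we expect for this CPU vendor."""
    vendor = cpu_vendor(cpuinfo_text)
    driver = configured_driver(conf_text)
    if driver is None or vendor not in VENDOR_DRIVERS:
        return False  # nothing forced, or a vendor this table does not cover
    return driver not in VENDOR_DRIVERS[vendor]


if __name__ == "__main__":
    cpuinfo = read_text("/proc/cpuinfo")
    conf = read_text(sys.argv[1] if len(sys.argv) > 1 else "/etc/cpuspeed.conf")
    print("vendor:  ", cpu_vendor(cpuinfo))
    print("DRIVER=  ", configured_driver(conf))
    print("mismatch:", driver_mismatch(cpuinfo, conf))
```

Run it on the affected box, or pass it a cpuspeed.conf copied out of the suspect image. A True on the last line would mean the image is forcing a power driver meant for the other CPU vendor.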
Here is my real dilemma:
The OS loaded some driver, module, etc. and changed some behaviour.
Almost certainly it has to do with controlling the fans, which
makes it almost guaranteed to be lm_sensors related. The
potential difficulty is that you have now written a
configuration to the chip which monitors temperatures, and if so,
that has to be cleared in order to prevent the faulty
temperature shutdown.
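If a limit really has been written to the monitor chip, the kernel's hwmon interface should show it. A minimal Python sketch, assuming a kernel that exposes the standard hwmon sysfs files (temperatures in millidegrees Celsius, tempN_input for the reading and tempN_max for the high limit). Older kernels put these files under hwmonN/device/ rather than hwmonN/, so both places are checked. It lists any sensor whose high limit is at or below its current reading, which is the situation described above.

```python
import glob
import os
import sys


def read_celsius(path):
    """Read a hwmon temperature attribute (millidegrees C) as degrees C."""
    try:
        with open(path) as f:
            return int(f.read().strip()) / 1000.0
    except (OSError, ValueError):
        return None


def suspicious_limits(hwmon_root="/sys/class/hwmon"):
    """Return (input_path, reading_c, max_c) where tempN_max <= tempN_input."""
    findings = []
    # Newer kernels put the attributes in hwmonN/, older ones in hwmonN/device/.
    patterns = (os.path.join(hwmon_root, "hwmon*", "temp*_input"),
                os.path.join(hwmon_root, "hwmon*", "device", "temp*_input"))
    for pattern in patterns:
        for input_path in sorted(glob.glob(pattern)):
            max_path = input_path[: -len("_input")] + "_max"
            reading = read_celsius(input_path)
            limit = read_celsius(max_path)
            if reading is not None and limit is not None and limit <= reading:
                findings.append((input_path, reading, limit))
    return findings


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "/sys/class/hwmon"
    for path, reading, limit in suspicious_limits(root):
        print(f"{path}: reading {reading:.1f} C, high limit {limit:.1f} C")
```

Note that this only reads the limits; it does not try to clear them, since how a given chip stores its configuration is hardware specific.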
The exact nature of the problem appears to be that you have
either selected the wrong type of temperature transducer, or given
it a very low temperature as the alarm point, and the monitor
chip thinks it is overheating when in fact it is not even at
normal temperature. When it cools for a couple of hours, it
actually gets down to room temp and takes a significant amount
of time to heat up. In the process the fans go through each
stage of control from barely on to hitting it full blast. Then
it shuts down. Of course if it is immediately restarted it
takes much less time to hit the alarm temp.
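The cold-start versus warm-restart pattern follows from simple heating curves. Below is a rough Python sketch, assuming a single thermal time constant (Newton's law of heating). The 22 degree room and 60 s time constant are made-up illustrative values. The 50 degree running temperature and the 49.8 degree alarm point come from the figures Tom gives later in this thread.

```python
import math


def seconds_to_alarm(start_c, alarm_c, steady_c, tau_s):
    """Time for a first-order warm-up to cross alarm_c.

    Temperature is modelled as T(t) = steady - (steady - start) * exp(-t / tau),
    so T reaches alarm at t = tau * ln((steady - start) / (steady - alarm)).
    Returns 0.0 if we already start at or above the alarm point, and
    math.inf if the steady-state temperature never reaches it.
    """
    if start_c >= alarm_c:
        return 0.0
    if steady_c <= alarm_c:
        return math.inf
    return tau_s * math.log((steady_c - start_c) / (steady_c - alarm_c))


if __name__ == "__main__":
    # Illustrative numbers only: 22 C room, 50 C normal running temperature,
    # alarm mis-set to 49.8 C, and an assumed 60 s thermal time constant.
    for start in (22.0, 45.0, 49.5):
        t = seconds_to_alarm(start, 49.8, 50.0, 60.0)
        print(f"start {start:4.1f} C -> alarm after {t:6.1f} s")
```

With these assumed numbers, a box starting at room temperature needs roughly five minutes to reach the alarm, while one restarted at 49.5 degrees needs under a minute. Because the alarm sits so close to the running temperature, most of the warm-up time is spent on the last fraction of a degree.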
Have you tried to boot to single user? I'm not sure how that is
done on your particular machines, but with the LILO boot loader
you would give it a boot name, such as "linux", and add the word
"single" after it.
If the configuration is being written to the monitor chip by the boot
scripts, and if that is properly set up, it will
not be done when booting to single-user mode (precisely so that an
error in the script cannot prevent booting).
If you try that and it still shuts down, try booting from any
other kernel, such as a live CD (for any system), and see if that
will continue to run.
If you get it to run without shutting down, use whatever system
you've booted as a "rescue system" and edit the boot scripts for
your improper configuration to remove anything that initializes
sensors.
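From the rescue system, the first job is finding what initializes the sensors in the broken install. Below is a minimal Python sketch. It assumes the broken root filesystem is mounted at /mnt/sysimage, a hypothetical mount point, and that the install uses a Red Hat-style layout (/etc/rc.d, /etc/init.d, /etc/sysconfig, /etc/modprobe.conf). It only reports lines that mention sensors, such as a "sensors -s" call that loads limits from the config file, and leaves the actual editing to you.

```python
import os
import re
import sys

# Assumed Red Hat-style locations; adjust for your distribution.
CANDIDATE_DIRS = ("etc/rc.d", "etc/init.d", "etc/sysconfig")
CANDIDATE_FILES = ("etc/modprobe.conf",)
PATTERN = re.compile(r"sensors")


def find_sensor_init(root):
    """Return (path, line_number, text) for each line mentioning 'sensors'."""
    paths = set()
    for subdir in CANDIDATE_DIRS:
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, subdir)):
            paths.update(os.path.join(dirpath, name) for name in filenames)
    paths.update(os.path.join(root, name) for name in CANDIDATE_FILES)

    hits = []
    for path in sorted(paths):
        if not os.path.isfile(path):
            continue  # missing file, or a symlink whose target is absent
        with open(path, errors="replace") as f:
            for number, text in enumerate(f, start=1):
                if PATTERN.search(text):
                    hits.append((path, number, text.rstrip("\n")))
    return hits


if __name__ == "__main__":
    # /mnt/sysimage is only a guess at where the broken root is mounted.
    root = sys.argv[1] if len(sys.argv) > 1 else "/mnt/sysimage"
    for path, number, text in find_sensor_init(root):
        print(f"{path}:{number}: {text}")
```

If /etc/init.d is a symlink into /etc/rc.d, the same script may be reported twice under both paths. That is harmless for a read-only survey.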
If no matter what kernel you boot it shuts down, you've got
yourself a *major* problem! You will have to figure out the
right configuration for your server, set it up on another
system, and then boot it with a cold box that will run long
enough for that part of the boot process to be executed and
reconfigure the sensors. It won't be an easy thing to figure
out.
That is
understandable, but that should only affect the server when the OS is
loading.
Here, on the other hand, once the wrong image has booted up the first time,
the problem becomes permanent and no longer depends on the OS, so the
changes are written somewhere into/onto the motherboard... that is
fine, I can live with that too.
But where does it get written to, so that neither a BIOS reset nor the BIOS
patch can reverse it?
It does appear to be the configuration of the sensor monitoring
chip.
Another trick you might try is a reset, and instead of letting
it boot the OS, go to the BIOS setup. In the BIOS setup do
whatever is available for monitoring (temperatures, voltages,
etc.). As an example, on some older Tyan dual processor boards
for AMD processors the sensor monitoring chip would be about
half initialized in a normal boot, but would be fully
initialized only when the BIOS setup option to show voltages and
temperatures was entered. If the reset button was pressed after
that, then lm_sensors could monitor all of the temps and
voltages. But if the box was powered down it would reset the
chip completely, and only half of it was initialized when powered
up. That meant half the temperatures and voltages could not be
read by lm_sensors. (I wrote a C program to fully initialize
the monitor chip, and ran that as part of the boot process to
correct the problem.)
That monitor chip did not retain its configuration when powered
down, as yours seems to.
Considering all the symptoms, it seems like some sensor setting gets
adjusted to just below the normal operating temperature.
For example, the CPU normally operates at 50 degrees and the maximum
operating temperature is normally 80 degrees.
Using this example, my stuffed image somehow managed to adjust the maximum
operating temperature to 49.8 degrees. When I turn on the machine the
first time, it takes a little while to get up to 50 degrees, so I have, say,
5 minutes... but the next round only 10 seconds, and after that it is down to 1
or 2 seconds.
Exactly.
Any more clues?
I'm glad I'm not you... ;-)
--
Floyd L. Davidson <http://www.apaflo.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska) floyd@xxxxxxxxxx