Re: How can Linux damage a motherboard?



On Sat, 21 Apr 2007 05:23:27 +1000, Tom Szabo wrote:

Hi,
I recently purchased a few AMD Opteron based servers. I installed Linux
with the 2.6 kernel and started configuring it. I ran into some
configuration problems, so after a few hours of changing settings and
installing components I abandoned the installed image and plugged in a
different drive with an image that had been installed on a different
(Intel based) server. I started up the server and it booted as normal,
except that the server, which had previously run quietly during the
boot process, started to get noisy. The fans started to work harder
halfway through the boot process, after a while they ran flat out for a
few seconds, and then the system initiated a shutdown and powered off
without asking.

I restarted the server, and this time it ran for only about 10-15
seconds with increasing fan speed before shutting down again.

On the third try the server ran for only a couple of seconds; the fans
had just enough time to speed up before the system shut down straight
away. I tried again and again, but nothing changed.

I checked the CPU and all components for heat; none was overheating and
all seemed fine.

I unpacked the second server and started using it. After I plugged in
the "second" image, the one that was created on a different system, the
server started the same behaviour and became unusable. At this stage I
had no idea what was causing the problem. I let the two servers rest,
and after a few hours I did a few tests.

After "cooling off", the servers will turn on and operate for a few
minutes, but once they do the first shutdown as before, they will only
operate for another 10-15 seconds, and any subsequent attempt at firing
them up lasts only a second or two. Again, leaving them for a few hours
allows me to get them going for a few minutes.

I called the manufacturer and explained the situation. At this stage I
was not aware of the effect of the second image. Going through the
symptoms with the tech guy on the other end of the line, we came to the
conclusion that this is a motherboard problem, and they are dispatching
one for me.

In the evening I decided to unpack the third server and, using the
"first image", I continued the abandoned configuration. Another 2 hours
went past using the 3rd server without any problem, but I could not get
the system configured the way I needed, so I plugged in the "second
image" and booted up the 3rd server again. During the boot, while the
server was going through the different run levels, I heard the fans
speeding up and somewhat fluctuating. I suddenly realised the situation
and understood that this problem only started on all 3 servers when I
booted up with this image. I halted the boot process, but it was too
late. After I unplugged the drive from the 3rd machine and tried to
reboot, the server ran for a few minutes and shut down as the other 2
had done previously.

Here it became clear that Linux had made some changes, altered
something in the BIOS or who knows where, and that I have to reset
whatever has changed.

I tried to reset the BIOS with the DIP switch, restored the defaults in
the BIOS setup, and removed the power cable from the motherboard
together with the BIOS battery. I left it to rest for a little while and
started up again, but there was no change.

I even managed to update the BIOS, hoping to overwrite any changes, but
it had no effect. The servers still behave the same way. I plugged in a
different power supply, and that made no difference either.

My conclusion at this stage is that some sensor or other mechanism has
been adjusted or damaged by the OS, and none of the above attempts to
reset it has worked!!!

What could it be? How can an operating system cause such an
irreversible change??

TIA,

Tom

You haven't said anything about which distro or kernel you are using. If
it's an old kernel then it might not support your motherboards. RHEL 4
uses a very old kernel, 2.6.9, which won't support most modern machines.
I don't know what the official means of upgrading a RHEL 4 kernel is,
as I've only used the clones, but I had to put a 2.6.18 kernel on my A64
laptop before I could get Scientific Linux 4.4 to work reliably. FC6
uses the most recent kernel, 2.6.20.xx, so it should work on your
systems. RHEL 5 also uses a 2.6.18 kernel, which will probably work.
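
As a quick way to see which kernel an image is actually running, here is
a minimal Python sketch that parses the release string and compares it
against a baseline. The 2.6.18 baseline is only an assumption taken from
the laptop experience above, not a documented requirement for any
particular motherboard, and the helper names parse_kernel_version and
is_at_least are made up for this illustration.

```python
import platform
import re


def parse_kernel_version(release):
    """Return the leading numeric part of a kernel release as a tuple of ints.

    "2.6.9-42.ELsmp" becomes (2, 6, 9); a missing third number counts as 0.
    """
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError("unrecognised kernel release: %r" % (release,))
    return tuple(int(part) if part else 0 for part in match.groups())


def is_at_least(release, minimum=(2, 6, 18)):
    """True if the release is the same as or newer than the baseline.

    The default baseline (2.6.18) is an assumption for illustration only.
    """
    return parse_kernel_version(release) >= minimum


if __name__ == "__main__":
    # platform.release() reports the same string as `uname -r`.
    running = platform.release()
    verdict = "at or above" if is_at_least(running) else "older than"
    print("Running kernel %s is %s the 2.6.18 baseline" % (running, verdict))
```

Run it on each image (or compare its output with `uname -r`) before
swapping drives between machines. A kernel at or above the baseline is
no guarantee that the board is supported; it only rules out the obvious
"very old kernel" case.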