Re: How to diagnose kernel panic?



Hi Mark.

I don't know whether this will help, but what are the specs of your machine?
Specifically, do you have a quality power supply? How about your hard drive
and your motherboard? Answering these questions may not reveal anything
important, but it always helps to verify that you are using quality parts in
your machine.

After all, a software program is just a collection of machine instructions
for your CPU (usually compiled from a high-level language, such as C++). If a
piece of software issues a read to the hard disk, and the motherboard and/or
the hard disk are cheapies that fail to properly return the data that was
expected, that can certainly cause software bugs ranging from incorrect
display of data to kernel panics, depending on which program is unlucky
enough to be hit. Cheap motherboards and hard disks are cheap because they
have less redundancy and fault tolerance, and they use components that are
more likely to fail to begin with.

Also, if your power supply is a cheap one, it might not be supplying enough
power to your computer. If that happens, your computer just won't work
correctly, because the hardware, and the software running on it, depends on
full, stable power.
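
If you manage to capture more of the panic next time, it may help to break
each call-trace entry into its parts so the entries can be compared from one
panic to the next. Below is a minimal sketch in Python, assuming the entries
use the usual "[<address>] symbol+0xOFFSET/0xSIZE" form, where OFFSET is how
far into the function execution was and SIZE is the function's length. The
name parse_trace_line and the sample line are my own, for illustration only;
this is not a tool from the kernel or from Debian.

```python
import re

# Assumed shape of one call-trace entry: "[<address>] symbol+0xOFFSET/0xSIZE".
TRACE_RE = re.compile(
    r"\[<(?P<addr>[0-9a-fA-F]+)>\]\s+"
    r"(?P<symbol>[\w.]+)\+0x(?P<offset>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)


def parse_trace_line(line):
    """Return (address, symbol, offset, size) as a tuple, or None if no match.

    parse_trace_line is a hypothetical helper written for this example.
    """
    m = TRACE_RE.search(line)
    if m is None:
        return None
    return (
        int(m.group("addr"), 16),
        m.group("symbol"),
        int(m.group("offset"), 16),
        int(m.group("size"), 16),
    )


if __name__ == "__main__":
    # Illustrative sample line in the assumed format.
    sample = "[<c026bc42>] scsi_request_fn+0xf6/0x294"
    parsed = parse_trace_line(sample)
    if parsed is None:
        print("line does not look like a call-trace entry")
    else:
        addr, symbol, offset, size = parsed
        print(f"{symbol}: {offset} bytes into a {size}-byte function "
              f"(address {addr:#x})")
```

If the same symbol shows up at the top of the trace across several panics,
that at least tells you which code path (here, the SCSI request path) to look
at alongside the hardware.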

Hope all that helps.


On Sunday 09 July 2006 12:56, Mark Copper wrote:
I have a server that is brought down by a kernel panic every two weeks
on average. Nothing untoward gets in the logs and the on-screen panic
message starts with something like
Kernel panic - not syncing: Fatal exception in interrupt

Call trace:
[<c026bc42>] scsi_request_fn+0xf6/0x294
I wasn't able to get any more at the data center...

So I brought the machine home and am running folding@home on it and so
far I have not been able to induce the panic. The replacement machine
is similar, but not identical. The main difference is a switch from
software to hardware RAID1. Also, the new machine, except for the
hardware driver, uses stable while the problematic machine uses testing.
And the replacement has run so far without problem.

The only other thing I can add is that the bad machine would seem to
start getting "sluggish" before it froze, but for the life of me, I
couldn't see why.

I am posting because I'm hopeful that list participants might have
suggestions how I might start to chase down or, better yet, eliminate
this problem.

Is there a way, perhaps, to manufacture the possible interrupts that
occur?

Thanks.

Mark




