Re: Generate NMI to crash a hung system...
- From: The Natural Philosopher <a@xxx>
- Date: Mon, 25 Sep 2006 12:19:18 +0100
big_sid wrote:
Hello,
If it's crashed that badly, it may well no longer be able to dump anything to a file system.
We have a number of Intel ProLiant servers running RedHat Enterprise
Linux 3. From time to time, and for no apparent reason these boxes just
hang. We can't SSH to them, but they still respond to a ping. We connect
to the iLO and attempt to login to the console, but after putting in the
username and password again it just hangs and won't actually give a
command prompt.
In an effort to try and figure out why this is happening, we need to
force a hung box in this state to perform a crash dump so we can send
it off to whoever for some analysis.
So, I set up a netdump-server and set up the server which crashes as a
client. Tested it and it all worked OK. I modified the kernel parameter
kernel.unknown_nmi_panic to 1 from 0 so when I sent it an NMI it should
die on its arse and give me a nice dump...
However, the box went this morning. So I logged onto the iLO, generated
an NMI, but all it did was dump a couple of log lines onto the netdump
server, no actual crash dump was produced.
Have I missed anything out here? I did this on another couple of RHEL3
test boxes and got a lovely big vmcore file of about 4 gig on my
netdump server, but I'm getting nothing on the server I actually WANT a
crashdump from.
Thanks in advance - Lee
In really bad cases we used to use a hardware emulator in place of the actual processor chip... however, that seems very expensive.
I suspect it's run out of resources somewhere, or is locked in a processor loop. The ping response shows that kernel interrupts are at least still being serviced. My guess is that it can't fork a process, though: it's run out of something.
I've seen this behaviour on older UNIX boxes with process limits on them. Usually you get an 'err: fork failed: too many processes' or some such, or used to.
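If running out of processes is the suspect, one rough way to watch for it is to count the entries in /proc and compare that against a limit. The sketch below is only an illustration and not something from the original thread: the function names are made up, it assumes a Linux /proc and a Python interpreter on the box, and it compares a system-wide process count with the per-user RLIMIT_NPROC soft limit, so treat the result as a hint rather than a precise measurement.

```python
import os
import resource


def count_processes(proc_root="/proc"):
    """Count numeric directory names under proc_root, one per running process."""
    return sum(1 for name in os.listdir(proc_root) if name.isdigit())


def nproc_limit():
    """Return the soft RLIMIT_NPROC for the current user, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_NPROC)
    return None if soft == resource.RLIM_INFINITY else soft


def headroom(used, limit):
    """Return the fraction of the limit still free (0.0 to 1.0), or None if unlimited."""
    if limit is None or limit <= 0:
        return None
    return max(0.0, 1.0 - used / limit)
```

Calling headroom(count_processes(), nproc_limit()) at intervals and logging the result would show whether the box creeps towards the limit before it hangs.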
Not much help to you, but maybe it will help get the brain started on a useful track.
We used to leave a root login running on the console. Sometimes a ps or top would work and show us what was clogging it up.
You could do worse than write a cron script that dumps system state out at regular intervals, so you can see what was happening PRIOR to the crash.
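A minimal sketch of such a script, assuming Python is available on the server and that /proc/loadavg and /proc/meminfo are readable. The function names and the one-line log format are invented for illustration; adjust what gets recorded to whatever you suspect is running out.

```python
import os
import time


def parse_meminfo(text):
    """Parse 'Key:   value kB' lines from /proc/meminfo into a dict of ints."""
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        # Skip header or malformed lines whose first field is not a number.
        if sep and fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def format_snapshot(now, loadavg, meminfo, nprocs):
    """Build one log line from a UTC timestamp and the sampled values."""
    stamp = time.strftime("%Y-%m-%d %H:%M:%S", time.gmtime(now))
    return "%s load=%s memfree_kb=%s swapfree_kb=%s procs=%s" % (
        stamp,
        loadavg,
        meminfo.get("MemFree", "?"),
        meminfo.get("SwapFree", "?"),
        nprocs,
    )


def take_snapshot(log_path):
    """Sample /proc once and append a single line to log_path."""
    with open("/proc/loadavg") as f:
        loadavg = f.read().split()[0]  # 1-minute load average
    with open("/proc/meminfo") as f:
        meminfo = parse_meminfo(f.read())
    nprocs = sum(1 for name in os.listdir("/proc") if name.isdigit())
    line = format_snapshot(time.time(), loadavg, meminfo, nprocs)
    with open(log_path, "a") as log:
        log.write(line + "\n")
```

Run take_snapshot() from cron every minute or so, with a log path of your choosing. Writing to a remote or separately mounted location is worth considering, since a local disk may be part of whatever is wedged.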