Re: Generate NMI to crash a hung system...
- From: The Natural Philosopher <a@xxx>
- Date: Tue, 26 Sep 2006 17:20:30 +0100
big_sid wrote:
> Hi,
> Thanks for all of the replies. I'm more or less convinced that the
> cause of our systems hanging is a combination of buggy kernels
> (RHEL-AS-3-U3 2.4.21-20 or thereabouts) and a depletion of system
> memory resulting in the hang (most of these servers are running Oracle
> 9 and 10 DBs) or Middleware type appservers (WebSphere etc).
Ah. That was precisely where I had to tune my Unix years ago. Informix database stuff: dozens of processes and zillions of open files in a C-ISAM setup.
I would definitely investigate how many open files and how many processes these boxes are running, sampled every few seconds by a small script (cron can only fire once a minute, so let the script loop).
If you find that no box ever goes above some suspicious-looking process ceiling, like 4096, or never has more than a similarly suspicious number of files open, you will know where to probe deeper.
I wouldn't call ill behaviour in a system that has been pushed outside the limits set for it a 'bug', though.
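A minimal sketch of that kind of sampler, assuming a Linux box with /proc mounted and a Python 3 interpreter (which a RHEL 3 machine won't have out of the box, so treat this as illustration). It counts processes as the numeric directories under /proc and reads system-wide file handle usage from /proc/sys/fs/file-nr. The 4096 process ceiling, the 90% warning threshold, and the function names are all hypothetical choices of mine, not anything from the thread; per-process limits such as `ulimit -n` are not covered.

```python
import os
import time

# Hypothetical ceiling to watch for; 4096 is just the "suspicious" example figure.
PROC_LIMIT = 4096
# Hypothetical alert threshold: flag anything at 90% or more of its limit.
WARN_FRACTION = 0.9


def count_processes(proc_entries):
    """Count processes: every numeric directory name under /proc is a PID."""
    return sum(1 for name in proc_entries if name.isdigit())


def parse_file_nr(text):
    """Parse /proc/sys/fs/file-nr into (allocated, unused, maximum) file handles."""
    allocated, unused, maximum = (int(field) for field in text.split()[:3])
    return allocated, unused, maximum


def near_limit(value, limit, fraction=WARN_FRACTION):
    """True when value has reached the given fraction of a non-zero limit."""
    return limit > 0 and value >= fraction * limit


def sample():
    """Take one snapshot from /proc: (process count, handles in use, handle maximum)."""
    procs = count_processes(os.listdir("/proc"))
    with open("/proc/sys/fs/file-nr") as f:
        allocated, unused, maximum = parse_file_nr(f.read())
    return procs, allocated - unused, maximum


def monitor(interval=5.0, samples=None):
    """Print a timestamped line every `interval` seconds; run forever if samples is None."""
    taken = 0
    while samples is None or taken < samples:
        procs, in_use, maximum = sample()
        warn = near_limit(procs, PROC_LIMIT) or near_limit(in_use, maximum)
        print("%s procs=%d files=%d/%d%s" % (
            time.strftime("%Y-%m-%d %H:%M:%S"), procs, in_use, maximum,
            "  <-- near limit" if warn else ""), flush=True)
        taken += 1
        if samples is None or taken < samples:
            time.sleep(interval)
```

Call `monitor()` from a small wrapper started under `nohup` with stdout redirected to a log file; after the next hang, the last few lines show whether either count was flat against a ceiling.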
> However, the particular server I'm talking about DID write something to
> the netdump server - it chucked out some logs. Normally we'll get a
> section of appropriate console output put into a file called
> /var/crash/<client_ip_date_time>/log on the Netdump server. However we
> should have also got a vmcore or at least a vmcore-incomplete which is
> the memory dump that we were after. It was this that didn't get dumped
> and I'm wondering if there is something I had to configure beforehand
> on the netdump client to allow it to perform the full memory dump.
My guess is that to dump it required a tad more resources than you had left.
I honestly think you are looking in the wrong area. Yes, a kernel that, e.g., tries to fork and gets a null response from a memory allocation request shouldn't bomb, but in practice that isn't the issue you are trying to fix.
You are trying to make sure it HAS got enough memory to e.g. fork.
Fixing bugs in error REPORTING doesn't fix the problems that caused the errors.
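If the aim is to make sure the box always has enough memory left to fork, the same sampling loop can watch memory headroom too. A minimal sketch, assuming /proc/meminfo is readable; the 64 MB floor and the function names are hypothetical, and MemFree + SwapFree is only a crude, conservative measure (it ignores page cache that the kernel could reclaim).

```python
# Hypothetical floor: warn when free RAM plus free swap drops below 64 MB.
MIN_FREE_KB = 64 * 1024


def parse_meminfo(text):
    """Map each 'Name:  value kB' line of /proc/meminfo to an integer number of kB.

    Lines not in that form are skipped, including the byte-count summary rows
    ("total: used: free: ...", "Mem:", "Swap:") that 2.4 kernels print first.
    """
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0].endswith(":") and parts[2] == "kB":
            values[parts[0][:-1]] = int(parts[1])
    return values


def low_memory(info, min_free_kb=MIN_FREE_KB):
    """True when MemFree + SwapFree is below the floor (a crude measure of headroom)."""
    return info.get("MemFree", 0) + info.get("SwapFree", 0) < min_free_kb


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live /proc/meminfo."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

Logging `low_memory(read_meminfo())` alongside the process and file counts would show whether the boxes are running out of memory before they hang, rather than relying on the crash dump to tell you afterwards.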
Thanks - Lee.