spontaneous reboot
- From: gelbeiche <info@xxxxxxxxxxxxxxx>
- Date: Tue, 7 Nov 2006 21:17:13 +0100
Hi,
we have an AMD Opteron machine running Linux
(uname -a gives
Linux <machine-name> 2.6.10 #9 SMP Sat May 6 11:32:04 CEST 2006 x86_64
x86_64 x86_64 GNU/Linux).
The machine serves as a database server (Oracle), and under heavy load it
sometimes reboots spontaneously, which is very annoying and not
acceptable. We have to find the exact reason for the reboots.
The only causes of reboots I know of are hardware defects. My first question
is: Is it possible for the machine to reboot without a hardware
defect, simply because it is, e.g., out of memory?
If so, which preconditions must be fulfilled (some BIOS settings,
e.g. power off if the CPU temperature exceeds a certain limit)?
In my own experience with such situations, I have never seen
Linux reboot without someone pushing the reset button.
In this case I think the swap space is not configured well.
The machine has 13GB RAM but only 1GB swap. But could this be a reason
for a spontaneous reboot?
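To put a number on the RAM/swap imbalance, here is a minimal Python sketch that reads the `MemTotal` and `SwapTotal` fields in the format of /proc/meminfo and reports swap as a fraction of RAM. The function names and the sample text are illustrative assumptions; the sample values are simply 13GB and 1GB expressed in kB, not output captured from the real machine.

```python
def parse_meminfo(text):
    """Return a dict mapping /proc/meminfo field names to their numeric values."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines without a "Name: value" shape
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_to_ram_ratio(meminfo):
    """Swap size divided by RAM size (both reported in kB)."""
    return meminfo["SwapTotal"] / meminfo["MemTotal"]


# Illustrative values only: 13 GB RAM and 1 GB swap written in kB.
SAMPLE = """MemTotal:     13631488 kB
SwapTotal:     1048576 kB
"""

info = parse_meminfo(SAMPLE)
ratio = swap_to_ram_ratio(info)
print(f"swap is {ratio:.1%} of RAM")
```

On the machine itself the same functions could be fed the real file, e.g. `parse_meminfo(open("/proc/meminfo").read())`.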
Below is an excerpt from /var/log/messages from the time of the last
reboot. I would really like to understand the details of these lines.
Any comment on them is welcome.
The machine has 4 processors.
Oct 19 18:12:21 cust-db kernel: cpu 0 hot: low 32, high 96, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 0 cold: low 0, high 32, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 1 hot: low 32, high 96, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 1 cold: low 0, high 32, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 2 hot: low 32, high 96, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 2 cold: low 0, high 32, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 3 hot: low 32, high 96, batch 16
Oct 19 18:12:21 cust-db kernel: cpu 3 cold: low 0, high 32, batch 16
Oct 19 18:12:21 cust-db kernel: Node 0 HighMem per-cpu: empty
Oct 19 18:12:21 cust-db kernel:
Oct 19 18:12:21 cust-db kernel: Free pages: 31704kB (0kB HighMem)
Oct 19 18:12:21 cust-db kernel: Active:1822154 inactive:1350264 dirty:0
writeback:0 unstable:0 free:7926 slab:15758 mapped:2368741
pagetables:74227
Oct 19 18:12:21 cust-db kernel: Node 3 DMA free:0kB
min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 3 Normal
free:6784kB min:6812kB low:8512kB high:10216kB active:1761648kB
inactive:1243200kB present:3145724kB pages_scanned:5338889
all_unreclaimable? yes
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 3 HighMem free:0kB min:128kB
low:160kB high:192kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 2 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 2 Normal
free:9064kB min:9088kB low:11360kB high:13632kB active:2961296kB
inactive:1020044kB present:4194300kB pages_scanned:57564851
all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 2 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 1 DMA free:0kB min:0kB low:0kB high:0kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 1 Normal free:7808kB min:6812kB low:8512kB high:10216kB active:1702492kB
inactive:1128272kB present:3145724kB pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 1 HighMem free:0kB min:128kB low:160kB high:192kB
active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 0 DMA free:32kB min:32kB low:40kB
high:48kB active:5220kB inactive:6156kB present:16384kB
pages_scanned:143365 all_unreclaimable? yes
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 0 Normal free:8016kB min:6780kB
low:8472kB high:10168kB active:857576kB inactive:2003912kB
present:3129340kB pages_scanned:444352 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 0 HighMem free:0kB min:128kB
low:160kB high:192kB active:0kB inactive:0kB present:0kB
pages_scanned:0 all_unreclaimable? no
Oct 19 18:12:21 cust-db kernel: protections[]: 0 0 0
Oct 19 18:12:21 cust-db kernel: Node 3 DMA: empty
Oct 19 18:12:21 cust-db kernel: Node 3 Normal: 0*4kB 2*8kB 1*16kB
1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 1*4096kB =
6784kB
Oct 19 18:12:21 cust-db kernel: Node 3 HighMem: empty
Oct 19 18:12:21 cust-db kernel: Node 2 DMA: empty
Oct 19 18:12:21 cust-db kernel: Node 2 Normal: 0*4kB 3*8kB 1*16kB 0*32kB 1*64kB 0*128kB 1*256kB
1*512kB 0*1024kB 0*2048kB 2*4096kB = 9064kB
Oct 19 18:12:21 cust-db kernel: Node 2 HighMem: empty
Oct 19 18:12:21 cust-db kernel: Node 1 DMA: empty
Oct 19 18:12:21 cust-db kernel: Node 1 Normal: 82*4kB 21*8kB 35*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 1*2048kB 1*4096kB = 7808kB
Oct 19 18:12:21 cust-db kernel: Node 1 HighMem: empty
Oct 19 18:12:21 cust-db kernel: Node 0 DMA: 0*4kB 0*8kB 0*16kB 1*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 32kB
Oct 19 18:12:21 cust-db kernel: Node 0 Normal: 238*4kB 31*8kB 10*16kB 2*32kB 15*64kB 6*128kB 1*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 8016kB
Oct 19 18:12:21 cust-db kernel: Node 0 HighMem: empty
Oct 19 18:12:21 cust-db kernel: Swap cache: add 1726051, delete 1726051, find 553282/678917, race 0+10
Oct 19 18:12:21 cust-db kernel: Out of Memory: Killed process 24211 (oracle).
Oct 19 18:15:27 cust-db proftpd[24474]: cust-db.ov.otto.de - Fatal: unable to open incoming connection:
Transport endpoint is not connected
Oct 19 18:19:45 cust-db syslogd 1.4.1: restart.
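As a reading aid for the excerpt above, here is a small Python sketch that does two things: it lists the zones whose `free` value is below their `min` value (skipping zones with `present:0kB`), and it re-adds the `count*size` terms of a free-list line to check the printed total. The function names and regular expressions are my own assumptions about the line layout, not part of any kernel tool, and the sample strings are three zone lines and one free-list line copied from the log with the syslog prefix removed and the wrapped lines joined.

```python
import re

# "Node N <zone> free:XkB min:YkB ... present:ZkB" (may wrap across lines)
ZONE_RE = re.compile(
    r"Node (\d+) (\w+)\s+free:(\d+)kB\s+min:(\d+)kB.*?present:(\d+)kB", re.S
)
# One "count*sizekB" term of a free-list line
BUDDY_RE = re.compile(r"(\d+)\*(\d+)kB")


def zones_below_min(text):
    """Return (node, zone, free_kb, min_kb) for populated zones with free < min."""
    hits = []
    for node, zone, free_kb, min_kb, present_kb in ZONE_RE.findall(text):
        if int(present_kb) > 0 and int(free_kb) < int(min_kb):
            hits.append((int(node), zone, int(free_kb), int(min_kb)))
    return hits


def buddy_total_kb(line):
    """Sum count*size over all 'N*SkB' terms of a free-list line."""
    return sum(int(count) * int(size) for count, size in BUDDY_RE.findall(line))


SAMPLE_ZONES = """\
Node 3 Normal free:6784kB min:6812kB low:8512kB high:10216kB active:1761648kB inactive:1243200kB present:3145724kB
Node 3 HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB
Node 1 Normal free:7808kB min:6812kB low:8512kB high:10216kB active:1702492kB inactive:1128272kB present:3145724kB
"""

SAMPLE_BUDDY = (
    "Node 3 Normal: 0*4kB 2*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB "
    "1*512kB 0*1024kB 1*2048kB 1*4096kB = 6784kB"
)

low_zones = zones_below_min(SAMPLE_ZONES)
total_kb = buddy_total_kb(SAMPLE_BUDDY)
print(low_zones)
print(total_kb)
```

On these sample lines only Node 3 Normal is reported as below its `min` value, and the free-list terms add up to the 6784kB printed at the end of that line.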
I hope this is a starting point for discussion.
Thomas