Diagnosing occasional random reboots
- From: Dougie Nisbet <dougie@xxxxxxxxxxxxxx>
- Date: Tue, 31 Oct 2006 17:29:29 +0000
A server that has been running steadily for years has started rebooting on its own. To the best of my knowledge, nothing has changed. It is a dual-processor PIII running Debian stable.
It is tucked away in the loft and usually has no monitor attached, so tracking this down is difficult. However, even if I brought it into a more convenient spot, short of sitting and staring at the screen waiting for a crash or reboot, I'm not sure it would help much.
I've tried building a newer kernel from backports.org and trimming it down as much as possible. There is nothing useful in syslog. A typical series of reboots looks like this:
dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:15 still logged in
runlevel (to lvl 2) 2.6.17 Tue Oct 31 17:12 - 17:21 (00:08)
reboot system boot 2.6.17 Tue Oct 31 17:12 (00:08)
dougie pts/0 tbird2xp:0.0 Tue Oct 31 17:09 - crash (00:02)
runlevel (to lvl 2) 2.6.17 Tue Oct 31 16:59 - 17:12 (00:12)
reboot system boot 2.6.17 Tue Oct 31 16:59 (00:21)
dougie pts/0 tbird2xp:0.0 Tue Oct 31 16:05 - crash (00:54)
runlevel (to lvl 2) 2.6.17 Tue Oct 31 15:16 - 16:59 (01:43)
reboot system boot 2.6.17 Tue Oct 31 15:16 (02:04)
date new time Sun Oct 29 07:11
date old time Sun Oct 29 07:12
root pts/3 kitchens Sun Oct 29 07:11 - crash (2+08:04)
dougie pts/2 kitchens Sat Oct 28 20:29 - crash (2+19:46)
dougie pts/1 kitchens Sat Oct 28 11:37 - 16:04 (1+05:27)
dougie pts/0 tbird2xp:0.0 Fri Oct 27 13:16 - crash (4+03:00)
The syslog shows nothing notable around those times, usually just lines from postfix as it processes the mail queue, followed by:
Oct 31 17:12:22 nick syslogd 1.4.1#17: restart (remote reception).
Oct 31 17:12:22 nick kernel: klogd 1.4.1#17, log source = /proc/kmsg started.
Oct 31 17:12:23 nick kernel: Inspecting /boot/System.map-2.6.17
Oct 31 17:12:23 nick kernel: Loaded 21314 symbols from /boot/System.map-2.6.17.
I'm not sure how to go about tracking this down. Searching the archives suggests that these symptoms could point to a faulty physical component, such as memory or the PSU, so my next step will probably be to swap the PSU and run a memtest. One thing I have noticed is that the reboots often seem to come in clusters. For example, from around 7AM to 9AM on Oct 24 the machine appears to have been bouncing off and on for about two hours:
# last reboot
reboot system boot 2.6.8 Wed Oct 25 05:03 (06:50)
reboot system boot 2.6.8 Wed Oct 25 04:31 (07:22)
reboot system boot 2.6.8 Tue Oct 24 11:09 (1+00:44)
reboot system boot 2.6.8 Tue Oct 24 10:59 (00:06)
reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)
reboot system boot 2.6.8 Tue Oct 24 09:50 (01:03)
reboot system boot 2.6.8 Tue Oct 24 09:49 (01:05)
reboot system boot 2.6.8 Tue Oct 24 09:37 (01:17)
reboot system boot 2.6.8 Tue Oct 24 09:05 (01:49)
reboot system boot 2.6.8 Tue Oct 24 08:53 (02:00)
reboot system boot 2.6.8 Tue Oct 24 08:51 (02:03)
reboot system boot 2.6.8 Tue Oct 24 07:28 (03:26)
reboot system boot 2.6.8 Tue Oct 24 07:26 (03:27)
reboot system boot 2.6.8 Tue Oct 24 07:24 (03:29)
reboot system boot 2.6.8 Tue Oct 24 07:01 (03:52)
reboot system boot 2.6.8 Tue Oct 24 06:18 (04:36)
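The clustering can be made explicit by parsing the `last reboot` output and grouping boots that follow each other within some time window. This is a sketch under a few assumptions: the line format matches the listing above, the year (which `last` does not print) is supplied by the caller, and the 60-minute gap threshold is an arbitrary choice. The function names are hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as "reboot system boot 2.6.8 Tue Oct 24 09:52 (01:01)"
BOOT = re.compile(r"^reboot\s+system boot\s+\S+\s+\w{3} (\w{3}) +(\d+) (\d\d):(\d\d)")


def boot_times(lines, year=2006):
    """Extract boot timestamps from `last reboot` output; `last` omits the year, so pass it in."""
    times = []
    for line in lines:
        m = BOOT.match(line.strip())
        if m:
            mon, day, hh, mm = m.groups()
            times.append(datetime.strptime(f"{year} {mon} {day} {hh}:{mm}", "%Y %b %d %H:%M"))
    return sorted(times)


def clusters(times, max_gap_minutes=60):
    """Group sorted boot times; a new cluster starts when the gap exceeds max_gap_minutes."""
    groups = []
    for t in times:
        if groups and (t - groups[-1][-1]).total_seconds() <= max_gap_minutes * 60:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups
```

Fed the 16 lines above with the default 60-minute gap, this yields four bursts (five boots from 06:18 to 07:28, seven from 08:51 to 09:52, two around 11:00 and two early on Oct 25). Having the bursts as explicit time ranges makes it easier to compare them against other time-stamped events, such as the postfix activity in syslog.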
I'm a bit stumped on how to solve this and would appreciate any thoughts on strategy.
Dougie