Unknown Server Failure, Logs and openntpd



Hi,

This morning one of our R&D servers stopped responding (no ssh, no http),
and because some tests were urgent I had to hardware-reset it. After the
machine came back up, I first checked /var/log/messages:

May 30 06:25:05 arge syslogd 1.4.1#18: restart.
May 30 06:49:46 arge -- MARK --
May 30 07:09:46 arge -- MARK --
May 30 07:29:47 arge -- MARK --
May 30 07:49:47 arge -- MARK --
May 30 08:09:47 arge -- MARK --
May 30 08:29:47 arge -- MARK --
May 30 08:44:36 arge kernel: e100: eth1: e100_watchdog: link down
May 30 08:44:38 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
May 30 08:44:40 arge kernel: e100: eth1: e100_watchdog: link down
May 30 08:44:42 arge kernel: e100: eth1: e100_watchdog: link up, 100Mbps, full-duplex
May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
May 30 08:38:11 arge syslogd 1.4.1#18: restart.
May 30 08:38:11 arge kernel: klogd 1.4.1#18, log source = /proc/kmsg started.
May 30 08:38:11 arge kernel: Linux version 2.6.18-6-686 (Debian 2.6.18.dfsg.1-18etch5) (dannf@xxxxxxxxxx) (gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)) #1 SMP Sat May 24 10:24:42 UTC 2008

As the "kernel: e100: eth1: ..." lines show, I first suspected a
connection failure and tried fiddling with the network cable socket. But
the logs indicate that wasn't the problem. Moreover, the system seems to
have been working properly just before 08:44:36, judging by
/var/log/syslog:

May 30 08:40:01 arge /USR/SBIN/CRON[6611]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:40:01 arge /USR/SBIN/CRON[6614]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:41:01 arge /USR/SBIN/CRON[6630]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:41:01 arge /USR/SBIN/CRON[6632]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:42:01 arge /USR/SBIN/CRON[6654]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:42:01 arge /USR/SBIN/CRON[6655]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:43:01 arge /USR/SBIN/CRON[7039]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:43:01 arge /USR/SBIN/CRON[7040]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)
May 30 08:44:01 arge /USR/SBIN/CRON[7417]: (root) CMD (if [ -x /etc/munin/plugins/apt_all ]; then /etc/munin/plugins/apt_all update 7200 12 >/dev/null; elif [ -x /etc/munin/plugins/apt ]; then /etc/munin/plugins/apt update 7200 12 >/dev/null; fi)
May 30 08:44:01 arge /USR/SBIN/CRON[7420]: (munin) CMD (if [ -x /usr/bin/munin-cron ]; then /usr/bin/munin-cron; fi)

I checked every log file under /var/log for entries between 08:00:00 and
08:38:00, but found nothing useful. OTOH, look at these lines from the
/var/log/messages output above:

May 30 08:45:14 arge shutdown[7450]: shutting down for system halt
May 30 08:38:11 arge syslogd 1.4.1#18: restart.

It seems that openntpd somehow failed to synchronize the hardware clock
with the time it got from the NTP servers, and after the reboot the
system switched back to a past time. Is this expected? If not, how can I
fix it?
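
A minimal sketch of how such a backward jump could be spotted
automatically in a log file. It assumes traditional syslog timestamps
("May 30 08:45:14") and an English/C locale for the month names. The
function name find_backward_jumps and the year parameter are
hypothetical: syslog lines carry no year, so 2008 is only a guess based
on the kernel build date in the log above. For the two lines quoted
above it reports a jump of 423 seconds.

```python
from datetime import datetime

def find_backward_jumps(lines, year=2008):
    """Return (previous_line, current_line, seconds_back) for every line
    whose timestamp is earlier than that of the preceding stamped line."""
    jumps = []
    prev_time, prev_line = None, None
    for line in lines:
        try:
            # Traditional syslog lines start with e.g. "May 30 08:45:14".
            # The year is an assumption; the log itself does not record it.
            stamp = datetime.strptime(f"{year} {line[:15]}",
                                      "%Y %b %d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        if prev_time is not None and stamp < prev_time:
            seconds_back = int((prev_time - stamp).total_seconds())
            jumps.append((prev_line, line, seconds_back))
        prev_time, prev_line = stamp, line
    return jumps

sample = [
    "May 30 08:45:14 arge shutdown[7450]: shutting down for system halt",
    "May 30 08:38:11 arge syslogd 1.4.1#18: restart.",
]
for before, after, seconds in find_backward_jumps(sample):
    print(f"clock went back {seconds}s between:\n  {before}\n  {after}")
```

To run it against a real file, the lines of /var/log/messages could be
passed in place of the sample list.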

To summarize, what else should I check to figure out the cause of this
failure? (I'll try to log in from a terminal the next time such a
failure happens.)
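
For the next occurrence, the manual search through /var/log described
earlier (entries between 08:00:00 and 08:38:00) could be scripted. This
is only a sketch under assumptions: the helper name lines_in_window and
its day parameter are hypothetical, and it handles only traditional
syslog lines with a two-digit day of month.

```python
from datetime import datetime, time

def lines_in_window(lines, start, end, day="May 30"):
    """Yield syslog lines stamped on `day` with start <= time <= end."""
    for line in lines:
        if not line.startswith(day):
            continue
        try:
            # Characters 7..14 hold "HH:MM:SS" in "May 30 08:09:47 ...".
            stamp = datetime.strptime(line[7:15], "%H:%M:%S").time()
        except ValueError:
            continue  # not a timestamped line
        if start <= stamp <= end:
            yield line

sample_log = [
    "May 30 07:49:47 arge -- MARK --",
    "May 30 08:09:47 arge -- MARK --",
    "May 30 08:29:47 arge -- MARK --",
    "May 30 08:44:36 arge kernel: e100: eth1: e100_watchdog: link down",
]
for hit in lines_in_window(sample_log, time(8, 0, 0), time(8, 38, 0)):
    print(hit)
```

The same function could be applied to each readable file under
/var/log in turn, printing the file name before its matching lines.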


Regards.




