Re: Hanging

From: Moe Trin (ibuprofin_at_painkiller.example.tld)
Date: 09/22/04


Date: Wed, 22 Sep 2004 15:15:24 -0500

In article <cirv1n$a85$1@ctb-nnrp2.saix.net>, Steven Hook wrote:
>This dial in box dies every second night or so and I don't know what's
>causing it, I think it might be a hardware issue but I really don't know
>for sure

Things to check: memory problems (find a copy of memtest86 on the net)
and temperatures inside the box. I take it you have determined this is
not a power problem, because no other computer is affected.

>and it really is a pain because although I set another machine up to watch
>it and mail me when it hangs I can't log in to it and see what's happening
>(at night from home) because it's hung.

What about in the morning when you get there? What display do you see? Is
the keyboard dead, or can you toggle the CapsLock, NumLock or ScrollLock
indicators? What software is it running: X, or a text-based login? If
X, does pressing the left 'Ctrl' and 'Alt' and F2 keys at the same time
get you to a login screen? Or does left 'Ctrl' and 'Alt' and the
Backspace key kill X?

>is there a way of recording what it was trying to do before it hung so
>when I get to it in the morning I can reboot it and then go check to see
>why it's hanging?

What I'd do is to run a script that takes a 'ps aux' snapshot, sticks the
results into a file (overwriting is probably best at this point, as the
file may otherwise get huge), then sleeps for thirty seconds and repeats,
alternating between two files. Something like

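   # Alternate between two files so one complete snapshot always survives
   while true ; do
      ps aux > /root/ps_output1
      sync                       # flush to disk before the box can wedge
      sleep 30
      ps aux > /root/ps_output2
      sync
      sleep 30
   done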

and manually start that before you leave at night. What you should have
next morning is two files showing the process table 30 seconds apart, just
before the system wedged. Starting the script manually, rather than from a
boot script, avoids losing the information on reboot, because a copy
started at boot would overwrite both files.
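
If you want something a bit more defensive, here is a rough sketch along
the same lines. It is only an illustration, not a tested recipe for your
box: the names SNAP_DIR, INTERVAL, snapshot and watch_loop are made up for
this example, and the timestamp line and the write-then-rename step are my
additions, on the assumption that a hang could otherwise leave you with a
half-written file.

```shell
#!/bin/sh
# Sketch only: SNAP_DIR, INTERVAL, snapshot() and watch_loop() are
# hypothetical names chosen for this example.
SNAP_DIR="${SNAP_DIR:-/root}"      # where the two snapshot files live
INTERVAL="${INTERVAL:-30}"         # seconds between snapshots

snapshot() {
    # $1 = output file. Write to a temporary file first, then rename it,
    # so a hang in mid-write does not clobber the previous good snapshot.
    out="$1"
    {
        date                                 # when the snapshot was taken
        ps aux || echo "ps aux failed"       # record the failure, keep going
    } > "$out.tmp"
    mv "$out.tmp" "$out"
    sync                                     # push the data out to disk
}

watch_loop() {
    mkdir -p "$SNAP_DIR" || return 1
    n=1
    while true; do
        snapshot "$SNAP_DIR/ps_output$n"
        sleep "$INTERVAL"
        n=$((3 - n))                         # alternate 1 -> 2 -> 1
    done
}

# The loop only starts when asked for explicitly:  sh hangwatch.sh run
if [ "${1:-}" = "run" ]; then
    watch_loop
fi
```

Saved as, say, hangwatch.sh (another made-up name), you would start it by
hand with 'sh hangwatch.sh run &' before leaving, for the same reason as
above: nothing restarts it at boot, so the files survive the reboot.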

        Old guy