Re: Linux box reboots some times without reasons

From: Unruh (unruh-spam_at_physics.ubc.ca)
Date: 10/28/05


Date: 28 Oct 2005 00:39:35 GMT

ibuprofin@painkiller.example.tld (Moe Trin) writes:

>In the Usenet newsgroup comp.os.linux.misc, in article
><djp6q8$8l3$1@nntp.itservices.ubc.ca>, Unruh wrote:

>>(Moe Trin) writes:

>>>One solution that might work is to change the above script to
>>
>>>while /bin/true
>>>do
>>>date>/var/tmp/ps1
>>>ps auxww >>/var/tmp/ps1
>>>sleep 10
>>>date>/var/tmp/ps2
>>>ps auxww >>/var/tmp/ps2
>>>sleep 10
>>>done
>>
>>Not sure at all why this would be better.

>You are not overwriting the "latest" data. With your original scheme,
>a "failure" (power, RAM, what-ever) occurring/detected between the
>'date' command, and the 'ps' results in /tmp/ps1 containing the date, and
>nothing else. Great for pin-pointing when the failure occurred, but not so
>great for troubleshooting.

True, although the chance that the crash occurred in the millisecond between
the date command and the ps command is not something I would worry about.
The problem of the cache not being written out is much more worrying.
I suppose if you ran this as root, you could always put a
sync
at the end of the write, before the "sleep 10"; that would at least flush
the data out of the kernel cache.
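
Roughly what I have in mind, as a sketch rather than a tested recipe. The
names snapshot, run_snapshots, SNAP_DIR and INTERVAL are made up for
illustration; only the date/ps/sync/sleep sequence comes from the discussion
above. Note that sync only asks the kernel to write out its buffers, so any
cache in the controller or on the drive itself may still be holding the data.

```shell
#!/bin/sh
# Sketch only: SNAP_DIR, INTERVAL, snapshot and run_snapshots are
# illustrative names, not anything from the original script.
SNAP_DIR=${SNAP_DIR:-/var/tmp}
INTERVAL=${INTERVAL:-10}

# Write a timestamp plus the full process list to the file named in $1,
# then ask the kernel to flush its dirty buffers to disk.
snapshot() {
    {
        date
        ps auxww
    } > "$1"
    sync
}

# Alternate between two files so that the older one is still complete
# if the box dies while the other one is being rewritten.
# $1 = number of rounds to run, or 0 to keep going until the crash.
run_snapshots() {
    rounds=$1
    n=0
    while [ "$rounds" -eq 0 ] || [ "$n" -lt "$rounds" ]; do
        snapshot "$SNAP_DIR/ps1"
        sleep "$INTERVAL"
        snapshot "$SNAP_DIR/ps2"
        sleep "$INTERVAL"
        n=$((n + 1))
    done
}
```

Put "run_snapshots 0" at the end of the script to leave it running until the
next reboot; after the crash, whichever of ps1 and ps2 has the later
timestamp on its first line is the last snapshot that made it to disk.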

>>I suppose doing
>>ps auxww>>/var/tmp/ps1
>>cat /var/tmp/ps1>/dev/null
>>
>>would make sure that ps1 got flushed out because of the copy.

>Depends on the caching algorithm - that 'read' would likely _come_ from
>cache, but it wouldn't force the write all the way (kernel cache, the
>possible drive controller cache, and finally the cache on the disk).

> Old guy