Re: Crash



Lawrence D'Oliveiro wrote:

Note to self: Do not try to import 7000 images into F-Spot.

My hard drive churned away for about an hour. At first the rest of my
desktop apps continued to be usable. Then they started getting slower.
And slower. Finally even the cursor found it too much work to keep up
with the mouse, and it would only move once or twice in a minute.

Then the hard drive stopped churning. At that point everything froze.
Ctrl-Alt-F1 wouldn't switch me to a text console. Even trying to SSH
from another machine wouldn't work. So I hit the reset button. My first
ever software-triggered Linux system crash.

Checking /var/log/messages after the reboot, I saw no sign of the
dreaded OOM killer. Just normal background activity, and then my reboot,
with nothing in-between. So it looked like it went straight from a
functioning system to being completely wedged.

Then I discovered that F-Spot is written in this Mono/C#/CLR thing,
which is one of those instant-software-bloat-just-add-water
technologies, like Java.

Currently I'm trying to use xzgv to try to bring some order to those
images. Its tagging function is useful, but having to type in the
complete path to the destination directory for a move every time is
getting a bit wearying. Also there doesn't seem to be an option to apply
an arbitrary shell command to the set of tagged images.

Any suggestions for another image-management app to try?

First things first. When your machine is that busy (I'm assuming it hadn't
actually crashed), the keystrokes would eventually have worked, as would
the ssh. But because the system is struggling for cpu time, it appears to
have crashed.

A common problem is piling emergency keystrokes on top of each other. When
your system does get some time to work on the current interrupt, it gets
another and is forced to go and check that out too, then another and
another, each taking the system away from dealing with the first. So you
are building up a queue of interrupts, and the system has less and less
time to deal with the first one.

There is nothing in your error logs because as far as the system is
concerned it is still running. System utilities to watch for processes that
haven't responded in some time are also fighting for cpu time.

When you are doing a large batch run like this, which as you now know
carries some risk, do it in smaller batches if you can. Better still, loop
over the batches in a script, so that your system is guaranteed some time
between batches to catch up. If you can't do that, increase the 'nice'
value of the process; you want it high enough that other computer functions
take priority. Most of the items in your 'top' will have nice=0. High
priority system tasks will be around -5, and some low priority shell
utilities might be around +5 or even higher.
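
As a rough sketch of the batch idea, something like the following would do.
The names BATCH_SIZE, PAUSE and process_one are my own inventions for
illustration, and the printf is only a stand-in for whatever command really
does the work on each picture.

```shell
# Sketch only: BATCH_SIZE, PAUSE and process_one are illustrative names.
BATCH_SIZE=${BATCH_SIZE:-100}   # files per batch
PAUSE=${PAUSE:-5}               # seconds of rest between batches

# Stand-in for the real per-image work; swap printf for the actual command.
process_one() {
    nice -n 10 printf 'processed %s\n' "$1"
}

# Work through all arguments, resting after every BATCH_SIZE files.
process_in_batches() {
    count=0
    for f in "$@"; do
        process_one "$f"
        count=$((count + 1))
        if [ $((count % BATCH_SIZE)) -eq 0 ]; then
            sleep "$PAUSE"
        fi
    done
}

# Usage (hypothetical path): process_in_batches /path/to/pics/*.jpg
```

The sleep between batches is what gives the rest of the system its chance
to catch up; the nice on each command covers you while a batch is running.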

If you look at 'top' you will see that your shell interrupts take the same
priority as applications. This is usually fine, provided the emergency
keystrokes we use are ones the kernel responds to, because the kernel gets
the keystrokes first. If we have chosen a combination that the shell has to
respond to, then our interrupt has the same priority as the offending
program, so it will be very much slower to respond.

Example nice command,

nice -n 10 gimp MyPics*.jpg

If nothing else is running then your picture tagging software will run at
the same speed as if it had the system to itself, i.e. there is nothing
else asking for cpu time other than basic system services. But if you need
to call a halt to it, your shell and system have much higher priority than
the tagging task, so they will respond more quickly.

So this is your mission.

1st, prove to yourself that what I have said is true. Set your remote ssh
running a 'top'.

Run at least one cpu-busy process. If you can't think what to run, try a
script loop. Remember that we don't want an actual crash (your system
will notice the loop isn't going anywhere), so make sure your loop comes
out every couple of seconds before diving in again.
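
If you want something ready-made, here is a minimal sketch of such a loop.
The function names busy_for and busy_loop are made up for this example; it
spins for a short burst, rests, and repeats for a fixed number of rounds so
that it always ends on its own.

```shell
# busy_for SECONDS: spin the cpu for roughly that many seconds.
busy_for() {
    end=$(( $(date +%s) + $1 ))
    n=0
    while [ "$(date +%s)" -lt "$end" ]; do
        n=$((n + 1))
    done
    echo "$n"
}

# busy_loop ROUNDS BURST REST: spin for BURST seconds, come out and
# rest for REST seconds, and do that ROUNDS times before finishing.
busy_loop() {
    rounds=$1 burst=$2 rest=$3
    i=0
    while [ "$i" -lt "$rounds" ]; do
        busy_for "$burst" >/dev/null
        sleep "$rest"
        i=$((i + 1))
    done
    echo "finished $rounds rounds"
}

# Usage: busy_loop 30 2 1   (about a minute and a half in total)
```

Save it in a script with the usage line uncommented at the bottom, and that
script is what you run first normally and then again under nice.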

Watch it in top. At nice=0 it will steal a lot of cpu time because it
rarely releases its hold. It is still subject to time-slice allocation,
but it is like a demanding kid: if he screams long enough you give him the
whole bag of toffees just to shut him up.

Kill it and this time do the same but with the command,

nice -n 19 yourscript

This time you will see that your loop rarely gets up to the top three, even
on an otherwise idle-seeming system, with your X and top updates taking
priority.
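
If you would rather check the nice value directly than hunt for it in the
NI column of top, a small sketch along these lines should work. The helper
names nice_of and make_nicest are hypothetical. I use 19 with renice on
purpose: some versions treat the -n number as an absolute value and others
as an increment, and starting from 0 both readings end up at 19.

```shell
# nice_of PID: print the nice value of a running process.
nice_of() {
    ps -o nice= -p "$1" | tr -d ' '
}

# make_nicest PID: push an already running process down to the lowest
# priority. Raising the nice value of your own process needs no root.
make_nicest() {
    renice -n 19 -p "$1" >/dev/null
}

# Usage: start a job in the background, then inspect or demote it.
#   yourscript &
#   nice_of $!
#   make_nicest $!
```

The second helper is for the job you forgot to start under nice: you can
still demote it afterwards, as long as you can get a shell to respond.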

Now if you were to do one of the shell key combinations, the response will
be very quick, not necessarily instant, because a soft kill always gives
the application a chance to give itself up.

2nd (did you forget we were in a list? I know I did). Put the emergency key
combinations into a spreadsheet, and mark which are shell and which are
higher priority kernel. Also add a quick note by each saying when this
particular combination is most likely to be needed. Print it out and keep
it handy.

Then on a seeming crash, select one and only one of those combinations, and
press it just the once. Then go and make a cup of tea, a strong one. Put a
couple of Gingernuts onto a plate. Come back and eat the Gingernuts,
dipping them in the tea of course. Finish the cup of tea.

If the system hasn't responded to your keys you can now try the next most
appropriate combination. In this case though ensure that it is one of those
that the kernel responds to. Now go and make a cup of tea ... Gingernuts
.... dip ... etc.

With the luxury of ssh'ing into your box, that takes over as the second
option. Expect very slow responses, unless you have prepared the system in
advance with an ssh terminal at very high priority, nice -n -15. Though you
do get to eat fewer Gingernuts.


