Re: large binary immediately SEGV's

From: Jean-David Beyer (jdbeyer_at_exit109.com)
Date: 02/21/04


Date: Fri, 20 Feb 2004 23:23:31 -0500

Nick Landsberg wrote:

> Ah, good. Now we have another reason (as opposed to the plain
> statement by another poster that 91 MB was "idiotic").
>
> All good points, Jean-David.

Thank you.
>
> I did notice in the original post that, by far, the "text" segment
> was taking up the bulk of the space. This would be a qualitative
> indication that what you say about Murphy's law is true, but does not
> explain the SEGV the OP encountered BEFORE the entry to main().

True; I have not attempted to discover the problem in that 91MB hunk of
code. You could not pay me enough to even look. Talk about trying to
find a needle in a haystack... . 91MB is just way too big a haystack.
There are 9q processes in my process table right now (essentially idle),
and the biggest memory hog is Mozilla with 6 instances using 57
Megabytes RSS (each?) and share of 26756. I consider that excessive for
a single process, and the Mozilla group are probably sorry it got so big.

I suppose I would try to single step the program with gdb, but someone
(I suppose the OP) said even running ldd on the load module caused ldd
to crash, and that tells me that the load module is probably totally
scrozzled. ldd does not run the program after all, so bugs or no bugs,
it should be able to handle that program. Maybe its dynamic linking
information is broken in a way that ldd does not notice; in which case,
an MR should probably be filed against ldd. I should be able to give ldd
a garbage file, and not crash it (though I would expect a lot of insults
from ldd, of course). In fact, when I give it a garbage file (a jpeg),
with mode set to 0770, it says:

valinux:jdbeyer[/usr/local/girls]$ ldd frammis
        not a dynamic executable

So the OP's file looks like a dynamic executable.
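
To make concrete what "looks like a dynamic executable" can mean, here is a
minimal sketch that checks a file image for the ELF magic number and a
PT_DYNAMIC program header. It is not a description of how ldd itself decides;
the function name looks_dynamic is made up for illustration, and the header
offsets are the standard ELF32/ELF64 ones. The point is that a corrupt or
truncated header is refused rather than allowed to crash the checker.

```python
import struct

ELF_MAGIC = b"\x7fELF"
PT_DYNAMIC = 2  # program-header type for dynamic linking information


def looks_dynamic(data: bytes) -> bool:
    """Return True if `data` starts like an ELF file with a PT_DYNAMIC segment."""
    if len(data) < 16 or data[:4] != ELF_MAGIC:
        return False                      # not ELF at all (e.g. a jpeg)
    is_64 = data[4] == 2                  # EI_CLASS: 1 = 32-bit, 2 = 64-bit
    endian = "<" if data[5] == 1 else ">"  # EI_DATA: 1 = little-endian
    try:
        if is_64:
            # e_phoff at 0x20 (8 bytes), e_phentsize at 0x36, e_phnum at 0x38
            (phoff,) = struct.unpack_from(endian + "Q", data, 0x20)
            phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x36)
        else:
            # e_phoff at 0x1C (4 bytes), e_phentsize at 0x2A, e_phnum at 0x2C
            (phoff,) = struct.unpack_from(endian + "I", data, 0x1C)
            phentsize, phnum = struct.unpack_from(endian + "HH", data, 0x2A)
        for i in range(phnum):
            # p_type is the first 4-byte field of every program header
            (p_type,) = struct.unpack_from(endian + "I", data, phoff + i * phentsize)
            if p_type == PT_DYNAMIC:
                return True
    except struct.error:
        return False                      # truncated or corrupt: refuse, do not crash
    return False
```

Calling looks_dynamic(open(path, "rb").read()) on a jpeg returns False at the
magic-number test, which corresponds to the "not a dynamic executable" answer
above.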

I remember a hardware engineer (computer repair person) who always
insisted on reducing any problem to a sequence of 4 assembler-level
instructions, or he assumed troubles were software problems. I had just
written an operating system (kernel in today's terminology) and when the
shell (using today's terminology) was supposed to type out the
sequence "Ready: ", instead it typed "Rdady: ". Now this was not a
disaster in itself, but it really worried me since if I dumped the
memory, the text to be printed was clearly "Ready: ". So I copied the
code down to the instruction loop that did that, and it worked just
fine. Fortunately, I wrote the OS kernel in relocatable format (unusual
in those days because there was no memory management hardware), so I
diddled the loader to move the OS down one word in the memory space, and
everything worked. So I could give the hardware techie a 13-instruction
sequence
that, _if executed in specific absolute hardware addresses_, did print
incorrectly as described. After that, it took him very little trouble to
replace a printed circuit card in the memory controller. (That whole OS
kernel took only 4096 instructions (including any data tables it
needed). No windowing system, no multiprogramming, no multiprocessing,
no modems, but it was pretty fast for what it did, and simple enough
that even my boss's boss could program and use it.)
>
> There are advocates of huge monolithic process structures on
> single-CPU machines who claim that it lets the process itself control
> the thread of execution, rather than rely on the vagaries of the
> scheduler. It also avoids context switch overhead, as every message
> (whether it be SysV or Sockets) involves a system call, and thus a
> context switch into and out of kernel space.

Fine if they can run without an OS. Otherwise, they are going to have to
deal with an OS in any case. There will be a context switch every time
the OS needs to do something anyway. Right now my SMP machine is doing
nothing other than running two instances of setiathome, a
compute-limited process, and this composer in Mozilla.

 procs       swap       io     system          cpu
 r  b  w   si  so   bi  bo    in   cs   us  sy  id
 2  0  0    0  13    0  20   204  548   95   5   0
 2  0  0    0   0    0   5   138  334   93   7   0

Even if you think you are avoiding context switches, you probably are
not. Now you do not just walk into a design review and throw different
bubbles arbitrarily into different UNIX (or Linux) processes. You must
give each process a descriptive name that pretty much tells people what
the process does (but hides how it does it). Normally, you arrange each
process to do a lot of processing with little input or output to keep
message passing and context switching overhead low.
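
One way to see this for a single process is to ask the kernel how many
context switches it has charged to you. The sketch below uses Python's
resource.getrusage (Unix only); the helper names context_switches and
busy_loop_switches are invented for illustration. It runs a pure compute
loop with no I/O and no messages and reports the switches that happened
anyway. The counts depend entirely on the machine and its load, so none are
quoted here.

```python
import resource


def context_switches():
    """Return (voluntary, involuntary) context-switch counts for this process."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def busy_loop_switches(iterations=2_000_000):
    """Spin on pure computation and report the switches that occurred meanwhile."""
    vol_before, invol_before = context_switches()
    total = 0
    for i in range(iterations):
        total += i * i          # no I/O and no message passing in this loop
    vol_after, invol_after = context_switches()
    return vol_after - vol_before, invol_after - invol_before


if __name__ == "__main__":
    voluntary, involuntary = busy_loop_switches()
    print("voluntary:", voluntary, "involuntary:", involuntary)
```

On a busy machine the involuntary count is typically nonzero even though the
loop never asks the OS for anything: the scheduler preempts it regardless.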
>
> In addition, if you have never worked with communicating processes
> (and the possible error conditions when messages get lost or
> delivered out of sequence), you will usually get it wrong the first
> time (or even the second or third time).

I have done it a lot. I find systems designed that way are easier to
design, work better the first time, and are easier to modify (because the
repercussions of implementation detail changes do not propagate beyond
the process boundary unless you violate the interface specification), etc.

My favorite system came up with about 100 processes, but as more and
more users came on line, other processes were spawned dynamically. A
monolithic program would be unmanageable.

We arranged that the sequence of messages did not matter. Each process
was a finite state machine, and the messages could come in any order. If
the messages got lost, the response expected by the sender timed out and
the sender could send it again, or do something else. The biggest
problem, IIRC, was when the input queues of processes filled up (we were
using System V IPC at the time), and it was a lot of bother to design
the system so that could not happen.
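
The ideas in that paragraph can be sketched in a few lines. This is an
illustration only, not the original system: it uses Python's in-process
queue.Queue in place of System V message queues, and the names Collector and
send_with_retry are hypothetical. The state machine accepts its two messages
in either order and tolerates resends; the sender times out, retries, and
treats a full input queue as a condition it has to handle.

```python
import queue


class Collector:
    """Finite state machine that needs messages "a" and "b", in either order."""

    def __init__(self):
        self.seen = set()
        self.state = "WAITING"

    def handle(self, msg):
        if self.state == "DONE":
            return self.state            # late duplicates are ignored
        if msg in ("a", "b"):
            self.seen.add(msg)           # a set makes order and resends harmless
        if self.seen == {"a", "b"}:
            self.state = "DONE"
        return self.state


def send_with_retry(inbox, replies, msg, timeout=0.05, attempts=3):
    """Send msg, wait for a reply, resend on timeout. Returns the reply or None."""
    for _ in range(attempts):
        try:
            inbox.put_nowait(msg)        # bounded queue: it may be full
        except queue.Full:
            return None                  # caller must "do something else"
        try:
            return replies.get(timeout=timeout)
        except queue.Empty:
            continue                     # reply lost or late: send again
    return None
```

Because handle() records messages in a set, a resend triggered by a lost
reply cannot push the receiver into a wrong state, which is what makes the
timeout-and-retry policy safe.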

As it was designed, we could even change the implementation of running
processes and test them without taking the system down. Certain users
could run the new implementation and the rest (most of them) got the
default implementation. We just had to obey the interface specification,
and if we goofed that up, only our own stuff failed. This was a
wonderful way to test new algorithms and stuff on live data without
hurting the innocent.
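
A rough sketch of that arrangement, with made-up names (Router,
default_lookup, trial_lookup, and the user "tester" are all hypothetical):
requests from a short list of users go to the trial implementation, everyone
else gets the default, and both sides honour the same interface.

```python
class Router:
    """Send requests from selected users to a trial implementation."""

    def __init__(self, default, trial, trial_users=()):
        self.default = default
        self.trial = trial
        self.trial_users = set(trial_users)

    def dispatch(self, user, request):
        # Both implementations obey the same interface: request in, reply out.
        impl = self.trial if user in self.trial_users else self.default
        return impl(request)


def default_lookup(request):
    """Stand-in for the established implementation."""
    return ("default", request.upper())


def trial_lookup(request):
    """Stand-in for the new implementation under test."""
    return ("trial", request.upper())


router = Router(default_lookup, trial_lookup, trial_users={"tester"})
```

In the system described above the two implementations were separate
processes; here they are plain functions to keep the sketch short.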

> There are tradeoffs to any
> design. You have correctly mentioned some of the benefits of a
> loosely coupled design and have mentioned some of the benefits of the
> monolithic design. (I am NOT a proponent of the monolithic design,
> BTW, I am just playing devil's advocate.)

Understood.
>
> It's up to the OP and his organization to make the choice, and they
> seem to have settled on the monolithic, for better or for worse.
>
Yes, and by now, even if the OP tried to convince his organization to
change, they would argue that it is too late: they have too much
invested in what they already have. Reminds me of a motto I ascribed to
a former employer, though they HATED it: "We haven't time to stop for
gas; we're late already!"

Ah well, time to re-read Fred Brooks' book, "The Mythical Man-Month",
again, I suppose.
>
> P.S. - I would tend to disagree that process creation is "cheap" on
> ANY O/S. If it is ONLY done on startup, then it's OK. If programs
> fork()/exec() others willy-nilly, it's a real pig.
>
A pig compared to sqrt(integer) no doubt. But not compared to UNIX in
about 1970. It is much much better now, when we do paging instead of
process swapping, when the processors have memory management units at
their disposal, ... .

Well, for most operating systems, process creation is costly, when it is
allowed at all, and I have run on systems where you could not do it at
all. A privileged user had to introduce a new process to the system
before it could be run. But in Linux, fork|exec are pretty good
compared to the average OS. In the systems I worked on with dynamically
created processes, there tended to be fewer than a dozen created when a
new user logged into the system (typically a user did that at most once
a day). Recall that these days, if there are many instances of a process
(as was typical of our system), they all share the same instruction
space, so the memory consumption is reduced to only the necessary data
and stack space. I do not know whether fork, these days, notices that the
parent and child are the same program and so allocates only the data
space. But exec should know that what it is invoking is just another
instance of a running program and need not remanage the memory for
redundant copies of the instruction space.
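
For anyone who wants numbers rather than opinions, process creation cost is
easy to measure. The sketch below (Unix only; spawn_cost is a made-up helper
name) times fork followed by an immediate exit in the child, and optionally
fork followed by exec of a command. On Linux, fork does not copy the parent's
memory up front: pages are shared copy-on-write until one side writes to
them, and the text of a program is mapped from the same file for every
instance. No timings are quoted here because they vary widely by machine.

```python
import os
import time


def spawn_cost(count=50, argv=None):
    """Average seconds to create and reap one child process.

    With argv=None the child exits at once (fork only); otherwise it execs argv.
    """
    start = time.perf_counter()
    for _ in range(count):
        pid = os.fork()
        if pid == 0:                          # child
            try:
                if argv:
                    os.execvp(argv[0], argv)  # replaces the child image on success
            finally:
                # Reached only for fork-only runs or when exec failed.
                os._exit(127 if argv else 0)
        os.waitpid(pid, 0)                    # parent: reap the child
    return (time.perf_counter() - start) / count


if __name__ == "__main__":
    print("fork only  :", spawn_cost())
    print("fork + exec:", spawn_cost(argv=["true"]))
```

Comparing the two figures on your own machine shows how much of the cost is
fork itself and how much is exec and program startup.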

-- 
   .~.  Jean-David Beyer           Registered Linux User 85642.
   /V\                             Registered Machine    73926.
  /( )\ Shrewsbury, New Jersey     http://counter.li.org
  ^^-^^ 10:25pm up 45 days, 9:46, 2 users, load average: 2.20, 2.24, 2.14

