Re: percentage based CPU scheduling

From: Jean-David Beyer (jdbeyer_at_exit109.com)
Date: 08/08/04


Date: Sat, 07 Aug 2004 23:48:15 -0400

Juhan Leemet wrote (in part):

> I realize that benchmark design is tricky (lies, damn lies, statistics!).
> I've directed some successful benchmarking exercises in the past. In one
> case, we had inferred a CPU architecture difference between 2 models: the
> marketeering guys were livid/furious, and wanted to discredit our
> benchmarks, so they hauled in the firmware designer, who confirmed the
> difference (in that particular case it was barrel-shifter vs. none). I
> think benchmarks must be designed to abstract fundamental application
> operations, as opposed to some "generic" mine's bigger than yours measure.
>
Back when I was writing an optimizer for C compiler output, my colleague and
I decided the benchmarks given us to optimize by our sales department were
ridiculous. We could get 10,000,000:1 speed improvements. The benchmarks
had appeared in Byte Magazine or some such place, and were something a
contributing editor dashed off on a slow weekend or something. Typically,
they did a few integer, or floating point, calculations in a loop that
went around 10,000x.

Well our optimizer did loop invariant code motion, noticed that the same
stuff was done each iteration of the loop, and moved it outside the loop.
The live-dead analysis noticed that the loop variable was not used, and
since there was no code in it, it removed the loop. It also noticed that
the results of the computation formerly inside the loop were not used, so
it removed them as well. So all those benchmarks ran in a little under a
microsecond on a 14 MHz processor.
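
A minimal sketch in C of the kind of benchmark described above and of what the two optimizations leave behind. The function names, the arithmetic, and the operands are made up for illustration; the original benchmarks are not reproduced here.

```c
#include <assert.h>

/* Hypothetical benchmark kernel: the same arithmetic on every pass. */
int work_in_loop(int a, int b)
{
    int i, x = 0;
    for (i = 0; i < 10000; i++)
        x = a * b + a;          /* does not depend on i */
    return x;
}

/* After loop invariant code motion the computation sits outside the
   loop; the loop is then empty and its index unused, so it goes too. */
int work_hoisted(int a, int b)
{
    return a * b + a;
}

/* The magazine-style benchmark threw the result away, so live-dead
   analysis could delete the remaining computation as well. */
void benchmark_as_written(void)
{
    (void)work_in_loop(3, 4);
}

void benchmark_after_optimization(void)
{
    /* nothing left to execute */
}

/* Sanity check that hoisting preserved the value. */
void check_same_result(void)
{
    assert(work_in_loop(3, 4) == work_hoisted(3, 4));
}
```

Timing benchmark_as_written() against benchmark_after_optimization() measures only how much work the optimizer threw away, not how fast the machine does arithmetic.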

We thought it might be more useful to use real benchmarks. Now our computer
center was running an Amdahl mainframe but with UNIX as the operating
system, and they gathered a lot of statistics about each job because they
billed the users for CPU time, memory usage, etc. So we got a histogram
of the processes and resource consumption. We found that troff was the
most used process in terms of CPU time, so we picked that as a benchmark.
The whole thing, not just some inner loop. Eventually, we picked the top
10 programs as measured by the comp center over a month. Any improvement
in the execution of those programs would have been worth real money. And to be
sure that the optimizer did not have too easy a time of it, we made the
programs actually do the IO (at least as far as the optimizer was
concerned; in fact, no IO took place, but the optimizer did not know that).
So the loop invariant code motion could not move much out of loops, since
the loop index (or something) would change the results each time around
the loop. Live-dead analysis could not remove huge chunks of code because
the computation results were actually used, etc. Now it is not easy to
build such benchmarks, and it is tough to optimize them, not because they
were so well written (most were lousy), but because they were so badly
written. Most of the problems were the excessive use of global variables
where they were not required. For various reasons, use of global variables
defeats optimizations that would help a lot.
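
The post does not say which reasons were meant. One common one, offered here only as an assumption, is aliasing: a store through a pointer might change a global, so the compiler has to reload the global instead of keeping it in a register. A small C sketch with hypothetical names:

```c
#include <assert.h>

int scale = 3;   /* global: any pointer to int might point at it */

/* Each store to v[i] could, as far as the compiler can tell, overwrite
   scale, so scale must be read from memory again on every iteration. */
void scale_all_global(int *v, int n)
{
    int i;
    for (i = 0; i < n; i++)
        v[i] = v[i] * scale;
}

/* The local copy never has its address taken, so nothing can alias it
   and it can live in a register for the whole loop. */
void scale_all_local(int *v, int n)
{
    int i;
    int s = scale;
    for (i = 0; i < n; i++)
        v[i] = v[i] * s;
}

/* Both versions give the same answer when v does not alias scale. */
void check_scaling_agrees(void)
{
    int a[3] = {1, 2, 3};
    int b[3] = {1, 2, 3};
    int i;
    scale_all_global(a, 3);
    scale_all_local(b, 3);
    for (i = 0; i < 3; i++)
        assert(a[i] == b[i]);
}
```

The two functions differ only when v really does point at scale, which is exactly the case the compiler cannot rule out on its own.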

In any case, the sales department did not like our benchmarks and refused to
use them. They did not want troff to run 15% faster. They wanted to say
that the benchmarks ran faster on our hardware with our compiler than they
did on the competitors' machines. But the benchmarks they wanted to use
made ours sound so impossibly great that no one would believe them. And
technically, our optimizer was that great _on those stupid benchmarks_ but
the benchmarks were totally unrepresentative of what the users needed to
know, i.e., how programs very much like the ones they would actually run
would perform.

So rather than say: "lies, damn lies, statistics", I prefer instead,
"Figures don't lie, but liars sure can figure."

-- 
   .~.  Jean-David Beyer           Registered Linux User 85642.
   /V\                             Registered Machine   241939.
  /( )\ Shrewsbury, New Jersey     http://counter.li.org
  ^^-^^ 23:30:00 up 3 days, 15:05, 3 users, load average: 4.11, 4.12, 4.13
