Re: measuring clock cycles per second



Rainer Weikusat wrote:

You already asserted this in your last posting. But this amounts to
'it works because I say it does'.

No, not really. I could have backed this up with statistical reasoning (Bernoulli trials come to mind), but I thought that would be overkill. In fact, I am not trying to convince you; I am trying to find out why you think it's useless.

Trying a little thought experiment:
Let's assume that a program only executes a single loop and this loop
invokes two subroutines. The average execution time of
subroutine #1 is 1/4 of the sampling interval, the average execution
time of #2 3/4. This means the program spends 1/4 of its time in
subroutine one, yet if the profiling timer was started 'close' to the
start of the loop, the instruction pointer should basically always be
somewhere in #2 when the value is recorded.

What's wrong with this example?

It's the assumption, which it seems to me you made above, that what happens "on average" happens "in every instance".

Even if the average execution time of #1 is exactly 1/4 of the sampling interval and the average execution time of #2 is exactly 3/4 of the sampling interval, then, provided the standard deviation of the respective execution-time distributions is nonzero, the total time taken by one pass through the loop will, in general, differ from the sampling interval, even if by a mere 1%. As time progresses, the accumulation of those little differences shifts the sampling ticks relative to the loop and quickly rids #2 of its preference for gathering them.
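To make the drift argument concrete, here is a minimal simulation sketch. Everything in it is an assumption made for illustration, not anything gprof does: the function name tick_shares, uniform relative jitter on each call's duration, and a first tick placed at half a sampling interval so that, without jitter, every tick falls into #2.

```python
import random

def tick_shares(jitter, n_ticks=200_000, interval=1.0, seed=1):
    """Return the share of sampling ticks landing in #1 and in #2.

    Hypothetical model: #1 takes 1/4 and #2 takes 3/4 of the sampling
    interval on average; each call's duration is scaled by a factor drawn
    uniformly from [1 - jitter, 1 + jitter].
    """
    rng = random.Random(seed)
    means = (0.25 * interval, 0.75 * interval)
    hits = [0, 0]
    current = 0                       # 0 -> subroutine #1, 1 -> subroutine #2
    end = means[0] * (1 + rng.uniform(-jitter, jitter))  # end of current call
    next_tick = 0.5 * interval        # assumed initial offset of the timer
    ticks = 0
    while ticks < n_ticks:
        if next_tick < end:
            # The tick fires while the current subroutine is running.
            hits[current] += 1
            ticks += 1
            next_tick += interval
        else:
            # The current call finished before the tick; start the other one.
            current = 1 - current
            end += means[current] * (1 + rng.uniform(-jitter, jitter))
    return hits[0] / n_ticks, hits[1] / n_ticks

if __name__ == "__main__":
    for jitter in (0.0, 0.01, 0.1):
        share1, share2 = tick_shares(jitter)
        print(f"jitter={jitter:<5} share #1={share1:.3f} share #2={share2:.3f}")
```

With zero jitter the model reproduces the thought experiment: no tick ever lands in #1. Once the durations vary, the tick position drifts through the loop like a random walk, and the share of #1 should move towards 1/4, the sooner the larger the jitter.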

You also seem to conveniently

a) forget that there are other schedulable entities which can, apparently stochastically, cause the program in question to be preempted anywhere inside the loop. When it runs again, its offset relative to the sampling timer will have changed drastically, because of things like stalls, cache invalidation, page faults, ...

b) assume, rather extraordinarily, that _all_ timed loops somehow take exact multiples of the sampling interval (which, imho, follows from your use of the adjective "useless"). Or was this example only meant to demonstrate something else?

So I tend towards "realizing the limits of profiling with a small
resolution", yet the claim that the technique is "useless except on
ancient hardware" eludes me.

The assumption that gprof actually provided useful output at some
point in the past is just a complimentary assumption of mine, because
I have never seen this happen ever since I first encountered the
program on a 25 MHz processor.

My view is rather that "gprof actually provides useful output today, even with 100/s resolution". I see it happen once in a while when I look for bottlenecks in code that runs on a 2.4 GHz processor. With the gprof output in hand, I can usually isolate the function in which they occur pretty easily: it's the one at the top of the list, with >90% of the time taken. You optimize it and bang, the running time is slashed. That's the opposite of useless, and it happens every now and then. Rather than the poor resolution, what I find more difficult to work with is the fact that the measurement itself changes the timing (via stalls, interference with the cache, etc.), especially when -g is used (not sure about the reason).
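For a rough idea of how much 100 samples per second can tell you, here is a back-of-the-envelope sketch of the Bernoulli-trial reasoning. It assumes, as an idealization, that each sampling tick is an independent trial that hits the function with probability equal to its true share of the run time; real ticks are correlated, so treat the result as optimistic. The function name and the 10 s run time are made up for the example.

```python
import math

def sampled_share_stderr(true_share, runtime_s, rate_hz=100):
    """Standard error of a function's sampled time share.

    Assumes n = runtime_s * rate_hz independent Bernoulli trials, so the
    sampled share has standard deviation sqrt(p * (1 - p) / n).
    """
    n = runtime_s * rate_hz
    return math.sqrt(true_share * (1 - true_share) / n)

if __name__ == "__main__":
    # Hypothetical run: 10 s at 100 samples/s, one function taking 90%.
    err = sampled_share_stderr(0.9, 10)
    print(f"standard error of the sampled share: {err:.4f}")
```

Under these assumptions a 10 s run gives 1000 samples, and the sampled share of a 90% function is uncertain by roughly one percentage point, which is plenty to single out a dominant bottleneck.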

- J.