Statistical methods for latency profiling
From: Lee Revell (rlrevell_at_joe-job.com)
Date: 07/31/04
- Previous message: Dmitry Torokhov: "Re: input system: EVIOCSABS(abs) ioctl disabled, why?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
To: jackit-devel <jackit-devel@lists.sourceforge.net> Date: Sat, 31 Jul 2004 01:22:37 -0400
Hey,
Recently Ingo Molnar asked in one of the voluntary-preempt threads for
the minimum and average scheduling delay reported by jackd. JACK does
not currently maintain these statistics.
I realized that the distribution of maximum latencies reported on each
process cycle is fairly normally distributed. If this data followed a
normal distribution, it becomes much easier to make generalizations
about it - the standard deviation becomes meaningful, we can identify
whether a change is statistically significant, etc. It would also
indicate the effectiveness of the Linux scheduler by identifying any
skew - the distribution in fact should be normal, as the only factor
that should be influencing this value is the variance in lengths of
whatever non-preemptible sections the kernel was in when we became
runnable.
The first step in being able to meaningfully analyze these results is to
determine the degree to which distribution is in fact normal. The first
step in this process is to just to look at a histogram of the data.
In any jackd engine, the allowable window for scheduler latency is:
(0, period_usecs/2)
A latency in the range:
[period_usecs/2, period_usecs]
will cause jackd to restart, and of course anything over period_usecs is
an XRUN.
Thus we can build a histogram of the observed latencies by creating an
array with period_usecs/2 elements, and adding 1 to each "bin" whenever
we see that value. Using 32 frames at 48KHz, we have 333 "bins".
The overhead is not too bad, because we only look at the highest latency
for each period (max_usecs), so we only have to update the histogram
once per period - if the underlying distribution is normal then the
distribution of the maximimums will be as well.
The ability to use statistical methods to generalize about this data
will be increasingly important moving forward, because this is the
*only* way we can provide hard numbers: confidence intervals, p-values,
standard deviations, etc. rather than "It feels faster" or "Since $FOO
change the system is sluggish". Statistical methods would allow us to
say "$FOO change increases the mean latency from 150 to 200 usecs, but
the standard deviation is now 20 rather than 75, so it's actually an
improvement".
Any change in the kernel that makes a difference in the "feel" of the
system is now open to analysis. Many developers apparently do not
consider a user report of "It feels sluggish" to be as much of a concern
as "X took Y amount of time now it takes Z". Statistical methods can
close this gap.
Anyone care to comment?
Lee
PS I did not do very well in my statistics classes, and it's possible
that many kernel hackers are completely unfamiliar with statistical
methods. Fortunately the subset that is immediately useful to us is
pretty easy to understand, and could be covered in one or two lectures,
or a HOWTO. Depending on the level of interest I can create one.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Previous message: Dmitry Torokhov: "Re: input system: EVIOCSABS(abs) ioctl disabled, why?"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
- Re: Testing for product of Gaussians
... >I have a value created by a computer simulation. ... test against the computed
Bessel function distribution will ... I know the answer to the diffusion equation,
... are those of the Statistics Department or of Purdue University. ... (sci.math) - Re: Most valuable poster
... Ooh, the Poisson distribution! ... If you actually *knew* anything about statistics,
... probabilities I want, just like you do. ... (talk.origins) - Re: Fishers Exact Test and Chi-square
... The probability distribution implicitly assumes continuous rather than ... chi-square
or normal distribution would only give an approximation ... merely re-iterating that the
chi-square statistic for a 2x2 contingency ... there at the least dozes and dozens of chi-square
statistics. ... (sci.stat.consult) - Re: speed reading
... the existence of people twelve feet tall. ... use statistics in this
fashion to argue that people like this must ... show that for the real distribution of
heights, ... people read between 200 and 400 wpm" quoted in the one cite. ... (rec.arts.sf.written) - Re: FFT test with few kbits
... rules for statistics according to FFT. ... I personally don't care. ...
> empirical distribution of a statistic. ... > determine the number of simulations
that you need. ... (sci.crypt)