Scheduling latency summary

From: Rick Lindsley (ricklind_at_us.ibm.com)
Date: 09/30/03

    To: linux-kernel@vger.kernel.org
    Date:	Mon, 29 Sep 2003 17:45:06 -0700
    
    

    I applied the schedstats patch to some recent releases and, with the help
    of Steve Pratt, ran some benchmarks. There's a lot of focus lately on
    improving interactivity, and to me that seems directly related to how fast
    a process can move from the run queue to the processor. For this summary,
    I'll call a "run slice" the period of time a task gets to run before it
    voluntarily OR involuntarily leaves the processor. "Latency" will be
    the time between entering a runqueue and actually landing on a processor.
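
    To make the two definitions concrete, here is a minimal sketch of how
    average run slice and average latency could be derived from two
    snapshots of cumulative per-task counters. The counter layout
    (SchedSample and its three fields) is an assumption made for
    illustration; it is not the output format of the schedstats patch.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class SchedSample:
    """Cumulative counters for one task (hypothetical layout, times in ns)."""
    run_ns: int   # total time spent running on a processor
    wait_ns: int  # total time spent waiting on a runqueue
    slices: int   # number of times the task landed on a processor


def averages(before: SchedSample, after: SchedSample) -> Tuple[float, float]:
    """Return (average run slice, average latency) in ns between two samples."""
    n = after.slices - before.slices
    if n <= 0:
        # The task never reached a processor in this interval.
        return (0.0, 0.0)
    avg_run_slice = (after.run_ns - before.run_ns) / n
    avg_latency = (after.wait_ns - before.wait_ns) / n
    return (avg_run_slice, avg_latency)
```

    For example, a task that accumulates 30000 ns of run time and 20000 ns
    of wait time over 10 trips to the processor has an average run slice
    of 3000 ns and an average latency of 2000 ns.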

    Using the schedstats patch, I took comparative measurements on -test5,
    -test5-mm3, -test6, and -test6-mm1. It's interesting to note not only
    whether the benchmark improved, but also how the scheduler behavior
    changed (and how it differs between benchmarks).

    High latency usually indicates congested runqueues. Long run slices
    generally indicate CPU-bound workloads. Different benchmarks have
    different "normal" behavior, however. Although results were gathered,
    most benchmarks were run in an abbreviated manner to reveal trends and
    characteristics rather than run full out, fully tuned, to produce valid
    test results.

    Graphs can be viewed at http://eaglet.rain.com/rick/linux/schedstats/graphs/

    Volanomark:
        test6-mm1 has, in general, about 15% higher latencies and about
        25% higher run slices than test5. Volanomark is known to be
        pathological with some implementations of Java, which at times
        call sched_yield repeatedly and in quick succession. The version
        I tested exhibits this spectacularly. What's interesting to note is
        that in test6-mm1 we appear to be both waiting a bit longer to do
        the spinning and taking a bit longer to do it. Unlike most
        benchmarks, both run slices and latencies tend to live in the ns
        range, probably due to the rapid spinning. Volanomark's test
        results declined by over 5% in test6.
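
        The yield-spinning pattern described above can be sketched as
        follows. This is only an illustration of the general pattern,
        not code from Volanomark or from any Java implementation; the
        helper name spin_until and its max_yields cap are hypothetical.
        Each pass through the loop gives up the processor almost
        immediately, which is why such a task shows very short run slices.

```python
import os
import time

# os.sched_yield() is available on Unix; fall back to a zero-length sleep
# elsewhere (an assumption made only to keep the sketch portable).
_yield_cpu = getattr(os, "sched_yield", lambda: time.sleep(0))


def spin_until(predicate, max_yields=1_000_000):
    """Yield the processor until predicate() is true; return the yield count."""
    yields = 0
    while not predicate() and yields < max_yields:
        _yield_cpu()  # leave the processor voluntarily, then get requeued
        yields += 1
    return yields


if __name__ == "__main__":
    # Spin for roughly 10 ms and report how many times we yielded.
    deadline = time.monotonic() + 0.01
    print(spin_until(lambda: time.monotonic() >= deadline))
```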

    SPECjbb:
        As we move from smaller to larger warehouse runs, behavior shifts
        from low latency and long run slices to high latency and short
        run slices. Both test6 and test6-mm1 show about a 40% reduction
        in latency compared to test5, with only a slight reduction in
        run slice times (generally less than 5%). Not surprisingly,
        test6 showed slightly better results under heavier load.

    SPECdets:
        It's hard to see a pattern because the run used is generally short
        (under 5 minutes). More frequent samples of the scheduler statistics
        might help. In general, both test6 and test6-mm1 are comparable to
        test5 in terms of run slices and latencies. Run slices are very
        small, generally less than 3 ms, indicating these tasks do not run
        very long before leaving the processor. Test results showed slight
        degradation at the low end but slight improvement at the high end.

    Kernbench:
        Results are all over the map, but there is basically no change.
        Both latencies and run slices tend to hover between 10 and 20 ms,
        suggesting moderate but not major congestion. This can change
        depending on the -j value you give make.

    Conclusion: test6 is generally as good as test5 unless you're running
    Volanomark -- then it's definitely worse.

    Rick

