[PATCH] Staircase cpu scheduler 2.6.8-rc2-mm1

From: Con Kolivas (kernel_at_kolivas.org)
Date: 07/31/04

  • Next message: David Brownell: "Re: Solving suspend-level confusion"
    Date:	Sun, 01 Aug 2004 00:20:04 +1000
    To: linux kernel mailing list <linux-kernel@vger.kernel.org>
    
    
    
    

    This is a rewrite of the scheduler policy for 2.6, based on the current
    O(1) scheduler. It removes a lot of complexity and replaces it with an
    obvious and simple design

    Aims:

       - Simple predictable design.
       - Good scalability.
       - Interactive by design rather than have complicated interactivity
         heuristics
       - Maintain fairer cpu distribution with load
       - Low scheduling latency for SCHED_NORMAL tasks.
       - Low overhead.
       - Making renicing processes actually matter for CPU distribution
         (nice 0 gets 19 times what nice +19 gets)
       - Easily configured for different workloads (compute server,
         non-interactive)

    Summary of design:
       A descending foreground-background single runqueue per cpu priority
       array

    Details:
       This patch takes advantage of the existing infrastructure but has no
       expired array. Real time tasks are treated simiilary as the current
       fixed priority & timeslice system. The details are in the management
       of normal policy (SCHED_NORMAL) tasks.

       Normal policy tasks have a dynamic priority that drops by one every
       RR_INTERVAL which equals 10ms. Once they drop to lowest priority they
       are requeued at their top dynamic priority again.

       Every time a task sleeps it gains "burst" which allows them to stay at
       their highest priority for an extra interval per burst before dropping
       priority.

       The sub-jiffy case is handled specially to prevent it fooling this
       system, by continuing their current timeslice next wakeup. Tasks that
       have just forked are also treated the same way.

       Yield()ing is implemented by putting tasks to the end of the lowest
       priority queue, but once scheduled the task wakes up to it's normal
       first wake up priority.

    Use a fixed font to see clearly:

    ie
    RR_INTERVALs nice 0 task
    20<-------------->40 PRI
    11111111111111111111
    21111111111111111110
    31111111111111111100
    and so on

    nice +10 task
    30<---->40 PRI
    1111111111
    2111111110
    3111111100
    and so on

       This is how cpu distribution is kept proportional while optimising
       latency for interactive tasks (which wake frequently).

    Features:
       This scheduler has coded in 2 simple sysctl configuration options.

    /proc/sys/kernel/interactive

       is a sysctl which when set to 0 disables the burst mechanism entirely
       making all tasks descend the staircase in a more rigid manner and thus
       maintaining much stricter cpu distribution (ie suitable to multi-user
       setups, servers and non gui machines.) Since the design remains
       foreground-background descending interactivity is maintained but
       response becomes slower/proportional to load.

    /proc/sys/kernel/compute

       is a sysctl which puts the cpu scheduler in "computational" mode which
       is designed for optimal cpu throughput at the expense of latency. It
       enlarges timeslices to 10 times larger than default and introduces
       delayed preemption of normal tasks to optimise cpu cache utilisation.
       This mode effectively sets interactive to 0 as well.

    With thanks to William Lee Irwin III and Zwane Mwaikambo for patches and
    debugging and Peter Williams for useful suggestions.

    Patch against 2.6.8-rc2-mm1
    Signed-off-by: Con Kolivas <kernel@kolivas.org>

      fs/proc/array.c | 4
      include/linux/sched.h | 15
      include/linux/sysctl.h | 2
      init/main.c | 1
      kernel/exit.c | 1
      kernel/sched.c | 865
    ++++++++++++++-----------------------------------
      kernel/sysctl.c | 16
      7 files changed, 283 insertions(+), 621 deletions(-)

    
    

    
    

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/




  • Next message: David Brownell: "Re: Solving suspend-level confusion"

    Relevant Pages