[PATCH] sched: improve smpnice load balancing when load per task imbalanced



Problem:

Consider a 2 CPU system in which cpu-0 has two high priority tasks and cpu-1 has one normal priority task. The current code cannot detect this imbalance because imbalance will always be < busiest_load_per_task, max_load - this_load will be < 2 * busiest_load_per_task, and pwr_move will be <= pwr_now.

Solution:

Modify the assessment of small imbalances to take into account the relative sizes of busiest_load_per_task and this_load_per_task. This exploits the fact that if the difference between the loads is greater than busiest_load_per_task, and busiest_load_per_task is greater than this_load_per_task, then moving busiest_load_per_task worth of load from busiest to this will improve the distribution of weighted load.
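
As a rough illustration of the check described above, here is a minimal standalone sketch in C. It is not the kernel code: the two helper function names are hypothetical, and the load weights used in the note that follows (256 for a high priority task, 128 for a normal one) are assumed values chosen only to make the arithmetic easy to follow.

```c
/* Old test: always require a gap of two busiest-task loads. */
static int old_small_imbalance_move(unsigned long max_load,
				    unsigned long this_load,
				    unsigned long busiest_load_per_task)
{
	return max_load - this_load >= busiest_load_per_task * 2;
}

/*
 * New test: this_load_per_task arrives as the summed load of the local
 * group and is divided by this_nr_running, as in the patch.  The patch
 * also sets this_load_per_task for the idle case because later code uses
 * it; that part is left out of this sketch.
 * Both helpers assume max_load >= this_load (unsigned subtraction).
 */
static int new_small_imbalance_move(unsigned long max_load,
				    unsigned long this_load,
				    unsigned long busiest_load_per_task,
				    unsigned long this_load_per_task,
				    unsigned long this_nr_running)
{
	unsigned int imbn = 2;

	if (this_nr_running) {
		this_load_per_task /= this_nr_running;
		/* Busiest tasks are heavier: one task's worth of gap is enough. */
		if (busiest_load_per_task > this_load_per_task)
			imbn = 1;
	}

	return max_load - this_load >= busiest_load_per_task * imbn;
}
```

With the assumed weights, cpu-0 carries 2 * 256 = 512 and cpu-1 carries 128, so the gap is 384. The old test asks for 384 >= 512 and fails, while the new test selects imbn = 1, asks for 384 >= 256, and succeeds. When every task has the same weight, imbn stays at 2 and both tests agree.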

Required patches:

sched-prevent-high-load-weight-tasks-suppressing-balancing.patch
sched-improve-stability-of-smpnice-load-balancing.patch

Note: This patch makes no change to load balancing in the case where all tasks are nice==0.

Signed-off-by: Peter Williams <pwil3058@xxxxxxxxxxxxxx>

--
Peter Williams pwil3058@xxxxxxxxxxxxxx

"Learning, n. The kind of ignorance distinguishing the studious."
-- Ambrose Bierce

Index: MM-2.6.X/kernel/sched.c
===================================================================
--- MM-2.6.X.orig/kernel/sched.c 2006-04-04 15:18:19.000000000 +1000
+++ MM-2.6.X/kernel/sched.c 2006-04-04 15:53:48.000000000 +1000
@@ -2252,8 +2252,16 @@ find_busiest_group(struct sched_domain *
 	if (*imbalance < busiest_load_per_task) {
 		unsigned long pwr_now = 0, pwr_move = 0;
 		unsigned long tmp;
+		unsigned int imbn = 2;

-		if (max_load - this_load >= busiest_load_per_task*2) {
+		if (this_nr_running) {
+			this_load_per_task /= this_nr_running;
+			if (busiest_load_per_task > this_load_per_task)
+				imbn = 1;
+		} else
+			this_load_per_task = SCHED_LOAD_SCALE;
+
+		if (max_load - this_load >= busiest_load_per_task * imbn) {
 			*imbalance = busiest_load_per_task;
 			return busiest;
 		}
@@ -2266,10 +2274,6 @@ find_busiest_group(struct sched_domain *

 		pwr_now += busiest->cpu_power *
 			min(busiest_load_per_task, max_load);
-		if (this_nr_running)
-			this_load_per_task /= this_nr_running;
-		else
-			this_load_per_task = SCHED_LOAD_SCALE;
 		pwr_now += this->cpu_power *
 			min(this_load_per_task, this_load);
 		pwr_now /= SCHED_LOAD_SCALE;

