Re: [patch 4/7] sched: Change nohz ilb logic from pull to push model



On Tue, 2010-06-01 at 16:47 -0700, Vaidyanathan Srinivasan wrote:
* Suresh Siddha <suresh.b.siddha@xxxxxxxxx> [2010-05-17 11:27:30]:

From: Venkatesh Pallipadi <venki@xxxxxxxxxx>
Subject: sched: Change nohz ilb logic from pull to push model

In the new push model, all idle CPUs indeed go into nohz mode. There is
still the concept of idle load balancer. Busy CPU kicks the nohz
balancer when any of the nohz CPUs need idle load balancing.
The kickee CPU does the idle load balancing on behalf of all idle CPUs
instead of the normal idle balance.

This addresses the below two problems with the current nohz ilb logic:
* the balancer will continue to have periodic ticks and wakeup
frequently, even though it may not have any rebalancing to do on
behalf of any of the idle CPUs.
* On x86 and CPUs that have APIC timer stoppage on idle CPUs, this
periodic wakeup can result in an additional interrupt on a CPU
doing the timer broadcast.

How do we select the timer broadcast cpu today? Does it change at all
at run time? Maybe that CPU would be a good target for timer
migration, so that additional CPUs are not woken up from idle states.

It is based on who handles irq 0. Anyway, newer generations of cpus
don't have the apic timer stoppage issue. If you want to do more
intelligent timer migrations, then it's better to address that with a
mechanism that works efficiently for all platforms.

Can you please give more explanation on how the combination of
first_pick_cpu and second_pick_cpu works? We need to kick an idle CPU
whenever our cpu or group becomes overloaded, right?

With this change, idle load balancing (i.e. kicking an idle cpu to do
idle load balancing on behalf of all idle cpus) is needed when either
a single busy cpu has more than one task or more than one cpu is
busy.
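
A minimal user-space sketch of this rule may help. It is not the code
from the patch: the helper names (claim_slot, release_slot, kick_needed,
cpu_went_idle), the NR_CPUS value and the use of C11 atomics are all
assumptions made for illustration. The idea shown is that up to two busy
cpus register themselves in first_pick_cpu and second_pick_cpu, and a
kick is requested when the first one has more than one task or when a
second busy cpu exists.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NR_CPUS 8          /* hypothetical cpu count for this sketch */
#define NO_CPU  NR_CPUS    /* sentinel: slot is not claimed */

/* Hypothetical stand-ins for the two slots discussed in the thread. */
static atomic_int first_pick_cpu  = NO_CPU;
static atomic_int second_pick_cpu = NO_CPU;

/* Claim a slot; succeeds if the slot was free or is already ours. */
static bool claim_slot(atomic_int *slot, int cpu)
{
    int expected = NO_CPU;

    if (atomic_compare_exchange_strong(slot, &expected, cpu))
        return true;
    return expected == cpu;  /* on failure, expected holds the owner */
}

/* Release a slot, but only if this cpu is the one holding it. */
static void release_slot(atomic_int *slot, int cpu)
{
    int expected = cpu;

    atomic_compare_exchange_strong(slot, &expected, NO_CPU);
}

/* Returns true when this busy cpu should kick the idle load balancer. */
static bool kick_needed(int cpu, unsigned int nr_running)
{
    assert(cpu >= 0 && cpu < NR_CPUS);

    if (nr_running == 0)
        return false;               /* an idle cpu never kicks */

    if (claim_slot(&first_pick_cpu, cpu)) {
        /* We are the first busy cpu; do not also hold the second slot. */
        release_slot(&second_pick_cpu, cpu);
        return nr_running > 1;      /* one busy cpu, more than one task */
    }

    if (claim_slot(&second_pick_cpu, cpu))
        return true;                /* more than one cpu is busy */

    /* Both slots belong to other cpus; in this sketch they do the kicking. */
    return false;
}

/* To be called when a cpu goes idle, so that its slot becomes free. */
static void cpu_went_idle(int cpu)
{
    release_slot(&first_pick_cpu, cpu);
    release_slot(&second_pick_cpu, cpu);
}
```

The busy cpu only touches two shared words here and never walks sched
groups, which is the low-overhead property argued for in this thread.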

sched group is relevant only if we are balancing at a particular domain.
Traversing all the groups and finding out the group load vs capacity
from the busy cpu will add more overhead to the busy cpu.

We need more intelligence about when to do idle load balancing and when to
stop it. The current proposal is a simple fix so that the idle core in a
semi-idle laptop/netbook does not have periodic ticks while idle.

We will have to
prefer cores of other packages when more tasks become ready to run.
So a notion of group overload is needed to kick new cpus. Also, new
idle CPUs that are kicked should come from other cores and packages
instead of the nearest sibling.

The kicked cpu can be the nearest idle core to the busy core. It can do
the idle load balancing, and the actual load can move to a far-away core
(for the perf policy) or the nearest core (for the power-savings policy).
Any intelligent heuristics to do this with minimal disturbance to busy
cpus are welcome.
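
As a rough illustration of "nearest idle core", here is a small sketch.
It assumes that numeric cpu ids approximate topological distance, which
is not true on real machines in general; the function name
nearest_idle_cpu, the idle_mask bitmask and NR_CPUS are hypothetical and
do not come from the patch.

```c
#include <stdint.h>

#define NR_CPUS 8   /* hypothetical cpu count for this sketch */

/*
 * Return the idle cpu whose id is closest to busy_cpu, or -1 if no
 * other cpu is idle. Bit i of idle_mask is set when cpu i is idle.
 * On a tie, the lower-numbered cpu wins.
 */
static int nearest_idle_cpu(int busy_cpu, uint32_t idle_mask)
{
    for (int d = 1; d < NR_CPUS; d++) {
        int lo = busy_cpu - d;
        int hi = busy_cpu + d;

        if (lo >= 0 && (idle_mask & (UINT32_C(1) << lo)))
            return lo;
        if (hi < NR_CPUS && (idle_mask & (UINT32_C(1) << hi)))
            return hi;
    }
    return -1;
}
```

The cpu picked this way only runs the balancing pass; which idle cpu
finally pulls the task is still decided by the balancing policy.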


As per the current implementation, a sibling thread may be kicked, but
it will not pull the task, as the load balancer will be run on behalf
of all idle cores in the system and then an appropriate idle core will
pull the new task... correct?

Yes.

thanks,
suresh



