Re: [patch] CFS scheduler, -v12




* William Lee Irwin III <wli@xxxxxxxxxxxxxx> wrote:

cfs should probably consider aggregate lag as opposed to aggregate
weighted load. Mainline's convergence to proper CPU bandwidth
distributions on SMP (e.g. N+1 tasks of equal nice on N cpus) is
incredibly slow and probably also fragile in the presence of arrivals
and departures partly because of this. [...]

hm, have you actually tested CFS before coming to this conclusion?

CFS is fair even on SMP. Consider for example the worst-case
3-tasks-on-2-CPUs workload on a 2-CPU box:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2658 mingo 20 0 1580 248 200 R 67 0.0 0:56.30 loop
2656 mingo 20 0 1580 252 200 R 66 0.0 0:55.55 loop
2657 mingo 20 0 1576 248 200 R 66 0.0 0:55.24 loop

66% of CPU time for each task. The 'TIME+' column shows a 2% spread
between the slowest and the fastest loop after just 1 minute of runtime
(and the spread gets narrower with time). Mainline does a 50% / 50% /
100% split:

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3121 mingo 25 0 1584 252 204 R 100 0.0 0:13.11 loop
3120 mingo 25 0 1584 256 204 R 50 0.0 0:06.68 loop
3119 mingo 25 0 1584 252 204 R 50 0.0 0:06.64 loop

and i fixed that in CFS.

or consider a sleepy workload like massive_intr, 3-tasks-on-2-CPUs:

europe:~> head -1 /proc/interrupts
CPU0 CPU1

europe:~> ./massive_intr 3 10
002623 00000722
002621 00000720
002622 00000721

Or a 5-tasks-on-2-CPS workload:

europe:~> ./massive_intr 5 50
002649 00002519
002653 00002492
002651 00002478
002652 00002510
002650 00002478

that's around 1% of spread.

load-balancing is a performance vs. fairness tradeoff so we wont be able
to make it precisely fair because that's hideously expensive on SMP
(barring someone showing a working patch of course) - but in CFS i got
quite close to having it very fair in practice.

[...] Tong Li's DWRR repairs the deficit in mainline by synchronizing
epochs or otherwise bounding epoch dispersion. This doesn't directly
translate to cfs. In cfs cpu should probably try to figure out if its
aggregate lag (e.g. via minimax) is above or below average, and push
to or pull from the other half accordingly.

i'd first like to see a demonstration of a problem to solve, before
thinking about more complex solutions ;-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: [patch] CFS scheduler, -v12
    ... CFS is fair even on SMP. ... PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND ... CPU bandwidth, albeit incredibly slowly. ...
    (Linux-Kernel)
  • Re: [REPORT] cfs-v4 vs sd-0.44
    ... consume any CPU time it can get. ... Which still doesn't justify an elaborate "points" sharing scheme. ... CFS there's a max of 1000-1500 context switches a second at nice -10. ... hm, that would be a bad idea under any scheduler, ...
    (Linux-Kernel)
  • Re: [PATCH 1/1] New documentation about CFS.
    ... Rewrite of the CFS documentation - because the old one was sorely ... +scheduler implemented by Ingo Molnar and merged in Linux 2.6.23. ... +timestamp and measure the "expected CPU time" a task should have gotten. ...
    (Linux-Kernel)
  • Re: fair clock use in CFS
    ... CFS and thought I would ask the experts to avoid wrong guesses! ... advances at a pace inversely proportional to the load on the runqueue. ... Why can't fair clock be same as wall clock at all times? ... tasks contending for the CPU is a different scenario from when there is ...
    (Linux-Kernel)
  • [PATCH 0/2] Add group awareness to CFS - v2
    ... The basic idea is to reuse CFS core and other pieces of scheduler like ... Currently all groups get "equal" cpu bandwidth. ... ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ctxsw ...
    (Linux-Kernel)