Re: [PATCH] Fix longstanding load balancing bug in the scheduler.



On Thu, Sep 07, 2006 at 10:24:26AM -0700, Christoph Lameter wrote:
Hmmm... Some more comments

On Thu, 7 Sep 2006, Nick Piggin wrote:

So what I worry about with this approach is that it can really blow
out the latency of a balancing operation. Say you have N-1 CPUs with
lots of stuff locked on their runqueues.

Right but that situation is rare and the performance is certainly
better if the unpinned processes are not all running on the same
processor.

But the situation is rare full stop otherwise we'd have had more
complaints. And we do tend to care about theoretical max latency
in many other obscure paths than the scheduler.

How about if you have N/2 CPUs with lots of stuff on runqueues?
Then the other N/2 will each be scanning N/2+1 runqueues... that's
a lot of bus traffic going on which you probably don't want.


The solution I envisage is to do a "rotor" approach. For example
the last attempted CPU could be stored in the starving CPU's sd...
and it will subsequently try another one.

That wont work since the notion of "pinned" is relative to a cpu.
A process may be pinned to a group of processors. It will only appear to
be pinned for cpus outside that set of processors!

A sched-domain is per-CPU as well. Why doesn't it work?

What good does storing a processor number do if the processes can change
dynamically and so may the pinning of processors. We are dealing with
a set of processes. Each of those may be pinned to a set of processors.

Yes, it isn't going to be perfect, but it would at least work (which
it doesn't now) without introducing that latency.


I've been hot and cold on such an implementation for a while: on one
hand it is a real problem we have; OTOH I was hoping that the domain
balancing might be better generalised. But I increasingly don't
think we should let perfect stand in the way of good... ;)

I think we should fix the problem ASAP. Lets do this fix and then we can
think about other solutions. You already had lots of time to think about
the rotor.

The problem has been known about for many years. It doesn't remain
unfixed because we didn't know about this brute force patch ;)
Given that Ingo also maintains the -RT patchset, it might face a
rocky road... but if he is OK with it then OK by me for now.

This looks to me as a design flaw that would require either a major rework
of the scheduler or (my favorite) a delegation of difficult (and
computational complex and expensive) scheduler decisions to user space.

I'd be really happy to move this to userspace :)

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [patch] voluntary-preempt-2.6.9-rc1-bk4-Q7
    ... the main change in this patch are more SMP latency fixes. ... kernel, even with CONFIG_PREEMPT enabled, didnt have any spin-nicely ... CPUs in the system. ...
    (Linux-Kernel)
  • Re: [PATCH] Fix longstanding load balancing bug in the scheduler.
    ... out the latency of a balancing operation. ... be pinned for cpus outside that set of processors! ... I think we should fix the problem ASAP. ... computational complex and expensive) scheduler decisions to user space. ...
    (Linux-Kernel)
  • Re: Zahl gerade oder ungerade?
    ... für aktuelle intel Core2 CPUs ... AND/OR/XOR: ... Latency: 1 Throughput 0.33 ...
    (microsoft.public.de.german.entwickler.dotnet.vb)
  • Re: [PATCH][resubmit] x86: enable preemption in delay
    ... set the mask of allowed cpus to the current cpu only; ... Things like preempt latency and irq off latency are rather easy to ...
    (Linux-Kernel)
  • Re: [patch] latency tracer, 2.6.15-rc7
    ... >> This should help in UP configurations, or in SMP configurations where ... >> it to help in cases where one of several CPUs is frequently executing ... > took ~48 hours to show up in the latency tracer. ... most old-style server workloads wouldn't notice a 10ms hit to ...
    (Linux-Kernel)