Re: [sched-devel, patch-rfc] rework of "prioritize non-migratable tasks over migratable ones"



Hi Dmitry,
Just catching up on mail. I haven't had a chance to review your patch, so I am only responding to the comments in this thread.

On Wed, Jun 11, 2008 at 6:05 AM, in message
<b647ffbd0806110305p2be38608iadc1d2e91e3be57a@xxxxxxxxxxxxxx>, "Dmitry
Adamushko" <dmitry.adamushko@xxxxxxxxx> wrote:
2008/6/11 Peter Zijlstra <a.p.zijlstra@xxxxxxxxx>:
On Wed, 2008-06-11 at 00:58 +0200, Dmitry Adamushko wrote:
Hi Gregory,


regarding this commit: 45c01e824991b2dd0a332e19efc4901acb31209f


I think we can do it simpler. Please take a look at the patch below.

Instead of having 2 separate arrays (which is + ~800 bytes on x86_32 and
twice so on x86_64),
let's add "exclusive" (the ones that are bound to this CPU) tasks to the
head of the queue
and "shared" ones -- to the end.

In case of a few newly woken up "exclusive" tasks, they are 'stacked' (not
queued as now), meaning that
a task {i+1} is being placed in front of the previously woken up task {i}.
But I don't think that
this behavior may cause any realistic problems.

Doesn't this violate POSIX ?


If so, then the idea of "prioritize non-migratable tasks over
migratable ones" violates it, not just an artefact of this particular
implementation.

This isn't entirely accurate though. Your proposal does alter the behavior from the original patch. In fact, I put in the xqueue/squeue concept specifically to *avoid* the condition that you are creating. I think this was the point Peter was making.

However, to your point: I can agree that either implementation creates the possibility for creating strange scheduling artifacts. And I think the point you are making is that once we have the artifact, whats another one? ;)



No matter which implementation is used, we have a situation when a
woken-up single-CPU-bound task (let's call it 'p') can preempt a
current task with effects as follows:

- 'current' is not guaranteed to get another CPU;

Right. I admit this is one of the sticking points that has potential for creating artifacts. [1] The downside is there is no real way to avoid this. You can't atomically guarantee a position on another core until current has been preempted. So its always going to be a best-effort assessment of the state of the system prior to the preemption. Not ideal.

Fortunately, this is all w.r.t. equal prio tasks. And as you state below, if this was an issue the designer should have placed the tasks at different prios. This fact is what led me to ultimately feel somewhat comfortable with the concept. [2] A related issue is the potential to have an unbounded task be ping-ponged around (ehem) unbounded. ;)


- there might have been other pending tasks (of equal prio) on this
queue. As a result, 'p' starts running before them violating currently
used (explicitly requested by POSIX?) round-robin behavior.

Yes, except your patch also allows other bound tasks to preempt each other, which adds an additional artifact that is not present in mine nor is it strictly necessary. [3] You could probably work around this by having the enqueue_task_rt() seek out the first non-bound task position and insert there, instead of HEAD. In fact I had contemplated doing exactly that, but then decided the memory/speed tradeoff with the xqueue/squeue was better. I am not against going the other way if the memory cost of the xqueue is too great.


Anyway, the original (and my rework) algorithm is somewhat
sub-optimal. It considers only "p->rt.nr_cpus_allowed == 1", while the
ideal algorithm would behave as follows:

1) we get a newly woken-up task on CPU_i;

2) this task have a number (!= 1) of allowed CPUs (which is a sub-set
of all) _but_ all of them are busy at the moment serving high prio
tasks;

I don't understand the significance of the stipulation that its a "sub-set of all". Can you elaborate? Or did you mean to say "it *may* be a sub-set of all"?


3) there are no other pending tasks on the queue (of equal prio) that
are in the same condition as 'p' (that's they can't be migrated to
other CPUs at the moment and to wait for CPU_i is the 'best' effort
for them);

Note that this is already accounted for with xqueue/squeue, or with my proposed modifications [3] above


4) 'current' is guaranteed to start running immediatelly on another
CPU (apparently, its affinity allows CPUs that can be used neither by
'p', nor by tasks from (3) ), should we preempt it now.

As per [1], we unfortunately can never guarantee this.


5) then let 'p' preempt 'current'.

I think everything except your (4) is achieved in my patch, and could be in yours with the slight modification proposed in [3]. So the real question is whether the lack of your point (4) or the issue I mentioned in [2] is acceptable or not.


We do cause some 'troubles' (in a way of 'cache' efficienty) for the
'current' but probably we win CPU-time-distribution-wise.

True, but this in essence is what load-balancing is all about.


Note, hard real-time tasks would not like to find themselves in the
aforemnetioned situation. But then, one probably should have initially
taken care of assigning different prios or better off, isolating
dedicated CPUs for some tasks wrt how much CPU-time they
consume/latency they expect.

Right. The key is that bets are off when considering equal-prio tasks. Wake-up order, etc, can create similar concerns. That is what lets this concept be at least semi acceptable.



All in all, I think that the initial implementation is sub-optimal

Ouch :)

in
any case and it's not worth introducing additional overhead (another
priority queue array) to address this issue (which is kind of a corner
case).

We may just consider dropping this idea completely.
(my 0.02$)

And based on all this (and as I previously stated) I agree with your assessment that perhaps we should just drop this concept outright.

Regards,
-Greg


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: moving data from one place to another in a text file
    ... Create a queue per thread. ... > parser as a dealer distributing cards to the multiple players. ... will always want to overclock as many CPUs as they can supply Liquid ... Java may well support multi-threading, but that does not mean that it ...
    (comp.lang.cobol)
  • [PATCH sched-devel 7/7] cpuisol: Do not halt isolated CPUs with Stop Machine
    ... This patch makes "stop machine" ignore isolated CPUs. ... It addresses exact same usecase explained in the previous workqueue isolation patch. ...
    (Linux-Kernel)
  • Re: Undocumented and duplicated code
    ... see if duron support PAT. ... There must be some CPUs with the "pat" flag set but not being usable? ... places doesn't look good - this really deserved from the beginning being factored out into an own function to avoid future problems when CPUs get added (like what happens with your patch here - it touches only one place, and since the same context is present in two places in the same file "patch" might even choose freely where it gets applied...). ...
    (Linux-Kernel)
  • Re: [PATCH 1/2] boot: increase stack size for kernel boot loader decompressor
    ... I see you have applied the following patch to x86#for-akpm. ... I think you ment to use this one ... included but not your boot tweak. ... would you expect a real 4K CPUs system to boot any differently? ...
    (Linux-Kernel)
  • [RFC PATCH 0/4] timers: Timer migration
    ... This is a re-work of the earlier patch which i had sent. ... that such hrtimers are ignored during migration of timers. ... Here's a brief introduction as to why we need timer migration. ... idle cpus onto lesser idle cpus is necessary. ...
    (Linux-Kernel)