Re: [RFC] Thread Migration Preemption - v4



On 07/14, Mathieu Desnoyers wrote:

> * Oleg Nesterov (oleg@xxxxxxxxxx) wrote:
> > On 07/14, Mathieu Desnoyers wrote:
> > >
> > > @@ -4891,10 +4948,42 @@ static int migration_thread(void *data)
> > >  		list_del_init(head->next);
> > >
> > >  		spin_unlock(&rq->lock);
> > > -		__migrate_task(req->task, cpu, req->dest_cpu);
> > > +		migrated = __migrate_task(req->task, cpu, req->dest_cpu);
> > >  		local_irq_enable();
> > > -
> > > -		complete(&req->done);
> > > +		if (!migrated) {
> > > +			/*
> > > +			 * If the process has not been migrated, let it run
> > > +			 * until it reaches a migration_check() so it can
> > > +			 * wake us up.
> > > +			 */
> > > +			spin_lock_irq(&rq->lock);
> > > +			head = &rq->migration_queue;
> > > +			list_add(&req->list, head);
> > > +			if (req->task->se.on_rq
> > > +				|| !task_migrate_count(req->task)) {
> > > +				/*
> > > +				 * The process is on the runqueue, it could
> > > +				 * exit its critical section at any moment,
> > > +				 * don't race with it and retry actively.
> > > +				 * Also, if the thread is not on the runqueue
> > > +				 * and has a zero migration count
> > > +				 * (__migrate_task failed because cpus allowed
> > > +				 * changed), just retry.
> > > +				 */
> > > +				spin_unlock_irq(&rq->lock);
> > > +				continue;
> >
> > Again, this can deadlock. migration_thread() is SCHED_FIFO, and it shares the
> > same CPU with req->task. We are doing a busy-wait loop, req->task may have no
> > chance to finish its critical section.
> >
>
> If we share the CPU with the other thread, it means that it won't be on
> the runqueue while we are holding the rq lock.

Why? The req->task could be runnable, but preempted by migration_thread().
In that case req->task->se.on_rq should be true.

I didn't read the new scheduler yet, but I believe on_rq == 0 only when
the task sleeps, it is like the current ->array = NULL. Please correct
me if I am wrong.
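
To make the concern concrete, here is a toy user-space model of the argument, not kernel code. It assumes a single CPU on which the migration thread always wins (SCHED_FIFO-like) and a target task that stays runnable (on_rq set) while preempted. The names toy_task and toy_simulate, and the tick-based scheduling, are invented for illustration only.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: one CPU, one migration thread, one target task. */
struct toy_task {
	int  critical_work_left; /* ticks still needed inside the critical section */
	bool on_rq;              /* runnable (possibly preempted), modeled on se.on_rq */
};

/*
 * Simulate up to max_ticks scheduler ticks on a single CPU.
 *
 * The migration thread has strictly higher priority, so it is considered
 * first on every tick. With busy_wait set, it retries immediately whenever
 * the target is on the runqueue and therefore never yields the CPU. With
 * busy_wait clear, it sleeps after a failed attempt and the target runs.
 *
 * Returns the tick at which migration succeeds, or -1 if it never does.
 */
int toy_simulate(struct toy_task *t, bool busy_wait, int max_ticks)
{
	assert(max_ticks > 0);

	for (int tick = 0; tick < max_ticks; tick++) {
		/* Migration attempt: succeeds once the critical section is done. */
		if (t->critical_work_left == 0)
			return tick;

		/* Busy-wait retry: the higher-priority thread keeps the CPU. */
		if (busy_wait && t->on_rq)
			continue;

		/* Migration thread sleeps, so a runnable target makes progress. */
		if (t->on_rq)
			t->critical_work_left--;
	}
	return -1;
}
```

With a target that needs 3 ticks and has on_rq set, toy_simulate(&t, true, 1000) returns -1 because the target never runs, while toy_simulate(&t, false, 1000) returns 3. The model says nothing about the real scheduler beyond the two assumptions stated above.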

Oleg.



