Re: OSDL Bug 3770

From: Nick Piggin (piggin_at_cyberone.com.au)
Date: 12/18/04

  • Next message: Charles-Henri Collin: "ip=dhcp problem..."
    Date:	Sat, 18 Dec 2004 20:43:38 +1100
    To: Loic Domaigne <loicWorks@gmx.net>
    
    

    Loic Domaigne wrote:

    > Hello Nick!
    > Hello NPTL Mailing List!
    >

    Hello Loic! Thanks for the interesting mail.

    I'm CCing lkml and Ingo with this, because I wouldn't feel comfortable
    to veto
    this myself.

    lkml: We're discussing the fact that on SMP machines, our realtime
    scheduling
    policies are per-CPU only. This caused a problem where a high priority
    task on
    one CPU caused all lower priority tasks on that CPU to be starved, while
    tasks
    on another CPU with the same low priority were able to run.

    >>> Ah, the problem is that when the driver thread has a higher
    >>> priority than the worker threads, so when the driver goes into an
    >>> infinite loop waiting, the able to schedule, however.
    >>
    >
    > Although POSIX legally permits such implementation for realtime policy
    > on SMP machines, this implementation is clearly *NOT* REASONABLE.
    >

    Well I haven't done much in the realtime area... but nobody has
    complained till now.

    > The reason is extremely simple: the application *CANNOT* necessarily
    > known that it gets stuck behind a higher-priority thread (though it
    > could had run on another CPU if the scheduler had decided otherwise).
    > That's *NOT* doable to program in a deterministic fashion in such
    > "realtime"-environement
    >

    You could use CPU binding. I'd argue that this may be nearly a
    requirement for
    any realtime system of significant complexity on an SMP system.

    *But*, notice that the program in question did not run on UP and
    randomly fail
    on SMP, rather it would not work on single processor AT ALL.

    > [
    > "Realtime" put into quote. I am speaking here of soft realtime, that
    > is an environment whose tasks scheduling follow a specific
    > deterministic order. I am not speaking about hard-realtime that have
    > additional timing constraints. Following that definition, we can say
    > that Linux offers (soft) "Realtime".
    > ]
    >
    >
    >> > The driver really needs to sleep, use a mutex, use a lower priority,
    >>
    >>> or something in order for it to work.
    >>
    >
    > NO! It is not the responsability of the application to fix that
    > behavior! We can in our case because 'we know', but some applications
    > don't!!!
    >

    That's a bit hand-wavy ;) but I don't dismiss it out of hand because as
    I said,
    I'm not so familiar with this area. I would be interested in an example
    of some
    application where this matters, and which absolutely can't use any
    synchronisation
    primitives.

    >
    > The mistake done here is interesting. When you have a pool of servers,
    > you can proceed in two ways to serve the clients:
    >
    > (1) make a FIFO queue for each server. When a client arrives, it
    > chooses the queue that is the shortest.
    >
    > (2) make an unique FIFO queue for all servers. All clients are
    > queued, and when a server is done it takes the first client
    > waiting on that big queue.
    >
    > Queuing theory proves that (2) is better. Exactly due to the reason we
    > have here. With (1), the guys in the queue might get stuck if the
    > corresponding server is blocked by a client. With (2), when a server
    > is blocked by a client, it doesn't prevent the other clients to be
    > served by other servers.
    >

    But that model is flawed for SMP scheduling. If it were that easy, we
    might have a
    single queue for _all_ tasks.

    The main problem is the cost of synchronisation and cacheline sharing. A
    secondary
    problem is that of CPU affinities - moving a task to another CPU nearly
    always has
    some non zero cost in terms of cache (and in case of NUMA, memory)
    efficiency.

    Our global queue scheduler was basically crap for more than 4 CPUs. We
    could give
    RT tasks a global queue with little impact to non-RT workloads (in fact,
    I think
    early iterations of the 2.6 scheduler trialed this)... but let's not
    cripple the
    RT apps that do the right thing (and need scalablility).

    Another problem is that scheduling may not be O(1) anymore, if you have
    CPU affinity
    bindings in place.

    To summaries, I believe that if per-CPU RT queues is allowed within
    POSIX, then we
    want to go with the sanest possible implementation, and force any broken
    apps to
    fix themselves.... let's not cave in now :)

    Nick

    > [
    > An historical note. USA had implemented (2) in offices, supermarkets
    > and such long before Europa. Because in Europe, customers were
    > convinced that model (2) took more time, because the queue was longer.
    > ]
    >
    >

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Charles-Henri Collin: "ip=dhcp problem..."

    Relevant Pages

    • Re: [REPORT] cfs-v4 vs sd-0.44
      ... So it would only accumulate "scheduling points" while multiuple clients are actively waiting for it, which actually sounds like exactly the right thing. ... There are subtle semantics with these kinds of things: especially if the scheduling points are only awarded when a process goes to sleep, if X is busy and continues to use the CPU, it wouldn't give any scheduling points back to clients and they really do accumulate with the server. ... The exceptions to this are the various terminal emulators (e.g. xterm, gnome-terminal, etc.) when being ...
      (Linux-Kernel)
    • Re: [ANNOUNCE][RFC] PlugSched-6.4 for 2.6.18-rc2
      ... to use the scheduling policy mechanism i.e. SCHED_FIFO, SCHED_RR, etc. ... having a different scheduler on each run queue (or sub set ... On an SMP system, you can have one CPU doing one class of scheduling (long ...
      (Linux-Kernel)
    • Re: [ANNOUNCE][RFC] PlugSched-6.4 for 2.6.18-rc2
      ... the scheduling policy mechanism i.e. SCHED_FIFO, SCHED_RR, etc. ... having a different scheduler on each run queue (or sub set of ... On an SMP system, you can have one CPU doing one class of scheduling (long ...
      (Linux-Kernel)
    • Re: [PATCH] sched: Fix adverse effects of NFS client on interactive response
      ... >>That's more or less independent of this issue as the distribution of CPU ... >>raised is largely solved by the time tasks spend on the active queue ... >>before moving to the expired queue rather than the order in which they ... > rescheduled on the active queue instead of the expired queue at the end ...
      (Linux-Kernel)
    • Re: RFC for a new Scheduling policy/class in the Linux-kernel
      ... The Group Scheduling framework is intended to permit flexible configuration of *any* scheduling semantics, RT or otherwise that is desired. ... If A is under explicit plan scheduling, and B is under CFS or CPU share, then if A would be chosen by the system if it were not blocked, then B is run as its proxy until it gets out of A's way. ... In this case no priorities exit, so Proxy Execution is not a "way of implementing Priority Inheritance". ...
      (Linux-Kernel)

    Loading