[patch] scheduler fix for 1cpu/node case

From: Erich Focht (efocht_at_hpce.nec.com)
Date: 07/28/03

    To: "linux-kernel" <linux-kernel@vger.kernel.org>, LSE <lse-tech@lists.sourceforge.net>
    Date:	Mon, 28 Jul 2003 21:16:46 +0200
    
    
    

    Hi,

    after talking to several people at OLS about the current NUMA
    scheduler the conclusion was:
    (1) it sucks (for particular workloads),
    (2) on x86_64 (embarrassingly simple NUMA) it's useless, goto (1).

    Fact is that the current separation of local and global balancing,
    where global balancing is done only in the timer interrupt at a fixed
    rate, is far too inflexible. A CPU going idle inside a well-balanced
    node will stay idle for a while even if there is a lot of work to do
    on other nodes. Especially in the corner case of one CPU per node,
    this condemns that CPU to idleness until the next global rebalance,
    which can take up to 5 ms. So x86_64 platforms (but not only those!)
    suffer and wish to switch off the NUMA scheduler while keeping NUMA
    memory management on.

    The attached patch is a simple solution which
    - solves the 1 CPU / node problem,
    - lets other systems behave (almost) as before,
    - opens the way to other optimisations like multi-level node
      hierarchies (by tuning the retry rate),
    - simplifies the NUMA scheduler and deletes more lines of code than it
      adds.

    The timer-interrupt-based global rebalancing might appear to be a
    simple and good idea, but it takes a lot of flexibility away from the
    scheduler. In the patch, global rebalancing is done after a certain
    number of failed attempts to balance locally. The number of attempts
    is proportional to the number of CPUs in the current node. With only
    one CPU in the current node, the scheduler doesn't even try to balance
    locally; it wouldn't make sense anyway. Of course one could instead
    set IDLE_NODE_REBALANCE_TICK = IDLE_REBALANCE_TICK, but this is
    uglier (IMHO) and only helps when every node has exactly one CPU.
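    The retry scheme described above can be sketched as a small model.
    The patch itself is not reproduced in this archive copy, so this is
    not its code: NODE_RETRY_FACTOR, CpuBalanceState, retry_limit() and
    idle_balance() are hypothetical names, and resetting the counter after
    a successful local balance is an assumption. It shows the two rules
    stated in the text: cross-node balancing after a number of failed local
    attempts proportional to the node's CPU count, and no local attempt at
    all for a one-CPU node.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical names throughout; none of these identifiers come from the patch.
NODE_RETRY_FACTOR = 1  # assumed proportionality constant (the "retry rate")


@dataclass
class CpuBalanceState:
    cpus_in_node: int
    failed_local: int = 0  # consecutive failed local balancing attempts


def retry_limit(cpus_in_node: int) -> int:
    """Failed local attempts tolerated before balancing across nodes."""
    return NODE_RETRY_FACTOR * cpus_in_node


def idle_balance(state: CpuBalanceState,
                 try_local: Callable[[], bool],
                 try_global: Callable[[], bool]) -> str:
    """One balancing attempt by an idle CPU.

    Returns "global" if cross-node balancing was attempted, else "local".
    """
    if state.cpus_in_node == 1:
        # No other CPU in the node: local balancing cannot succeed.
        try_global()
        return "global"
    if try_local():
        state.failed_local = 0  # assumption: a local success resets the count
        return "local"
    state.failed_local += 1
    if state.failed_local >= retry_limit(state.cpus_in_node):
        state.failed_local = 0  # start counting again after a global attempt
        try_global()
        return "global"
    return "local"


if __name__ == "__main__":
    # A 4-CPU node whose local balancing always fails goes global every 4th try.
    node4 = CpuBalanceState(cpus_in_node=4)
    print([idle_balance(node4, lambda: False, lambda: True) for _ in range(8)])
    # A 1-CPU node goes global on every attempt.
    node1 = CpuBalanceState(cpus_in_node=1)
    print([idle_balance(node1, lambda: False, lambda: True) for _ in range(3)])
```

    Raising the hypothetical NODE_RETRY_FACTOR makes cross-node migration
    rarer on large nodes, while a one-CPU node never consults the limit.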

    Please consider this for inclusion.

    Thanks,
    Erich

    
    



