[patch] scheduler fix for 1cpu/node case
From: Erich Focht (efocht_at_hpce.nec.com)
Date: 07/28/03
To: "linux-kernel" <linux-kernel@vger.kernel.org>, LSE <lse-tech@lists.sourceforge.net>
Date: Mon, 28 Jul 2003 21:16:46 +0200
Hi,
after talking to several people at OLS about the current NUMA
scheduler the conclusion was:
(1) it sucks (for particular workloads),
(2) on x86_64 (embarrassingly simple NUMA) it's useless, goto (1).
Fact is that the current separation of local and global balancing,
where global balancing is done only in the timer interrupt at a fixed
rate, is far too inflexible. A CPU going idle inside a well balanced
node will stay idle for a while even if there's a lot of work to
do. Especially in the corner case of one CPU per node this is
condemning that CPU to idleness for at least 5 ms. So x86_64 platforms
(but not only those!) suffer and wish to switch off the NUMA
scheduler while keeping NUMA memory management on.
The attached patch is a simple solution which
- solves the 1 CPU / node problem,
- lets other systems behave (almost) as before,
- opens the way to other optimisations like multi-level node
hierarchies (by tuning the retry rate)
- simplifies the NUMA scheduler and deletes more lines of code than it
adds.
The timer interrupt based global rebalancing might appear to be a
simple and good idea but it costs the scheduler a lot of
flexibility. In the patch the global rebalancing is done after a
certain number of failed attempts to locally balance. The number of
attempts is proportional to the number of CPUs in the current
node. With only 1 CPU in the current node the scheduler doesn't even
try to balance locally; it wouldn't make sense anyway. Of course one
could instead set IDLE_NODE_REBALANCE_TICK = IDLE_REBALANCE_TICK, but
this is uglier (IMHO) and only helps when every node has just 1 CPU.
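
The retry idea described above can be sketched in a few lines of
stand-alone C. This is only an illustration and not the code from the
attached patch: struct balance_state, RETRIES_PER_CPU, choose_scope()
and record_result() are hypothetical names invented here, and the
factor of 2 is an arbitrary assumption standing in for the tunable
retry rate.

```c
#include <stdbool.h>

/* Hypothetical per-CPU balancing state; NOT a kernel structure. */
struct balance_state {
    unsigned int cpus_in_node;  /* number of CPUs in this CPU's node */
    unsigned int failed_local;  /* consecutive failed local balance attempts */
};

/* Assumed tunable: failed local attempts tolerated per CPU in the node. */
#define RETRIES_PER_CPU 2

enum balance_scope { BALANCE_LOCAL, BALANCE_GLOBAL };

/* Decide whether the next balance attempt stays inside the node. */
enum balance_scope choose_scope(const struct balance_state *st)
{
    /* A node with a single CPU has nothing to balance locally. */
    if (st->cpus_in_node <= 1)
        return BALANCE_GLOBAL;

    /* Retry limit grows in proportion to the node size. */
    if (st->failed_local >= RETRIES_PER_CPU * st->cpus_in_node)
        return BALANCE_GLOBAL;

    return BALANCE_LOCAL;
}

/* Update the failure counter after a balance attempt. */
void record_result(struct balance_state *st, enum balance_scope scope,
                   bool pulled_task)
{
    if (pulled_task || scope == BALANCE_GLOBAL)
        st->failed_local = 0;   /* success or global attempt: start over */
    else
        st->failed_local++;     /* one more failed local attempt */
}
```

In this sketch a CPU in a 4-CPU node would try locally 8 times in a row
before looking at other nodes, while a CPU that is alone in its node
goes global on every attempt, independent of any timer tick.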
Please consider this for inclusion.
Thanks,
Erich
- text/x-diff attachment: 1cpufix-lb-2.6.0t1.patch