Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Max Krasnyansky <maxk@xxxxxxxxxxxx>
- Date: Mon, 04 Aug 2008 15:11:13 -0700
Paul Jackson wrote:
So far as I can tell, the logic and design is ok.Actually we do need it. See below.
I found some of the comments, function names and code factoring
to be confusing or incomplete. And I suspect that the rebuilding
of sched domains in the case that sched_power_savings_store()
is called in kernel/sched.c, on systems using cpusets, is not
needed or desirable (I could easily be wrong here - beware!).
I'm attaching a patch that has the changes that I would likeCheck.
to suggest for your consideration. I have only recompiled this
patch, for one configuration - x86_64 with CPUSETS. I am hoping,
Max, that you can complete the testing.
The patch below applies to 2.6.27-rc1, plus the first patch,
"origin.patch" in Andrew's 2.6.27-rc1-mm1 quilt patch stack,
plus your (Max's) latest:
[PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
Here's a description of most of what I noticed:
1) Unrelated spacing tweak, changing:
LIST_HEAD(q); /* queue of cpusets to be scanned*/
to:
LIST_HEAD(q); /* queue of cpusets to be scanned */
2) The phrasing:Check.
Must be called with cgroup_lock held.
seems imprecise to me; "cgroup_lock" is the name of a wrapper
function, not of a lock. The underlying lock is cgroup_mutex,
which is still mentioned by name in various kernel/cpuset.c
comments, even though cgroup_mutex is static in kernel/cgroup.c
So I fiddled with the wording of this phrasing.
3) You removed the paragraph:I think it it's covered by the top level comment that starts with
* ... May take callback_mutex during
* call due to the kfifo_alloc() and kmalloc() calls. May nest
* a call to the get_online_cpus()/put_online_cpus() pair.
* Must not be called holding callback_mutex, because we must not
* call get_online_cpus() while holding callback_mutex. Elsewhere
* the kernel nests callback_mutex inside get_online_cpus() calls.
* So the reverse nesting would risk an ABBA deadlock.
But you didn't replace it with an updated locking description.
I expanded and tweaked various locking comments.
/*
* There are two global mutexes guarding cpuset structures. The first
* is the main control groups cgroup_mutex, accessed via
* cgroup_lock()/cgroup_unlock().
....
But I'm ok with your suggested version.
4) The alignment of the right side of consecutive assignment statements,Check.
as in:
ndoms = 0;
doms = NULL;
dattr = NULL;
csa = NULL;
or
*domains = doms;
*attributes = dattr;
is not usually done in kernel code. I suggest obeying convention,
and not aligning these here either.
5) The rebuilding of sched domains in the sched_power_savings_store()Actually it is appropriate, and there is one more user of the
routine in kernel/sched.c struck me as inappropriate for systems
that were managing sched domains using cpusets. So I made that
sched.c rebuild only apply if CONFIG_CPUSETS was not configured,
which in turn enabled me to remove rebuild_sched_domains() from
being externally visible outside kernel/cpuset.c
I don't know if this change is correct, however.
arch_reinit_sched_domains() which is S390 topology updates.
Those things (mc_power and topology updates) have to update domain flags based
on the mc/smt power and current topology settings.
This is done in the
__rebuild_sched_domains()
...
SD_INIT(sd, ALLNODES);
...
SD_INIT(sd, MC);
...
SD_INIT(sd,X) uses one of SD initializers defined in the include/linux/topology.h
For example SD_CPU_INIT() includes BALANCE_FOR_PKG_POWER which expands to
#define BALANCE_FOR_PKG_POWER \
((sched_mc_power_savings || sched_smt_power_savings) ? \
SD_POWERSAVINGS_BALANCE : 0)
Yes it's kind convoluted :). Anyway, the point is that we need to rebuild the
domains when those settings change. We could probably write a simpler version
that just iterates existing domains and updates the flags. Maybe some other dat :)
6) The renaming of rebuild_sched_domains() to generate_sched_domains()Actually it was not much of a rename. The only rename was
was only partial, and along with the added similarly named routines
seemed confusing to me. Also, that rename of rebuild_sched_domains()
was only partially accomplished, not being done in some comments
and in one printk kernel warning.
I reverted that rename.
rebuild_sched_domains() -> async_rebuild_sched_domains() and yes looks like I
missed one or two comment.
The other stuff was basically factoring out the function that generates the
domain masks/attrs from the function that actually tells the scheduler to
rebuild them.
I'd be ok with some other name for generate_sched_domains() if you think it's
confusing.
btw With your current proposal we have
rebuild_sched_domains() - just generates domain masks/attrs
async_rebuild_sched_domains() - generates domain masks/attrs and actually
rebuilds the domains
That I think is more confusing. So I guess my preference would be to have the
generation part separate. And like I explained above we do need to be able to
call rebuild_sched_domains() from the scheduler code (at least at this point).
I also reduced by one the number of functions needed to handleYes if we did not have to call rebuild_sched_domains() from
the asynchronous invocation of this rebuild, essentially collapsing
your routines rebuild_sched_domains() and rebuild_domains_callback()
into a single routine, named rebuild_sched_domains_thread().
Thanks to the above change (5), the naming and structure of these
routines was no longer visible outside kernel/cpuset.c, making
this collapsing of two functions into one easier.
arch_reinit_sched_domains() I would've done something similar.
Sounds like you're ok with the general approach and as I mentioned we do need
externally usable rebuild_sched_domains(). So how about this. I'll update
style/comments based on your suggestions but keep the generate_sched_domains()
and partition_sched_domains() split. Or if you have a better name for
generate_sched_domains() we'll use that instead.
Max
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
- Follow-Ups:
- References:
- [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Max Krasnyansky
- Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Paul Jackson
- Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Max Krasnyansky
- Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Paul Jackson
- Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Max Krasnyansky
- Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- From: Paul Jackson
- [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- Prev by Date: RE: [BISECTION RESULT] sched: revert cpu_clock to pre-27ec4407790d075c325e1f4da0a19c56953cce23 state
- Next by Date: Re: [PATCH 2/6] Container Freezer: Make refrigerator always available
- Previous by thread: Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- Next by thread: Re: [PATCH] cpuset: Rework sched domains and CPU hotplug handling (2.6.27-rc1)
- Index(es):
Relevant Pages
|