Re: [ltt-dev] [RFC git tree] Userspace RCU (urcu) for Linux (repost)



On Sat, Feb 14, 2009 at 12:46:02AM -0500, Mike Frysinger wrote:
On Sat, Feb 14, 2009 at 00:20, Paul E. McKenney wrote:
On Sat, Feb 14, 2009 at 12:07:46AM -0500, Mike Frysinger wrote:
On Fri, Feb 13, 2009 at 14:36, Paul E. McKenney wrote:
On Fri, Feb 13, 2009 at 01:54:11PM -0500, Mathieu Desnoyers wrote:
* Linus Torvalds (torvalds@xxxxxxxxxxxxxxxxxxxx) wrote:
Btw, for user space, if you want to do this all right for something like
BF, I think the only _correct_ thing to do (in the sense that the end
result will actually be debuggable) is to essentially give full SMP
coherency in user space.

It's doable, but rather complicated, and I'm not 100% sure it really ends
up making sense. The way to do it is to just simply say:

- never map the same page writably on two different cores, and always
flush the cache (on the receiving side) when you switch a page from one
core to another.

Now, the kernel can't really do that reasonably, but user space possibly could.

Now, I realize that blackfin doesn't actually even have a MMU or a TLB, so
by "mapping the same page" in that case we end up really meaning "having a
shared mapping or thread". I think that _should_ be doable. The most
trivial approach might be to simply limit all processes with shared
mappings or CLONE_VM to core 0, and letting core 1 run everything else
(but you could do it differently: mapping something with MAP_SHARED would
force you to core 0, but threads would just force the thread group to
stay on _one_ core, rather than necessarily a fixed one).

Yeah, because of the lack of real memory protection, the kernel can't
_know_ that processes don't behave badly and access things that they
didn't explicitly map, but I'm hoping that that is rare.

And yes, if you really want to use threads as a way to do something
across cores, you'd be screwed - the kernel would only schedule the
threads on one CPU. But considering the undefined nature of threading on
such a cpu, wouldn't that still be preferable? Wouldn't it be nice to have
the knowledge that user space _looks_ cache-coherent by virtue of the
kernel just limiting cores appropriately?

And then user space would simply not need to worry as much. Code written
for another architecture will "just work" on BF SMP too. With the normal
uclinux limitations, of course.

I don't know enough about BF to tell for sure, but the other way around
I see that would still permit running threads with shared memory space
on different CPUs is to call a cache flush each time a userspace lock is
taken/released (at the synchronization points where the "magic
test-and-set instruction" is used) _from_ userspace.

If some more elaborate userspace MT code uses something else than those
basic locks provided by core libraries to synchronize data exchange,
then it would be on its own and have to ensure cache flushing itself.
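
A minimal sketch of the flush-at-synchronization-points idea described above, assuming POSIX mutexes stand in for the "basic locks provided by core libraries". The two cache hooks are hypothetical placeholders (nothing in this thread names a real Blackfin userspace cache-maintenance call), so they are written as no-ops here; on non-coherent hardware they would have to be replaced by whatever the platform actually provides.

```c
#include <pthread.h>

/* HYPOTHETICAL hooks: placeholders for platform cache maintenance.
 * They are no-ops in this sketch; a real port would invalidate or
 * write back the data cache here. */
static inline void cache_invalidate_hook(void) { }
static inline void cache_writeback_hook(void) { }

/* Acquire the lock, then drop possibly stale cached copies of shared
 * data before the critical section reads them. */
int flushing_lock(pthread_mutex_t *m)
{
    int ret = pthread_mutex_lock(m);
    if (ret == 0)
        cache_invalidate_hook();
    return ret;
}

/* Push this core's writes out to memory before the lock is released,
 * so the next holder (possibly on another core) can see them. */
int flushing_unlock(pthread_mutex_t *m)
{
    cache_writeback_hook();
    return pthread_mutex_unlock(m);
}
```

Code that synchronizes through anything other than these wrappers gets no flushing at all, which is the limitation noted in the paragraph above.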

How about just doing a sched_setaffinity() in the BF case? Sounds
like an easy way to implement Linus's suggestion of restricting the
multithreaded processes to a single core. I have a hard time losing
sleep over the lack of parallelism in the case where the SMP support is
at best rudimentary...
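
For illustration, a rough userspace sketch of the sched_setaffinity() approach, assuming a Linux C library that exposes the cpu_set_t macros under _GNU_SOURCE. The function name pin_to_single_cpu() is made up for this example. It narrows the caller's affinity to the first CPU it is already allowed to run on; threads created afterwards inherit that mask.

```c
#define _GNU_SOURCE
#include <sched.h>

/* Restrict the calling thread to one CPU taken from its current
 * affinity mask. Returns 0 on success, -1 on failure. */
int pin_to_single_cpu(void)
{
    cpu_set_t allowed, one;

    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(allowed), &allowed) != 0)
        return -1;

    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++) {
        if (CPU_ISSET(cpu, &allowed)) {
            CPU_ZERO(&one);
            CPU_SET(cpu, &one);
            return sched_setaffinity(0, sizeof(one), &one);
        }
    }
    return -1; /* empty mask: should not happen */
}
```

Calling this early in main(), before any thread is created, would keep the whole thread group on one core. It does nothing to stop a later sched_setaffinity() call from widening the mask again.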

the quick way is to tell people to run their program through `taskset`
(which is what we're doing now).

Not sure what environment Mathieu is looking to run his program from,
but he would need to run it on multiple architectures.

right, that is exactly the kind of thing we strive to avoid on our
(the Blackfin) side of things

the next step up (or down depending on how you look at it) would be to
hook the clone function to do this automatically. i havent gotten
around to testing this yet which is why there isnt anything in there
yet though.

asmlinkage int bfin_clone(struct pt_regs....
unsigned long clone_flags;
unsigned long newsp;

+#ifdef CONFIG_SMP
+ if (current->rt.nr_cpus_allowed == NR_CPUS) {
+ current->cpus_allowed = cpumask_of_cpu(smp_processor_id());
+ current->rt.nr_cpus_allowed = 1;
+ }
+#endif
+
/* syscall2 puts clone_flags in r0 and usp in r1 */
clone_flags = regs->r0;
newsp = regs->r1;

Wouldn't you also have to make sched_setaffinity() cut back to only one
CPU if more are specified?

mmm, yes and no. if we wanted to keep the transparency thing going,
then adding a check to the affinity functions to make sure threaded
apps dont span cpus would be needed. but i would think we'd want to
have it return an error rather (EINVAL prob) than attempting to make
any automatic selections. the only real blocker here would be
figuring out how to detect the application in question is threaded
with 100% accuracy. the Blackfin port does not yet have TLS support
which means we're using linuxthreads rather than NPTL ...
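
A sketch of the check described above, written as a standalone userspace function rather than kernel code. The name check_affinity_request() and the is_threaded flag are hypothetical: the flag is passed in by the caller precisely because reliable detection of a threaded application is the open problem mentioned above.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Return 0 if the requested mask is acceptable, -EINVAL otherwise.
 * A threaded application may not span more than one CPU; the mask is
 * rejected rather than silently narrowed. */
int check_affinity_request(const cpu_set_t *requested, bool is_threaded)
{
    int ncpus = CPU_COUNT(requested);

    if (ncpus == 0)
        return -EINVAL;     /* an empty mask is never valid */
    if (is_threaded && ncpus > 1)
        return -EINVAL;     /* threaded apps must stay on one core */
    return 0;
}
```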

hooking clone gives us the biggest bang for the buck: majority of
stuff today are threaded applications that dont look at affinity.

If Blackfin handles hotplug CPU, that may
need attention as well, since tasks affinitied to the CPU being removed
can end up with their affinity set to all CPUs. And there are probably
other issues.

no, we dont support hotplugging of CPUs. there is no hardware support
for it, so i think the only thing you'd gain is perhaps power savings?
not sure it would even work in our case though as the hardware does
not support restarting or shutdown of one core ... they both have to
restart/shutdown. putting one core into a constant idle loop would
save power, but that can already be accomplished by reducing the apps
that go onto a specific core.

OK, that removes that issue, at least aside from any people who will
take the software approach to CPU hotplug (leaving the unplugged CPU
spinning with irqs disabled or some such).

Other potential issues include unrelated processes that share memory via
shmget() or mmap() -- presumably groups of such processes would need to
be bound to a single CPU?

Thanx, Paul


