Re: [RFC PATCH] x86 alternatives : fix LOCK_PREFIX race with preemptible kernel and CPU hotplug
- From: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
- Date: Thu, 14 Aug 2008 09:10:36 -0700 (PDT)
On Thu, 14 Aug 2008, Mathieu Desnoyers wrote:
> I can't argue about the benefit of using VM CPU pinning to manage
> resources because I don't use it myself, but I ran some tests out of
> curiosity to find if uncontended locks were that cheap, and it turns out
> they aren't.
Absolutely.
Locked ops show up not just in microbenchmarks looping over the
instruction, they show up in "real" benchmarks too. We added a single
locked instruction (maybe it was two) to the page fault handling code some
time ago, and the reason I noticed it was that it actually made the page
fault cost visibly more expensive in lmbench. That was a _single_
instruction in the hot path (or maybe two).
And the page fault path is some of the most timing critical in the whole
kernel - if you have everything cached, the cost of doing the page faults
to populate new processes for some fork/exec-heavy workload (and compiling
the kernel is just one of those - any traditional unix behaviour will show
this) is critical.
This is one of the things AMD does a _lot_ better than Intel. Intel tends
to have a 30-50 cycle cost (with later P4s being *much* worse), while AMD
tends to have a cost of around 10-15 cycles.
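A minimal sketch of the kind of microbenchmark referred to above, assuming GCC or Clang on x86, where `__atomic_fetch_add` compiles to a LOCK-prefixed instruction. The function names are illustrative and not from this thread. It reports nanoseconds per operation rather than cycles, and the numbers it produces depend heavily on the CPU.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Globals so the compiler cannot reason the counters away. */
static volatile uint64_t plain_counter;
static uint64_t locked_counter;

/* Plain increment: volatile forces a load/add/store per iteration,
 * but no LOCK prefix is emitted. */
uint64_t count_plain(uint64_t iters)
{
    plain_counter = 0;
    for (uint64_t i = 0; i < iters; i++)
        plain_counter++;
    return plain_counter;
}

/* Atomic increment: on x86 this is expected to become a
 * LOCK-prefixed add/xadd (assumption: GCC or Clang builtins). */
uint64_t count_locked(uint64_t iters)
{
    __atomic_store_n(&locked_counter, 0, __ATOMIC_SEQ_CST);
    for (uint64_t i = 0; i < iters; i++)
        __atomic_fetch_add(&locked_counter, 1, __ATOMIC_SEQ_CST);
    return __atomic_load_n(&locked_counter, __ATOMIC_SEQ_CST);
}

/* Wall-clock nanoseconds per iteration of fn(iters). */
double ns_per_op(uint64_t (*fn)(uint64_t), uint64_t iters)
{
    struct timespec start, end;

    assert(iters > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    fn(iters);
    clock_gettime(CLOCK_MONOTONIC, &end);

    double ns = (double)(end.tv_sec - start.tv_sec) * 1e9
              + (double)(end.tv_nsec - start.tv_nsec);
    return ns / (double)iters;
}
```

Calling `ns_per_op(count_plain, n)` and `ns_per_op(count_locked, n)` from a `main()` with a large `n` and comparing the two gives a rough per-op difference. A tight loop like this only measures the uncontended, cache-hot case, so it says nothing about the cost in a real path such as the page fault handler.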
It's one of the things Intel promises to have improved in the next-gen
uarch (Nehalem), and while I am not supposed to give out any benchmarks, I
can confirm that Intel is getting much better at it. But it's going to be
visible still, and it's really a _big_ issue on P4.
(Of course, on P4, the page fault exception cost itself is so high that
the cost of atomics may be _relatively_ less noticeable in that particular
path)
Linus