Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning




* H. Peter Anvin <hpa@xxxxxxxxx> wrote:

Andi Kleen wrote:
I'll try to annotate the inline asms (there's not _that_ many of them),
and measure what the size impact is.

You can just use the patch I submitted and that you rejected for
most of them :)

I just ran a sample build for x86-64 with gcc 4.3.0, these all
allyesconfig builds (modulo the inlining option):

: voreg 64 ; size o.*/vmlinux
text data bss dec hex filename
59421552 24912223 15560504 99894279 5f44407 o.noopt/vmlinux
57700527 24950719 15560504 98211750 5da97a6 o.opty/vmlinux
57590217 24940519 15560504 98091240 5d8c0e8 o.andi/vmlinux

A 3% code size difference even on allyesconfig (1.8 MB!) is nothing to
sneeze at. As shown by the delta from Andi's patch, these small
assembly stubs really do need to be annotated, since gcc simply has no
way to do anything sane with them -- it just doesn't know.

I've done a finegrained size analysis today (see my other mail in this
thread), and it turns out that on gcc 4.3.x the main (and pretty much
only) inlining annotation that matters in arch/x86/include/asm/*.h is the
onliner patch attached below, annotating constant_test_bit().

That change is included in Andi's patch too AFAICS - i.e. just that single
hunk from Andi's patch would have given you 90% of the size win - an
additional 0.17% size win to the 3.00% that CONFIG_OPTIMIZE_INLINING=y
already brings.

The second patch below had some (much smaller, 0.01% ) impact too. All the
other annotations i did to hundreds of inlined asm()s had no measurable
effect on GCC 4.3.2. (i.e. gcc appears to inline single-statement asms
correctly)

[ On older GCC it might matter more, but there we can/should turn off
CONFIG_OPTIMIZE_INLINING. ]

Personally, I'd like to see __asm_inline as opposed to __always_inline
for these, though, as a documentation issue: __always_inline implies to
me that this function needs to be inlined for correctness, and this
could be highly relevant if someone, for example, recodes the routine in
C or decides to bloat it out (e.g. paravirt_ops).

Yeah. I've implemented __asm_inline today. It indeed documents the reason
for the annotation in a cleaner way than slapping __always_inline around
and diluting the quality of __always_inline annotations.

It's not a perfect solution even then, because gcc may choose to not
inline a higher level of inline functions for the same bogus reason.
There isn't much we can do about that, though, unless gcc either
integrates the assembler, or gives us some way of injecting the actual
weight of the asm statement...

Yeah.

Ingo

---
arch/x86/include/asm/bitops.h | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)

Index: linux/arch/x86/include/asm/bitops.h
===================================================================
--- linux.orig/arch/x86/include/asm/bitops.h
+++ linux/arch/x86/include/asm/bitops.h
@@ -300,7 +300,8 @@ static inline int test_and_change_bit(in
return oldbit;
}

-static inline int constant_test_bit(int nr, const volatile unsigned long *addr)
+static __asm_inline int
+constant_test_bit(int nr, const volatile unsigned long *addr)
{
return ((1UL << (nr % BITS_PER_LONG)) &
(((unsigned long *)addr)[nr / BITS_PER_LONG])) != 0;



arch/x86/include/asm/bitops.h | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)

Index: linux/arch/x86/include/asm/bitops.h
===================================================================
--- linux.orig/arch/x86/include/asm/bitops.h
+++ linux/arch/x86/include/asm/bitops.h
@@ -53,7 +53,7 @@
* Note that @nr may be almost arbitrarily large; this function is not
* restricted to acting on a single-word quantity.
*/
-static inline void set_bit(unsigned int nr, volatile unsigned long *addr)
+static __asm_inline void set_bit(unsigned int nr, volatile unsigned long *addr)
{
if (IS_IMMEDIATE(nr)) {
asm volatile(LOCK_PREFIX "orb %1,%0"
@@ -75,7 +75,7 @@ static inline void set_bit(unsigned int
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __set_bit(int nr, volatile unsigned long *addr)
+static __asm_inline void __set_bit(int nr, volatile unsigned long *addr)
{
asm volatile("bts %1,%0" : ADDR : "Ir" (nr) : "memory");
}
@@ -90,7 +90,7 @@ static inline void __set_bit(int nr, vol
* you should call smp_mb__before_clear_bit() and/or smp_mb__after_clear_bit()
* in order to ensure changes are visible on other processors.
*/
-static inline void clear_bit(int nr, volatile unsigned long *addr)
+static __asm_inline void clear_bit(int nr, volatile unsigned long *addr)
{
if (IS_IMMEDIATE(nr)) {
asm volatile(LOCK_PREFIX "andb %1,%0"
@@ -117,7 +117,7 @@ static inline void clear_bit_unlock(unsi
clear_bit(nr, addr);
}

-static inline void __clear_bit(int nr, volatile unsigned long *addr)
+static __asm_inline void __clear_bit(int nr, volatile unsigned long *addr)
{
asm volatile("btr %1,%0" : ADDR : "Ir" (nr));
}
@@ -152,7 +152,7 @@ static inline void __clear_bit_unlock(un
* If it's called on the same region of memory simultaneously, the effect
* may be that only one operation succeeds.
*/
-static inline void __change_bit(int nr, volatile unsigned long *addr)
+static __asm_inline void __change_bit(int nr, volatile unsigned long *addr)
{
asm volatile("btc %1,%0" : ADDR : "Ir" (nr));
}
@@ -166,7 +166,7 @@ static inline void __change_bit(int nr,
* Note that @nr may be almost arbitrarily large; this function is not
* restricted to acting on a single-word quantity.
*/
-static inline void change_bit(int nr, volatile unsigned long *addr)
+static __asm_inline void change_bit(int nr, volatile unsigned long *addr)
{
if (IS_IMMEDIATE(nr)) {
asm volatile(LOCK_PREFIX "xorb %1,%0"
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [patch] measurements, numbers about CONFIG_OPTIMIZE_INLINING=y impact
    ... gcc _does_ have a perfectly fine notion of how heavy-weight ... an "asm" statement is: just count it as a single instruction (and count ... I doubt that it's the inline asm that was the biggest reason ... size increase/decrease of the core kernel, ...
    (Linux-Kernel)
  • Re: Asm programming capability?
    ... use externally assembled asm on the rare occasions I need it. ... vaguely remember reading somewhere that gcc is doing an increasingly ... good job of accepting Intel syntax - I don't know if it's reached the ... probably use inline ASM... ...
    (comp.lang.c)
  • Re: [Bug #13185] New x86 warning
    ... This patch regresses on older GCC versions ... the 64bit kernel runs slower than the 32bit kernel because ... Also the general trend in the kernel is to inline less ...
    (Linux-Kernel)
  • Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning
    ... I just ran a sample build for x86-64 with gcc 4.3.0, ... As shown by the delta from Andi's patch, ... inline a higher level of inline functions for the same bogus reason. ... Intel Open Source Technology Center ...
    (Linux-Kernel)
  • Re: [PATCH] Fix compilation on gcc 3.4
    ... > inline in header files, but there is no body available. ... I made testing with equivalent of Andi's patch and with my last fixes to ... GCC code size estimate, there are 50 calls in the whole kernel where GCC ... concludes that inlining is not good idea. ...
    (Linux-Kernel)