Re: [PATCH] Remove some divide instructions

From: Zachary Amsden (zach_at_vmware.com)
Date: 10/28/04

  • Next message: Hua Zhong: "That's it - RE: My thoughts on the "new development model""
    Date:	Wed, 27 Oct 2004 15:14:57 -0700
    To: Linus Torvalds <torvalds@osdl.org>, linux-kernel@vger.kernel.org
    
    
    

    Apologies in advance for e-mail difficulties today - I can't reply
    directly to your message, so threading may be off.

    >I'd suggest _only_ changing the i386 version.
    >
    >For example, your asm-generic changes looks senseless, since they only
    >make the macros more complex, without actually changing anything. And the
    >other architectures may want to do other things, since right now at least
    >some of them use things like fixed hardware registers etc which is not
    >necessarily appropriate for the non-asm case...
    >
    >That way you'd also only modify the architecture that you can actually
    >verify..
    >
    > Linus
    >

    Backed out everything but i386 and generic. For the generic version, I
    compiled and tested this code outside of the kernel. Actually, I found
    that at least for my tool chain, the generic version

    +# define do_div(n,base) ({ \
    + uint32_t __rem; \
    + if (__builtin_constant_p(base) && !((base) & ((base)-1))) { \
    + __rem = ((uint64_t)(n)) % (base); \
    + (n) = ((uint64_t)(n)) / (base); \
    + } else { \
    + uint32_t __base = (base); \
    + __rem = ((uint64_t)(n)) % __base; \
    + (n) = ((uint64_t)(n)) / __base; \
    + } \
    + __rem;

    Does indeed generate different code for the constant case - without it,
    due to the assignment to __base, the shift / mask optimization does not
    take place. Apparently the constant attribute and associated
    optimizations do not propagate through the assignment. Other gccs may
    behave differently. I also tried making __base const, to no avail. If
    one were willing to ignore the potential macro side effects of the
    references to (base), the code could be the same, but I'm not the best
    judge of whether that is a good thing to do.

    Zach
    zach@vmware.com

    
    

    diff -ru linux-2.6.10-rc1/include/asm-generic/div64.h linux-2.6.10-rc1-nsz/include/asm-generic/div64.h
    --- linux-2.6.10-rc1/include/asm-generic/div64.h 2004-10-27 10:41:40.000000000 -0700
    +++ linux-2.6.10-rc1-nsz/include/asm-generic/div64.h 2004-10-27 10:24:06.000000000 -0700
    @@ -22,12 +22,17 @@
     
     #if BITS_PER_LONG == 64
     
    -# define do_div(n,base) ({ \
    - uint32_t __base = (base); \
    - uint32_t __rem; \
    - __rem = ((uint64_t)(n)) % __base; \
    - (n) = ((uint64_t)(n)) / __base; \
    - __rem; \
    +# define do_div(n,base) ({ \
    + uint32_t __rem; \
    + if (__builtin_constant_p(base) && !((base) & ((base)-1))) { \
    + __rem = ((uint64_t)(n)) % (base); \
    + (n) = ((uint64_t)(n)) / (base); \
    + } else { \
    + uint32_t __base = (base); \
    + __rem = ((uint64_t)(n)) % __base; \
    + (n) = ((uint64_t)(n)) / __base; \
    + } \
    + __rem; \
      })
     
     #elif BITS_PER_LONG == 32
    @@ -37,16 +42,21 @@
     /* The unnecessary pointer compare is there
      * to check for type safety (n must be 64bit)
      */
    -# define do_div(n,base) ({ \
    - uint32_t __base = (base); \
    - uint32_t __rem; \
    - (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
    - if (likely(((n) >> 32) == 0)) { \
    - __rem = (uint32_t)(n) % __base; \
    - (n) = (uint32_t)(n) / __base; \
    - } else \
    - __rem = __div64_32(&(n), __base); \
    - __rem; \
    +# define do_div(n,base) ({ \
    + uint32_t __rem; \
    + (void)(((typeof((n)) *)0) == ((uint64_t *)0)); \
    + if (__builtin_constant_p(base) && !((base) & ((base)-1))) { \
    + __rem = ((uint64_t)(n)) % (base); \
    + (n) = ((uint64_t)(n)) / (base); \
    + } else { \
    + uint32_t __base = (base); \
    + if (likely(((n) >> 32) == 0)) { \
    + __rem = (uint32_t)(n) % __base; \
    + (n) = (uint32_t)(n) / __base; \
    + } else \
    + __rem = __div64_32(&(n), __base); \
    + } \
    + __rem; \
      })
     
     #else /* BITS_PER_LONG == ?? */
    diff -ru linux-2.6.10-rc1/include/asm-i386/div64.h linux-2.6.10-rc1-nsz/include/asm-i386/div64.h
    --- linux-2.6.10-rc1/include/asm-i386/div64.h 2004-10-27 10:41:40.000000000 -0700
    +++ linux-2.6.10-rc1-nsz/include/asm-i386/div64.h 2004-10-27 10:15:04.000000000 -0700
    @@ -14,16 +14,23 @@
      * convention" on x86.
      */
     #define do_div(n,base) ({ \
    - unsigned long __upper, __low, __high, __mod, __base; \
    - __base = (base); \
    - asm("":"=a" (__low), "=d" (__high):"A" (n)); \
    - __upper = __high; \
    - if (__high) { \
    - __upper = __high % (__base); \
    - __high = __high / (__base); \
    + unsigned long __mod; \
    + if (__builtin_constant_p(base) && !((base) & ((base)-1))) { \
    + __mod = ((uint64_t)(n)) % base; \
    + (n) = ((uint64_t)(n)) / base; \
    + } else { \
    + unsigned long __upper, __low, __high, __base; \
    + __base = (base); \
    + asm("":"=a" (__low), "=d" (__high):"A" (n)); \
    + __upper = __high; \
    + if (__high) { \
    + __upper = __high % (__base); \
    + __high = __high / (__base); \
    + } \
    + asm("divl %2": "=a" (__low), "=d" (__mod) : \
    + "rm" (__base), "0" (__low), "1" (__upper)); \
    + asm("":"=A" (n):"a" (__low),"d" (__high)); \
             } \
    - asm("divl %2":"=a" (__low), "=d" (__mod):"rm" (__base), "0" (__low), "1" (__upper)); \
    - asm("":"=A" (n):"a" (__low),"d" (__high)); \
             __mod; \
     })
     

    -
    To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
    the body of a message to majordomo@vger.kernel.org
    More majordomo info at http://vger.kernel.org/majordomo-info.html
    Please read the FAQ at http://www.tux.org/lkml/


  • Next message: Hua Zhong: "That's it - RE: My thoughts on the "new development model""

    Relevant Pages

    • Re: [bugfix] try_to_unmap_cluster() passes out-of-bounds pte to pte_unmap()
      ... It may be more of an issue with architectures that actually *do* do ... I don't see anyone getting motivated enough to rewrite the ... send the line "unsubscribe linux-kernel" in ... More majordomo info at http://vger.kernel.org/majordomo-info.html ...
      (Linux-Kernel)
    • Re: [patch] Real-Time Preemption, -VP-2.6.9-rc4-mm1-U2
      ... mm/shmem.c:1362: warning: assignment makes pointer from integer without a cast ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: [Discuss][i386] Platform SMIs and their interferance with tsc based delay calibration
      ... We increase the delay calibration time and also identify any ... From a maintainability POV it's not good that x86 is no longer using the ... architectures must implement arch_calibrate_delay, then provide stubs for ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: [2.6 patch] kill include/linux/platform.h
      ... present on all architectures or whether it's an architecture-specific ... required by cleaning up warnings with the -Wmissing-prototypes compiler ... There had been need of rain for many days. ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)
    • Re: optimize __gp location
      ... > Unfortunately marking jiffies and similar small but high usage ... > It might be worth doing, but the change would have to be structured so ... I think there are other architectures which would prefer ... send the line "unsubscribe linux-kernel" in ...
      (Linux-Kernel)