delay_tsc(): inefficient delay loop (2.6.16-mm1)



Hello all,

I discovered that the delay loop used there (which simply found a new place
in 2.6.16-mm1; it existed much longer) is inefficient,
since it does an unnecessary subtraction *in the main loop* which also hits
asm code:

00000024 <delay_tsc>:
24: 53 push %ebx
25: 89 c3 mov %eax,%ebx
27: 0f 31 rdtsc
29: 89 c1 mov %eax,%ecx
2b: f3 90 pause
2d: 0f 31 rdtsc
2f: 29 c8 sub %ecx,%eax
31: 39 d8 cmp %ebx,%eax
33: 72 f6 jb 2b <delay_tsc+0x7>
35: 5b pop %ebx
36: c3 ret

With such a patch:

--- linux-2.6.16-mm1/arch/i386/lib/delay.c.orig 2006-03-23 12:52:45.000000000 +0100
+++ linux-2.6.16-mm1/arch/i386/lib/delay.c 2006-03-24 12:53:01.000000000 +0100
@@ -44,10 +44,12 @@
unsigned long bclock, now;

rdtscl(bclock);
+ /* offset with bclock to have very simple comparison below */
+ loops += bclock;
do {
rep_nop();
rdtscl(now);
- } while ((now-bclock) < loops);
+ } while (now < loops);
}

/*

the result is:

00000024 <delay_tsc>:
24: 89 c1 mov %eax,%ecx
26: 0f 31 rdtsc
28: 01 c1 add %eax,%ecx
2a: f3 90 pause
2c: 0f 31 rdtsc
2e: 39 c8 cmp %ecx,%eax
30: 72 f8 jb 2a <delay_tsc+0x6>
32: c3 ret

Improvement: no unnecessary stuff after having hit the timer target value
(read: faster), better power saving, 4 bytes less opcodes.

The patch above could be considered weird since it fiddles with "loops"
directly. A possibly cleaner way would be to introduce a new variable
end = bclock + loops.

Comments? Anything that I'm missing here as to why it hasn't been done that
way before?

If this holds water then I'll submit a final patch soon.

Thanks!

Andreas Mohr
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • [x86_64 MCE] [RFC] mce.c race condition (or: when evil hacks are the only options)
    ... The race requires a large number of machine checks to be occurring in order ... In the normal case, the rest would get cleaned up by the subsequent loop, ... fact is waiting for all CPUs to be done, which could take up to a tick -- or ... I've come up with a patch that does this, ...
    (Linux-Kernel)
  • Re: [PATCH 1/1] (v3) SYSVIPC - Fix the ipc structures initialization
    ... a simple testcase is concurrently running an infinite loop on ... So idr_get_newis inserting a pointer into the ... This patch moves the spin_lock_initbefore the call to ipc_addid. ... return err; ...
    (Linux-Kernel)
  • Re: [PATCH 1/1] (v3) SYSVIPC - Fix the ipc structures initialization
    ... a simple testcase is concurrently running an infinite loop on ... So idr_get_newis inserting a pointer into the ... This patch moves the spin_lock_initbefore the call to ipc_addid. ... return err; ...
    (Linux-Kernel)
  • Re: Problem with inotify
    ... > Thanks for writing that patch, ... > inotify-test before unmounting results in a clean unmount. ... My analysis was that there is an infinite loop and this is what ... > loop when unmounting with inotify watches active. ...
    (Linux-Kernel)
  • Re: [PATCH 00/17] per device dirty throttling -v7
    ... Patches are against 2.6.22-rc4-mm2 ... writes in a setup involving a loop dev ... ext3 over loop over ext3 ... Without the patch, in all the cases I've seen deadlocks or long ...
    (Linux-Kernel)