Re: -Os versus -O2



Then do we need a new option, 'optimize for best overall performance', that goes for size (and the corresponding wins there) most of the time, but is ignored where it makes a huge difference?

That's -Os mostly. Some awful CPUs really need higher
loop/label/function alignment though to get any
performance; you could add -falign-xxx options for those.
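
As an illustration only (not from the original message), here is a
minimal sketch of what such alignment requests can look like. The
function name and the alignment values are made up; the command line
in the comment uses the -falign-functions, -falign-loops and
-falign-jumps options, and the attribute is the per-function way of
asking for the same kind of thing.

```c
#include <stddef.h>

/*
 * Whole-file variant on the command line (values are examples only):
 *   gcc -Os -falign-functions=32 -falign-loops=16 -falign-jumps=16 -c file.c
 */

/* Hypothetical helper: request 32-byte alignment for this one function. */
__attribute__((aligned(32)))
unsigned checksum(const unsigned char *buf, size_t len)
{
    unsigned sum = 0;

    /* Plain byte-wise sum; the loop body is kept deliberately small. */
    for (size_t i = 0; i < len; i++)
        sum += buf[i];
    return sum;
}
```

Whether any of this helps depends on the CPU, so measure before
keeping it.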

In reality this was a flaw in gcc: on modern CPUs, with the larger gap between CPU speed and memory speed, it still preferred to unroll loops (eating more memory and blowing out the CPU cache) when it shouldn't have.

You told it to unroll loops, so it did. No flaw. If you
feel the optimisations enabled by -O2 should depend on the
CPU tuning selected, please file a PR.

Also note that whether or not it is profitable to unroll
a particular loop depends largely on how "hot" that loop
is, and GCC doesn't know much about that if you don't feed
it profiling information (it can guess a bit, sure, but it
can guess wrong too).
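
A rough sketch of how "hotness" can be fed to GCC, under the
assumption that you either annotate by hand or use profile feedback.
The function names are hypothetical, and the unroll factor of 4 is
an arbitrary example, not a recommendation.

```c
#include <stddef.h>

/*
 * Profile feedback instead of hand annotation (file name is an example):
 *   gcc -O2 -fprofile-generate prog.c -o prog
 *   ./prog                       (run a representative workload)
 *   gcc -O2 -fprofile-use prog.c -o prog
 */

/* Hint that this function is executed frequently. */
__attribute__((hot))
long sum_hot(const int *v, size_t n)
{
    long total = 0;

    /* Ask GCC (8 or later) to unroll this particular loop four times. */
#pragma GCC unroll 4
    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}

/* Hint that this function is rarely executed, so size matters more. */
__attribute__((cold))
long sum_cold(const int *v, size_t n)
{
    long total = 0;

    for (size_t i = 0; i < n; i++)
        total += v[i];
    return total;
}
```

Both functions compute the same thing; only the hints differ, and
the compiler is free to ignore them.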


Segher



