Re: gcc inlining heuristics was Re: [PATCH -v7][RFC]: mutex: implement adaptive spinning



On Wed, Jan 21, 2009 at 10:20:49AM +0100, Andi Kleen wrote:
On Wed, Jan 21, 2009 at 09:52:08AM +0100, Nick Piggin wrote:
On Wed, Jan 21, 2009 at 09:54:02AM +0100, Andi Kleen wrote:
GCC 4.3.2. Maybe i missed something obvious?

The typical use case of restrict is to tell it that multiple given
arrays are independent and then give the loop optimizer
more freedom to handle expressions in the loop that
accesses these arrays.

Since there are no loops in the list functions nothing changed.

Ok presumably there are some other optimizations which
rely on that alias information too, but again the list_*
stuff is probably too simple to trigger any of them.

Any function that does several interleaved loads and stores
through different pointers could have much more freedom to
move loads early and stores late.

For once that would require more live registers. It's not
a clear and obvious win. Especially not if you have
only very little registers, like on 32bit x86.

Then it would typically increase code size.

The point is that the compiler is then free to do it. If things
slow down after the compiler gets *more* information, then that
is a problem with the compiler heuristics rather than the
information we give it.


Then x86s tend to have very very fast L1 caches and
if something is not in L1 on reads then the cost of fetching
something for a read dwarfs the few cycles you can typically
get out of this.

Well most architectures have L1 caches of several cycles. And
L2 miss typically means going to L2 which in some cases the
compiler is expected to attempt to cover as much as possible
(eg in-order architectures).

If the caches are missed completely, then especially with an
in-order architecture, you want to issue as many parallel loads
as possible during the stall. If the compiler can't resolve
aliases, then it simply won't be able to bring some of those
loads forward.


And lastly even on a in order system stores can
be typically queued without stalling, so it doesn't
hurt to do them early.

Store queues are, what? On the order of tens of entries for
big power hungry x86? I'd guess much smaller for low power
in-order x86 and ARM etc. These can definitely fill up and
stall, so you still want to get loads out early if possible.

Even a lot of OOOE CPUs I think won't have the best alias
anaysis, so all else being equal, it wouldn't hurt them to
move loads earlier.


Also at least x86 gcc normally doesn't do scheduling
beyond basic blocks, so any if () shuts it up.

I don't think any of this is a reason not to use restrict, though.
But... there are so many places we could add it to the kernel, and
probably so few where it makes much difference. Maybe it should be
able to help some critical core code, though.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/



Relevant Pages

  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... >> both less efficient and less safe than the Fortran and Basic standard. ... >> The C for loop is actually trying to do what a do loop does. ... sloppy thinking that results from confusing a programming language ... > I do not believe that you are capable of writing a conforming C compiler. ...
    (comp.programming)
  • Re: Letter to US Sen. Byron Dorgan re unpaid overtime
    ... it's a for loop in the C sense. ... > sloppy thinking that results from confusing a programming language ... >> I do not believe that you are capable of writing a conforming C compiler. ... Does Microsoft's C compiler perform this optimisation? ...
    (comp.programming)
  • Re: Histogram of character frequencies
    ... generated object code may simply be a loop in which elements are ... believe any C compiler anywhere would reject it. ... On the first iteration of the loop you test the end of file indicator ...
    (comp.lang.c)
  • Re: Fridays the thirteenth. (And a little puzzle.)
    ... -- compiler) is the usual method ... int febdays ... -- We're going to go round a loop dealing with each year in turn. ... -- other languages call) ...
    (uk.people.silversurfers)
  • Re: Interesting article by Randall Hyde
    ... >>> a test before executing the loop. ... is a compiler writer forced to ... And another thing you might consider is that compiler optimization was ... Then the programmer will have to protect the execution of that loop. ...
    (comp.lang.asm.x86)