Re: [patch] measurements, numbers about CONFIG_OPTIMIZE_INLINING=y impact





On Fri, 9 Jan 2009, Nicholas Miell wrote:

> So take your complaint about gcc's decision to inline functions called
> once.

Actually, the "called once" really is a red herring. The big complaint is
"too aggressively when not asked for". It just so happens that the called
once logic is right now the main culprit.

> Ignore for the moment the separate issue of stack growth and let's
> talk about what it does to debugging, which was the bulk of your
> complaint that I originally responded to.

Actually, stack growth is the one that ends up being a correctness issue.
But:

> In the general case it does nothing at all to debugging (beyond the
> usual weird control flow you get from any optimized code) -- the
> compiler generates line number information for the inlined functions,
> the debugger interprets that information, and your backtrace is
> accurate.

The thing is, we do not use line number information, and never will -
because it's too big. MUCH too big.

We do end up saving function start information (although even that is
actually disabled if you're doing embedded development), so that we can at
least tell which function something happened in.

> It is only in the specific case of the kernel's broken backtrace code
> that this becomes an issue. Its failure to function correctly is the
> direct result of a failure to keep up with modern compiler changes that
> everybody else in the toolchain has dealt with.

Umm. You can say that. But the fact is, most others care a whole lot
_less_ about those "modern compiler changes". In user space, when you
debug something, you generally just stop optimizing. In the kernel, we've
tried to balance the "optimize vs debug info" thing.

> I think that the answer to that is that the kernel should do its best to
> be as much like userspace apps as it can, because insisting on special
> treatment doesn't seem to be working.

The problem with that is that the kernel _isn't_ a normal app. And it
_definitely_ isn't a normal app when it comes to debugging.

You can hand-wave and talk about it all you want, but it's just not going
to happen. A kernel is special. We don't get dumps, and only crazy people
even ask for them.

The fact that you seem to think that we should get them just shows that
you either don't understand the problems, or you live in some sheltered
environment where crash-dumps _could_ work, but by definition those are
also not the environments where they buy kernel developers anything.

The thing is, a crash dump in an "enterprise environment" (and that is the
only kind where you can reasonably dump more than the minimal stuff we do
now) is totally useless - because such kernels are usually at least a year
old, often more. As such, debug information from enterprise users is
almost totally worthless - if we relied on it, we'd never get anything
done.

And outside of those kinds of very rare niches, big kernel dumps simply
are not an option. Writing to disk when things go haywire in the kernel
is the _last_ thing you must ever do. People can't have dedicated dump
partitions or network dumps.

That's the reality. I'm not making it up. We can give a simple trace, and
yes, we can try to do some off-line improvement on it (and kerneloops.org
to some degree does), but that's just about it.

But debugging isn't even the only issue. It's just that debuggability is
more important than a DUBIOUS improvement in code quality. See? Note the
DUBIOUS.

Let's take a very practical example, using a number that has been floated
around here: letting gcc make the inlining decisions can apparently save up
to about 4% of code-size. Fair enough - I happen to believe that we could
cut that down a bit by just doing things manually with a checker, but
that's neither here nor there.

What's the cost/benefit of that 4%? Does it actually improve performance?
Especially if you then want to keep DWARF unwind information in memory in
order to fix up some of the problems it causes? At that point, you've lost
all the memory you won, and then some.

Does it help I$ utilization (which can speed things up a lot more, and is
probably the main reason -Os actually tends to perform better)? Likely
not. Sure, shrinking code is good for I$, but on the other hand inlining
can actually be bad for I$ density because if you inline a function that
doesn't get called, you have now fragmented your footprint a lot more.
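
A minimal sketch of that fragmentation concern, assuming GCC-style
attributes. The function names and the error counter are hypothetical and
not taken from any kernel code. The rarely executed path is kept out of
line, so its body does not sit in the middle of the hot loop:

```c
#include <stddef.h>

static int error_count;         /* hypothetical: counts rejected inputs */
static unsigned char last_bad;  /* hypothetical: last rejected byte */

/* Rare path. noinline and cold ask the compiler to keep this body out of
 * the caller, so the hot loop below stays contiguous in the I$. */
static __attribute__((noinline, cold)) int handle_bad_byte(unsigned char c)
{
	error_count++;
	last_bad = c;
	return -1;
}

/* Hot path: sums 7-bit values and bails out on the first byte above 127. */
int sum_ascii(const unsigned char *p, size_t n)
{
	int sum = 0;

	for (size_t i = 0; i < n; i++) {
		if (p[i] > 127)
			return handle_bad_byte(p[i]);
		sum += p[i];
	}
	return sum;
}
```

Whether the out-of-line version is actually faster depends on the
architecture and on how rare the error path really is; the sketch only
shows where the unused bytes would otherwise end up.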

So aggressively inlining has to be shown to be a real _win_.

You try to say "well, do better debug info", but that turns inlining into
a _loss_, so then the proper response is "don't inline".

So when is inlining a win?

It's a win when the thing you inline is clearly not bigger than the call
site. Then it's totally unambiguous.
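
As an illustration of that unambiguous case, here is a minimal sketch with
hypothetical names. The helper's body is a single load, which should be no
larger than the argument setup and call instruction it replaces:

```c
struct counter {
	int value;
};

/* One load: inlining this cannot plausibly grow the caller. */
static inline int counter_read(const struct counter *c)
{
	return c->value;
}

/* Two call sites, each replaced by a load when the helper is inlined. */
int counters_sum(const struct counter *a, const struct counter *b)
{
	return counter_read(a) + counter_read(b);
}
```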

It's also often a win if it's an unconditional call from a single site, and
you only inline one such, so that you avoid all of the downsides (you may
be able to _shrink_ stack usage, and you're hopefully making I$ accesses
_denser_ rather than fragmenting it).

And if you can seriously simplify the code by taking advantage of constant
arguments, it can be an absolutely _huge_ win. Except as we've seen in
this discussion, gcc apparently doesn't currently even consider this case
before it makes the inlining decision.
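
A minimal sketch of the constant-argument case, with hypothetical names.
The always_inline attribute is a GCC extension, used here only to force the
decision that the compiler may not make on its own. Once the helper is
inlined with a literal argument, the switch can fold down to a constant:

```c
#include <stddef.h>

enum copy_kind { COPY_BYTE, COPY_WORD };

/* Hypothetical helper: 'kind' selects the element size. As a separate
 * out-of-line function, the switch is evaluated at run time on each call. */
static inline __attribute__((always_inline))
size_t elem_size(enum copy_kind kind)
{
	switch (kind) {
	case COPY_BYTE:
		return 1;
	case COPY_WORD:
		return sizeof(long);
	}
	return 0;
}

/* The argument is a compile-time constant, so after inlining an optimizing
 * compiler can reduce this to a single multiply by sizeof(long). */
size_t bytes_for_words(size_t n)
{
	return n * elem_size(COPY_WORD);
}
```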

But if we're just looking at code-size, then no, it's _not_ a win. Code
size can be a win (4% denser I$ is good), but in a lot of the cases I've seen
(which are often the _bad_ cases, since I end up looking at them because we
are chasing bugs due to things like stack usage), it's actually just
fragmenting the function and making everybody lose.
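
The stack-usage side of this can be sketched as follows, assuming GCC-style
attributes. The helper names and the 512-byte buffer size are hypothetical.
With noinline, each buffer lives only in its helper's frame and is released
on return. If both helpers were inlined, the caller's frame might have to
hold both buffers at once, depending on how well the compiler shares stack
slots:

```c
#include <stdio.h>
#include <string.h>

#define BUF_SIZE 512	/* hypothetical: large relative to a kernel stack */

/* The buffer belongs to this frame only and is gone once we return. */
static __attribute__((noinline)) size_t format_a(int v)
{
	char buf[BUF_SIZE];

	snprintf(buf, sizeof(buf), "a=%d", v);
	return strlen(buf);
}

static __attribute__((noinline)) size_t format_b(int v)
{
	char buf[BUF_SIZE];

	snprintf(buf, sizeof(buf), "b=%d", v);
	return strlen(buf);
}

/* The caller's own frame stays small: the two buffers are never live at
 * the same time because each call returns before the next one starts. */
size_t format_both(int v)
{
	return format_a(v) + format_b(v);
}
```

Building with gcc's -fstack-usage option is one way to compare the
per-function frame sizes with and without the attribute.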

Oh, and yes, it does depend on architectures. Some architectures suck at
function calls. That's why being able to trust the compiler _would_ be a
good thing, no question about that. But yes, we do need to be able to
trust it to make sense.

Linus