Description of Problem:
GCC apparently disregards the programmer's use of the "inline" keyword when the
function is larger than a certain threshold, potentially causing very poor code.
Furthermore, it appears to do so based on the size of the function rather than
the size of the optimized code that would be generated by inlining it, which
makes the problem much worse.
In particular, some functions (e.g. Linux memcpy and friends) check whether
parameters are constant and, if so, do a long list of checks on the parameters,
relying on those checks being optimized away. GCC screws this up horribly and
turns a clever programmer optimization into a huge slowdown.
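To illustrate the pattern described above, here is a minimal sketch. The names sketch_memcpy and load_u32 are hypothetical and this is not the Linux implementation; it only assumes GCC's __builtin_constant_p builtin. When the call is inlined and the size is a compile-time constant, the switch should collapse to a handful of stores; when the function is not inlined, the size is just an ordinary argument, so the whole switch stays in the out-of-line copy and every call falls through to the generic path.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper illustrating the constant-parameter pattern. */
static inline void *sketch_memcpy(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;

    /* Nonzero only if the compiler can prove n is constant here,
     * which in practice requires this function to be inlined. */
    if (__builtin_constant_p(n)) {
        switch (n) {
        case 0:
            return dst;
        case 1:
            d[0] = s[0];
            return dst;
        case 2:
            d[0] = s[0]; d[1] = s[1];
            return dst;
        case 4:
            d[0] = s[0]; d[1] = s[1]; d[2] = s[2]; d[3] = s[3];
            return dst;
        default:
            break;
        }
    }
    /* Generic path for non-constant or unhandled sizes. */
    return memcpy(dst, src, n);
}

/* Caller with a constant size of 4: after inlining, only the
 * "case 4" stores are expected to remain. */
static uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    sketch_memcpy(&v, p, sizeof v);
    return v;
}
```

The source of the function is long, but the code left after inlining with a constant size is tiny, which is why a limit based on source size misjudges it.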
There are also functions that are very long and cannot be split into separate
"physical" functions for performance reasons, but are instead split into
(possibly very long) inline functions that are each used once. GCC also screws
this up horribly.
IMHO this "feature" should be removed, and GCC should always honor the
programmer's choice, which he must make correctly anyway since other compilers
might not correct it.
If that is not possible, the limit should at least be increased.
The problem can be worked around using this code:
#if (__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 2))
#if !defined(__cplusplus) && defined(__OPTIMIZE__)
#define inline inline __attribute__((always_inline))
#endif
#endif
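A narrower variant of the same idea, sketched here as an assumption rather than something taken from the report, is to leave the "inline" keyword alone and opt in per function through a macro. FORCE_INLINE, clamp_to_byte and scale_pixel are hypothetical names; the version check mirrors the one used in the workaround.

```c
/* FORCE_INLINE is a hypothetical macro name used for illustration. */
#if defined(__GNUC__) && \
    ((__GNUC__ > 3) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 2)))
#define FORCE_INLINE static inline __attribute__((always_inline))
#else
#define FORCE_INLINE static inline
#endif

/* Marked always_inline on GCC, so the inline size limit should not apply. */
FORCE_INLINE int clamp_to_byte(int v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return v;
}

/* Ordinary out-of-line function that uses the helper. */
int scale_pixel(int v)
{
    return clamp_to_byte(v * 2);
}
```

This keeps the forced inlining limited to functions that are known to need it, at the cost of having to annotate each one.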
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Compile the attached code with "-O2 -S" for x86; then add the
"-finline-limit=1000000" option and compare the two assembly listings.
2. Compile Linux for x86 with no special inline limit settings and do an `nm
vmlinux|grep __constant`. These functions should have been inlined.
Created attachment 89396
The inline heuristics have been rewritten for GCC 3.4, and things are
significantly better there. Debugging output (-dU) shows:
Considering generic_fls with 38 insns
Estimated growth is -10 insns.
Inlined into test which now has 51 insns.
Inlined 1 times for a net change of -10 insns.