Compiling programs with g++ from RPM 'gcc4-c++-4.1.0-18.EL4' generates code that is 4% slower than the code the vanilla g++ 4.1.1 generates at optimization level -O3. I'm guessing that RedHat turned off some of the default optimizations for -O3. Since gcc4 is a preview compiler for RHEL4.4, this is unhelpful. I'm using the vanilla compiler for now, but it would be better if we could use the RedHat packaged gcc4. Platform is single-CPU Athlon 3500+, Venice stepping.
Correction: the code is actually 10% slower than with the vanilla GNU compiler. Quite bad.
The gcc4 compiler in RHEL4 U4 is, as its version suggests, slightly older than the g++ 4.1.1 release, but no optimizations have been turned off. When comparing two compilers, the first question must be whether you are using the same options (e.g. the RHEL4 U4 gcc4 is configured to tune by default for -mtune=generic; I'm not sure how you configured your compiler). Also, what kind of benchmarking did you use to arrive at the claim of 4% or 10% less efficient code? Do you have SPEC{95,2000} numbers and results from a number of other industry standard benchmarks, or are you just extrapolating from one testcase where an inner loop might suffer from a single instruction selection choice, which might very well depend on the default scheduling of your g++ 4.1.1 vs. the one included in RHEL4 U4? If there is anything we can do about this, you need to provide a benchmark and make sure you use the exact same tuning and arch options (i.e. always override the default, which will be different in any case).
Ah! -mtune= makes a huge difference. GCC defaults to -mtune=k8 when you build it on an Athlon. Apparently 'k8' and 'opteron' are very similar or identical, and since the target platform is an Opteron I set -mtune=opteron. The performance is now the same with either compiler. Thanks for jogging loose the cobwebs for me on this. This bug report can be closed.
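For anyone who runs into the same discrepancy, here is a minimal sketch of the kind of comparison asked for above: one small benchmark built by both compilers with the tuning option given explicitly, so neither compiler's configured default applies. The file name bench.cpp, the dot-product kernel, the helper names dot and best_seconds, the BENCH_MAIN guard and the output binary names are all made up for illustration and are not from the original test program. The g++4 command name for the RHEL4 gcc4-c++ package is also an assumption; substitute the paths of your two compilers.

```cpp
// bench.cpp: hypothetical micro-benchmark, not the program from this report.
// Build it with each compiler using identical, explicit options, e.g.
// (compiler command names below are assumptions, adjust to your install):
//   g++  -O3 -mtune=opteron -DBENCH_MAIN bench.cpp -o bench-vanilla
//   g++4 -O3 -mtune=opteron -DBENCH_MAIN bench.cpp -o bench-rhel
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <ctime>
#include <vector>

// Hypothetical kernel: a plain dot product with a simple inner loop.
double dot(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += a[i] * b[i];
    }
    return sum;
}

// Runs the kernel `reps` times and returns the smallest CPU time in seconds.
// The kernel result is stored in *result so the caller can print or check it.
double best_seconds(const std::vector<double>& a,
                    const std::vector<double>& b,
                    int reps, double* result) {
    double best = -1.0;
    for (int r = 0; r < reps; ++r) {
        std::clock_t start = std::clock();
        *result = dot(a, b);
        double elapsed = double(std::clock() - start) / CLOCKS_PER_SEC;
        if (best < 0.0 || elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

#ifdef BENCH_MAIN
int main() {
    const std::size_t n = 1 << 22;  // arbitrary size, large enough to time
    std::vector<double> a(n, 1.5), b(n, 2.0);
    double result = 0.0;
    double secs = best_seconds(a, b, 5, &result);
    std::printf("result=%.1f best=%.6f s\n", result, secs);
    return 0;
}
#endif
```

Run both binaries on the same idle machine and compare the best times. A single kernel like this only shows how one inner loop is compiled, so it says nothing about overall compiler quality; it is just a way to check whether a gap survives once both builds use the same -mtune value.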