Bug 205058 - g++ 4.1.0 -O3 optimization produces slower code than vanilla g++ 4.1.1
Summary: g++ 4.1.0 -O3 optimization produces slower code than vanilla g++ 4.1.1
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: gcc4
Version: 4.4
Hardware: x86_64 Linux
Target Milestone: ---
Assignee: Jakub Jelinek
QA Contact:
Depends On:
Reported: 2006-09-03 01:18 UTC by starlight
Modified: 2008-08-02 23:40 UTC (History)
0 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2006-09-03 17:58:14 UTC


Description starlight 2006-09-03 01:18:32 UTC
Compiling programs with g++ from RPM 'gcc4-c++-4.1.0-18.EL4' 
generates code that is 4% slower at optimization level -O3 than 
the code generated by the vanilla 4.1.1 g++.

I'm guessing that Red Hat turned off some of the default 
optimizations for -O3.  Since gcc4 is a preview compiler for 
RHEL4.4, this is unhelpful.  Using the vanilla compiler now, 
but it would be better if we could use the Red Hat packaged 

Platform is single-CPU Athlon 3500+, Venice stepping.

Comment 1 starlight 2006-09-03 01:20:44 UTC
Actually the code is 10% slower than with the vanilla GNU compiler.  Quite bad.

Comment 2 Jakub Jelinek 2006-09-03 06:51:22 UTC
The gcc4 compiler in RHEL4 U4 is slightly older than the g++ 4.1.1 release, as
the version number might suggest, but no optimizations have been turned
off.  When comparing two compilers, the first question must be whether you are
using the same options (e.g. the RHEL4 U4 gcc4 is configured to tune
by default with -mtune=generic; I am not sure how you configured your compiler).
Also, what kind of benchmarking did you use to arrive at the claim of 4% or 10%
less efficient code?  Do you have SPEC{95,2000} numbers and results from a number
of other industry standard benchmarks, or are you just extrapolating from one
testcase where an inner loop might suffer from a single instruction selection
choice, which might very well depend on the default scheduling of your g++ 4.1.1
vs. the one included in RHEL4 U4?
For us to be able to do anything about this, you need to provide a benchmark and
make sure you use the exact same tuning and arch options (i.e. always override
the default, which will be different in any case).
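
For illustration only, here is a minimal sketch of the kind of self-contained benchmark asked for above. It is not the reporter's actual test. The workload kernel() and the helper names time_kernel() and percent_slower() are hypothetical stand-ins. The idea is to build the same file with both compilers, passing identical explicit options to each (for example -O3 together with the same -mtune= value), and then compare the measured times.

```cpp
#include <ctime>

// Hypothetical stand-in workload; replace it with the code whose speed is in question.
double kernel(long n) {
    double acc = 0.0;
    for (long i = 1; i <= n; ++i) {
        acc += 1.0 / static_cast<double>(i);
    }
    return acc;
}

// CPU seconds consumed by one call to kernel(n).
// The computed value is stored through `result` so the caller can print it,
// which keeps the optimizer from discarding the loop.
double time_kernel(long n, double *result) {
    std::clock_t start = std::clock();
    *result = kernel(n);
    std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}

// How much slower `candidate_seconds` is than `baseline_seconds`, in percent.
// A negative value means the candidate build is faster.
double percent_slower(double baseline_seconds, double candidate_seconds) {
    return (candidate_seconds - baseline_seconds) / baseline_seconds * 100.0;
}
```

A driver would call time_kernel() several times in each build and feed the best times into percent_slower(). A single run of one loop is exactly the kind of narrow testcase the comment warns about, so a figure obtained this way says little about the compilers in general.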

Comment 3 starlight 2006-09-03 16:54:14 UTC
Ah!  -mtune= makes a huge difference.

GCC defaults to -mtune=k8 when you build it on an Athlon.
Apparently 'k8' and 'opteron' are very similar or identical,
and since the target platform is an Opteron I set -mtune=opteron.
The performance is now the same with either compiler.
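
One way to check which tuning target a given g++ build is actually using is to look at its predefined macros. The sketch below assumes that GCC on x86 defines macros of the form __tune_k8__ for the active -mtune= target, and that a generic tuning defines none of them. The function names tuning_target() and report_tuning() are made up for this example, and only a few targets are listed.

```cpp
#include <cstdio>

// Name of the CPU the compiler is tuning for, judged by GCC's predefined
// __tune_*__ macros. Assumption: a build tuned with -mtune=k8 (and, as far
// as I know, -mtune=opteron) defines __tune_k8__, while a generic tuning
// defines none of the macros tested here.
const char *tuning_target() {
#if defined(__tune_k8__)
    return "k8";
#elif defined(__tune_athlon__)
    return "athlon";
#elif defined(__tune_pentium4__)
    return "pentium4";
#elif defined(__tune_nocona__)
    return "nocona";
#else
    return "generic or unlisted";
#endif
}

// Print the detected tuning target on one line.
void report_tuning() {
    std::printf("tuned for: %s\n", tuning_target());
}
```

Compiling this with each compiler, without any -mtune= option, and calling report_tuning() should show whether the two defaults differ. Dumping the macros directly with g++ -dM -E -x c++ /dev/null gives the same information without writing any code.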

Thanks for jogging loose the cobwebs for me on this.
This bug report can be closed.
