205058 – g++ 4.1.0 -O3 optimization produces slower code than vanilla g++ 4.1.1

Summary: g++ 4.1.0 -O3 optimization produces slower code than vanilla g++ 4.1.1

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Red Hat Enterprise Linux 4
Classification:	Red Hat
Component:	gcc4
Sub Component:
Version:	4.4
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	---
Assignee:	Jakub Jelinek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2006-09-03 01:18 UTC by starlight
Modified:	2008-08-02 23:40 UTC
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2006-09-03 17:58:14 UTC
Target Upstream Version:
Embargoed:

Description starlight 2006-09-03 01:18:32 UTC

Compiling programs at optimization level -O3 with g++ from RPM 
'gcc4-c++-4.1.0-18.EL4' generates code that is 4% less efficient 
than the code generated by vanilla g++ 4.1.1.

I'm guessing that Red Hat turned off some of the default 
optimizations for -O3.  Since gcc4 is a preview compiler for 
RHEL4.4, this is unhelpful.  I am using the vanilla compiler 
for now, but it would be better if we could use the Red Hat 
packaged gcc4.

Platform is single-CPU Athlon 3500+, Venice stepping.

Comment 1 starlight 2006-09-03 01:20:44 UTC

Actually, the code is 10% slower than with the GNU compiler.  Quite bad.

Comment 2 Jakub Jelinek 2006-09-03 06:51:22 UTC

The gcc4 compiler in RHEL4 U4 is slightly older than the g++ 4.1.1 release, as
the version suggests, but no optimizations have been turned off.  When
comparing two compilers, the first question must be whether you are
using the same options (e.g. the RHEL4 U4 gcc4 is configured to tune
by default to -mtune=generic; I am not sure how you configured your compiler).
Also, what kind of benchmarking did you use to claim 4% or 10% less
efficient code?  Do you have SPEC{95,2000} numbers and results from other
industry-standard benchmarks, or are you just extrapolating from one testcase
where an inner loop might suffer from a single instruction selection choice,
which might very well depend on the default scheduling of your g++ 4.1.1 vs.
the one included in RHEL4 U4?
If there is anything we can do about this, you need to provide a benchmark and
make sure you use the exact same tuning and arch options (i.e. always override
the default, which will be different in any case).

Comment 3 starlight 2006-09-03 16:54:14 UTC

Ah!  -mtune= makes a huge difference.

GCC defaults to -mtune=k8 when you build it on an Athlon.
Apparently 'k8' and 'opteron' are very similar or identical,
and since the target platform is an Opteron I set -mtune=opteron.
The performance is now the same with either compiler.

Thanks for jogging loose the cobwebs for me on this.
This bug report can be closed.
