
Bug 683808

Summary: gcc4.4.5 in rhel6.1 generates slower code than gcc4.1.2 in rhel5.6 on power6
Product: Red Hat Enterprise Linux 6
Reporter: Adam Okuliar <aokuliar>
Component: gcc
Assignee: Jakub Jelinek <jakub>
Status: CLOSED NOTABUG
QA Contact: qe-baseos-tools-bugs
Severity: medium
Priority: unspecified    
Version: 6.1   
Target Milestone: rc   
Target Release: ---   
Hardware: ppc   
OS: Linux   
Doc Type: Bug Fix
Last Closed: 2011-03-10 15:01:07 UTC
Attachments:
source, compiled binary, and test outputs (flags: none)

Description Adam Okuliar 2011-03-10 12:25:25 UTC
Description of problem:
gcc4.4.5 in rhel6.1 generates slower code than gcc4.1.2 in rhel5.6

Version-Release number of selected component (if applicable):
4.4.5 20110214 in rhel6.1 and 

How reproducible:
100%

Steps to Reproduce:
1. Download the attached source code of the stream benchmark.
2. Compile it with gcc -O2 stream.c -o stream on RHEL 5.6, 6.0, and 6.1.
3. Run the benchmark: ./stream > stream_out
4. Compare the results on the same machine.
  
Actual results:
RHEL5.6:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        5351.7121       0.0308       0.0299       0.0317
Scale:       5398.6392       0.0306       0.0296       0.0313
Add:         6120.4283       0.0400       0.0392       0.0407
Triad:       6155.9105       0.0398       0.0390       0.0405
ALL:         5798.0225       0.1412       0.1380       0.1439

RHEL6.0:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1402.1940       0.1143       0.1141       0.1145
Scale:       1458.0452       0.1098       0.1097       0.1101
Add:         1481.1158       0.1622       0.1620       0.1624
Triad:       1460.0755       0.1645       0.1644       0.1647
ALL:         1453.2925       0.5507       0.5505       0.5515

RHEL6.1:
Function      Rate (MB/s)   Avg time     Min time     Max time
Copy:        1399.7752       0.1144       0.1143       0.1145
Scale:       1456.9816       0.1099       0.1098       0.1102
Add:         1479.1440       0.1624       0.1623       0.1626
Triad:       1458.4001       0.1647       0.1646       0.1648
ALL:         1451.7734       0.5514       0.5511       0.5522

Code generated on RHEL 6.x reaches only ~25% of the throughput of code generated on RHEL 5.6.
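
The ~25% figure can be checked directly from the tables. The sketch below is only illustrative: the array and function names are made up for this example, and the rates are copied from the RHEL5.6 and RHEL6.1 outputs above.

```c
#include <stddef.h>

/* Rates in MB/s copied from the tables above, in the order
   Copy, Scale, Add, Triad, ALL. The names are hypothetical. */
static const double rate_rhel56[] = {5351.7121, 5398.6392, 6120.4283, 6155.9105, 5798.0225};
static const double rate_rhel61[] = {1399.7752, 1456.9816, 1479.1440, 1458.4001, 1451.7734};

/* Fraction of the RHEL 5.6 rate that the RHEL 6.1 build reaches for row i. */
double relative_rate(size_t i)
{
    return rate_rhel61[i] / rate_rhel56[i];
}
```

relative_rate(4), the ALL row, comes out at roughly 0.25, and the individual kernels fall between about 0.24 and 0.27.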


Expected results:
Performance of the generated code should match RHEL 5.6 levels.

Additional info:
Attachment contains:
stream.c,stream.h - source code of stream benchmark
stream56,stream60,stream61 - binary compiled on rhel5.6,6.0,6.1
stream56_out, stream60_out, stream61_out - outputs of single run of benchmark

performance tested on 
ibm-js22-vios-02-lp1.rhts.eng.bos.redhat.com

Comment 1 Adam Okuliar 2011-03-10 12:28:28 UTC
Created attachment 483428 [details]
source, compiled binary, and test outputs

Comment 3 Jakub Jelinek 2011-03-10 14:31:49 UTC
It seems lfdx/stfdx are horribly slow on power6.
This is something for IBM to figure out; current GCC 4.6 behaves exactly the same.

Small testcase:

#define N 10000000
double a[N], b[N], c[N];

__attribute__((noinline))
void foo ()
{
  int j;
  for (j=0; j<N; j++)
    c[j] = a[j];
}

int
main ()
{
  int i;
  for (i = 0; i < 50; i++)
    foo ();
  return 0;
}

compile with -m32 -O2 -mtune=power6 or -m32 -O3 -mtune=power6.

With gcc 4.1.2 the inner loop in foo is:
.L2:
        lfd 0,0(9)
        addi 9,9,8
        stfd 0,0(11)
        addi 11,11,8
        bdnz .L2
while with 4.4-RH as well as current 4.6 trunk:
.L2:
        lfdx 0,11,9
        stfdx 0,10,9
        addi 9,9,8
        bdnz .L2

time ./test-4.1

real	0m1.118s
user	0m1.090s
sys	0m0.026s
time ./test-4.4

real	0m5.395s
user	0m5.348s
sys	0m0.038s
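
For readers who do not read PowerPC assembly, the two inner loops differ only in how the addresses are formed. The following is a rough C rendering of the two shapes, with function names invented for this sketch. The compiler is free to emit either addressing form for either function, so this illustrates the strategies and is not a way to force one of them.

```c
#include <stddef.h>

/* Two pointers, each advanced separately: the shape of the 4.1.2 loop
   (lfd/stfd with a zero displacement, plus two addi instructions). */
void copy_two_pointers(double *dst, const double *src, size_t n)
{
    while (n--)
        *dst++ = *src++;
}

/* One shared offset added to both base addresses: the shape of the 4.4 loop
   (lfdx/stfdx with base + index registers, plus a single addi). */
void copy_shared_offset(double *dst, const double *src, size_t n)
{
    size_t i;
    for (i = 0; i < n; i++)
        dst[i] = src[i];
}
```

Both functions copy the same data. The indexed form saves one instruction per iteration, which is presumably why newer GCC prefers it, yet the timings above show it running about five times slower on this machine.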

Comment 4 Jakub Jelinek 2011-03-10 15:01:07 UTC
Ok, apparently if you compile with -mcpu=power6 (i.e. both tune for power6, which is already the default in RHEL6, and stop supporting older CPUs), then -mavoid-indexed-addresses is used by default and this penalty on power6 is no longer present.
Alternatively, you can compile explicitly with -mavoid-indexed-addresses.
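
To compare the flag combinations without relying on the shell's time, a small in-process timer can be wrapped around the copy loop. This is a minimal sketch, not part of the attached benchmark: copy_doubles, time_copy, and the file name test.c are placeholders chosen for this example, and the build lines in the comment only restate the options mentioned in the comments above.

```c
#include <stddef.h>
#include <time.h>

/* Build variants discussed in this bug (test.c is a placeholder name):
     gcc -m32 -O2 -mtune=power6 test.c
     gcc -m32 -O2 -mtune=power6 -mavoid-indexed-addresses test.c
     gcc -m32 -O2 -mcpu=power6 test.c                                  */

/* Same loop as foo() in the small testcase, taking the arrays as parameters.
   noinline keeps the loop in its own function, as in the testcase. */
__attribute__((noinline))
void copy_doubles(double *dst, const double *src, size_t n)
{
    size_t j;
    for (j = 0; j < n; j++)
        dst[j] = src[j];
}

/* CPU seconds spent in reps calls of copy_doubles, as reported by clock(). */
double time_copy(double *dst, const double *src, size_t n, int reps)
{
    clock_t start = clock();
    int i;
    for (i = 0; i < reps; i++)
        copy_doubles(dst, src, n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_copy with arrays of 10000000 doubles and 50 repetitions would mirror the small testcase. Checking the generated assembly with gcc -S remains the direct way to confirm whether lfdx/stfdx are still being emitted.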