| Summary: | gcc4.4.5 in rhel6.1 generates slower code than gcc4.1.2 in rhel5.6 on power6 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Adam Okuliar <aokuliar> | ||||
| Component: | gcc | Assignee: | Jakub Jelinek <jakub> | ||||
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-tools-bugs | ||||
| Severity: | medium | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 6.1 | ||||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | ppc | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2011-03-10 15:01:07 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
Description
Adam Okuliar
2011-03-10 12:25:25 UTC
Created attachment 483428 [details]
source, compiled binary, and test outputs
It seems that lfdx/stfdx are horribly slow on power6.
This is something for IBM to figure out; current GCC 4.6 behaves exactly the same.
Small testcase:
#define N 10000000
double a[N], b[N], c[N];
__attribute__((noinline))
void foo ()
{
  int j;
  for (j=0; j<N; j++)
    c[j] = a[j];
}
int
main ()
{
  int i;
  for (i = 0; i < 50; i++)
    foo ();
  return 0;
}
Compile with -m32 -O2 -mtune=power6 or -m32 -O3 -mtune=power6.
With gcc 4.1.2 the inner loop in foo is:
.L2:
        lfd 0,0(9)
        addi 9,9,8
        stfd 0,0(11)
        addi 11,11,8
        bdnz .L2
while with 4.4-RH as well as current 4.6 trunk:
.L2:
        lfdx 0,11,9
        stfdx 0,10,9
        addi 9,9,8
        bdnz .L2
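For readers less familiar with the two addressing forms, here is a rough C picture of what each inner loop computes. The function names copy_bump and copy_indexed are hypothetical and not part of the testcase. This is only a sketch: the compiler is free to pick either addressing mode for either source form, so writing the loop one way does not guarantee lfd/stfd or lfdx/stfdx in the output.

```c
#include <stddef.h>

/* Hypothetical helper: mirrors the 4.1.2 loop.  Each array has its own
   pointer, accessed at displacement 0 (lfd/stfd) and advanced by its
   own addi.  */
void copy_bump (double *dst, const double *src, size_t n)
{
  while (n-- > 0)
    *dst++ = *src++;
}

/* Hypothetical helper: mirrors the 4.4/4.6 loop.  Both arrays are
   addressed as base register + shared index register (lfdx/stfdx),
   and only the index is advanced.  */
void copy_indexed (double *dst, const double *src, size_t n)
{
  size_t j;
  for (j = 0; j < n; j++)
    dst[j] = src[j];
}
```

Both functions copy the same n doubles; the difference is only in how the addresses are formed, which is where the two compilers differ.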
time ./test-4.1
real 0m1.118s
user 0m1.090s
sys 0m0.026s
time ./test-4.4
real 0m5.395s
user 0m5.348s
sys 0m0.038s
OK, apparently if you compile with -mcpu=power6 (which both tunes for power6, as RHEL6 does by default, and drops support for older CPUs), then -mavoid-indexed-addresses is enabled by default and the penalty on power6 goes away. Alternatively, you can pass -mavoid-indexed-addresses explicitly.
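To compare the flag combinations without relying on time(1), something like the following in-process timer could be used. It is a minimal sketch: time_foo is a hypothetical helper, the unused array b is dropped, and clock() reports CPU time, so the result should be read like the user+sys figures rather than the real figure.

```c
#include <time.h>

#define N 10000000
double a[N], c[N];

/* Same copy loop as the testcase; noinline keeps it out of the caller. */
__attribute__((noinline))
void foo (void)
{
  int j;
  for (j = 0; j < N; j++)
    c[j] = a[j];
}

/* Hypothetical helper: CPU seconds spent in `reps` calls to foo.  */
double time_foo (int reps)
{
  clock_t start = clock ();
  int i;
  for (i = 0; i < reps; i++)
    foo ();
  return (double) (clock () - start) / CLOCKS_PER_SEC;
}
```

Calling time_foo (50) from main and printing the result matches the 50 iterations of the testcase. Building once with -m32 -O2 -mtune=power6, once with -mavoid-indexed-addresses added, and once with -m32 -O2 -mcpu=power6 should then show whether the indexed loads and stores account for the difference.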