Bug 149250 (IT_66328)
Summary: | Different outputs with -O2 and -O1 when -fno-automatic is used
---|---
Product: | Red Hat Enterprise Linux 3
Reporter: | Bastien Nocera <bnocera>
Component: | gcc
Assignee: | Jakub Jelinek <jakub>
Status: | CLOSED ERRATA
QA Contact: |
Severity: | medium
Docs Contact: |
Priority: | medium
Version: | 3.0
CC: | rth, tao
Target Milestone: | ---
Target Release: | ---
Hardware: | All
OS: | Linux
Whiteboard: |
Fixed In Version: | RHBA-2005-660
Doc Type: | Bug Fix
Doc Text: |
Story Points: | ---
Clone Of: |
Environment: |
Last Closed: | 2005-09-28 14:08:19 UTC
Type: | ---
Regression: | ---
Mount Type: | ---
Documentation: | ---
CRM: |
Verified Versions: |
Category: | ---
oVirt Team: | ---
RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | ---
Target Upstream Version: |
Embargoed: |
Bug Depends On: |
Bug Blocks: | 156320
Attachments: |
Description
Bastien Nocera
2005-02-21 17:33:51 UTC
Created attachment 111265 [details]
cmc004.sh
Compilation script
Created attachment 111266 [details]
fdkcovcal6.f
Fortran source, subroutine
Created attachment 111267 [details]
check.f
Main routine
-O2 -fno-gcse cures this. Will debug.

The bug is in strength reduction, which causes the loop computing:

      DO I=1,5
        DO J=1,5
          DS=0.D0
          DO K=1,5
            DS=DS+DCOVMAT(I,K)*DFT(K,J)
          ENDDO
          DF(I,J)=DS
        ENDDO
      ENDDO

to read DFT array members as if J in the second and following iteration was one bigger than it actually should be. So first iteration reads DFT(,1) but second DFT(,3) and third DFT(,4).

As a workaround, -fno-automatic -fno-strength-reduce can be used.

Simplified testcase:

C Works with:
C -O1 -m32 -mcpu=i486 -fno-automatic
C -O2 -m32 -mcpu=i486
C -O2 -m32 -mcpu=i486 -fno-automatic -fno-strength-reduce
C Fails with:
C -O2 -m32 -mcpu=i486 -fno-automatic
      SUBROUTINE FOO(D)
      REAL*8 A,B,C,D,E
      DIMENSION A(5,5),B(5,5),C(5,5),D(5,5)
      DO I=1,5
        DO J=1,5
          A(I,J)=J
          B(I,J)=1
        ENDDO
      ENDDO
      DO I=1,5
        DO J=1,5
          E=0.D0
          DO K=1,5
            E=E+B(I,K)*A(K,J)
          ENDDO
          C(I,J)=E
        ENDDO
      ENDDO
      DO I=1,5
        DO J=1,5
          D(I,J)=C(I,J)
        ENDDO
      ENDDO
      END
      REAL*8 D
      DIMENSION D(5,5)
      CALL FOO(D)
      DO I=1,5
        DO J=1,5
          IF (D(I,J).NE.5*J) CALL ABORT
        ENDDO
      ENDDO
      END

Looking at loop_givs_reduce changes in GCC 3.3 I found
http://gcc.gnu.org/ml/gcc-patches/2002-09/msg00045.html
that indeed fixes this testcase.

In pseudo patch, the change that loop_givs_reduce was doing in this case when reducing GIV 136, creating new pseudo 185, is:

(note 134 484 298 NOTE_INSN_LOOP_BEG)
(code_label 298 134 436 26 "" "" [1 uses])
...
(insn 587 150 152 (set (reg:SI 136) (const_int 1 [0x1])) 45 {*movsi_1} (nil))
...
+(insn 876 853 881 (set (reg:SI 185) (const_int 2 [0x2])) -1 (nil) (nil))
...
(note 153 492 279 NOTE_INSN_LOOP_BEG)
(code_label 279 153 437 25 "" "" [1 uses])
...
(insn 730 729 731 (parallel[(set (reg:SI 92) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
(insn 731 730 734 (parallel[(set (reg:SI 93) (plus:SI (reg:SI 92) (reg:SI 136))) (clobber (reg:CC 17 flags))]) -1 (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
...
(note 177 744 240 NOTE_INSN_LOOP_BEG)
(code_label 240 177 438 24 "" "" [1 uses])
...
(note 230 228 237 NOTE_INSN_LOOP_CONT)
...
(note 506 239 180 NOTE_INSN_LOOP_VTOP)
...
(note 245 182 738 NOTE_INSN_LOOP_END)
...
(insn 256 443 258 (parallel[(set (reg:SI 106) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
(insn 258 256 262 (parallel[(set (reg:SI 107) (plus:SI (reg:SI 106) (reg:SI 136))) (clobber (reg:CC 17 flags))]) 207 {*addsi_1} (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
...
(note 269 267 276 NOTE_INSN_LOOP_CONT)
(insn 276 269 584 (parallel[(set (reg:SI 113) (plus:SI (reg:SI 136) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
+(insn 873 276 879 (parallel[(set (reg:SI 185) (plus:SI (reg:SI 185) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
-(insn 584 276 278 (set (reg:SI 136) (reg:SI 113)) 45 {*movsi_1} (nil) (nil))
+(insn 584 914 278 (set (reg:SI 136) (reg:SI 185)) -1 (nil) (nil))
...
(note 498 278 156 NOTE_INSN_LOOP_VTOP)
...
(note 284 158 286 NOTE_INSN_LOOP_END)
...
(note 288 286 449 NOTE_INSN_LOOP_CONT)
...
(note 490 297 137 NOTE_INSN_LOOP_VTOP)
...
(note 303 139 305 NOTE_INSN_LOOP_END)

Here, tv->insn is insn 584. Without Jan's patch the new increment insn (873) is inserted before that instruction, and as the initial value for pseudo 185 is 2, not 1 (pseudo 136's initial value (1) * mult_val (1) + add_val (1)), it means pseudo 136 has value 1 in the first iteration, but 3 in the second, 4 in the third etc.

Created attachment 116917 [details]
gcc32-rh149250.patch
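
The off-by-one described in the analysis can be reproduced with a small model of the two pseudos. The sketch below is not GCC code: the function name `counter_values` and the flag `increment_before_copy` are hypothetical, and the "fixed" ordering (increment placed after insn 584) is an assumption inferred from the remark that, without the patch, the increment is inserted before that instruction. It only tracks the values the loop body would observe in pseudo 136 over five passes and ignores the loop exit test.

```python
def counter_values(iterations, increment_before_copy):
    """Model pseudo 136 (the J counter) and pseudo 185 (the reduced GIV).

    increment_before_copy=True mimics the miscompiled order:
    insn 873 (reg185 += 1) runs before insn 584 (reg136 = reg185).
    increment_before_copy=False is the assumed corrected order.
    """
    reg136 = 1              # insn 587: initial value of the counter
    reg185 = 1 * 1 + 1      # insn 876: initial value * mult_val + add_val = 2
    observed = []
    for _ in range(iterations):
        observed.append(reg136)   # value seen by the loop body this pass
        if increment_before_copy:
            reg185 += 1           # insn 873 placed first (buggy order)
            reg136 = reg185       # insn 584 copies the already bumped value
        else:
            reg136 = reg185       # copy first (assumed fixed order)
            reg185 += 1
    return observed


if __name__ == "__main__":
    print("buggy order:", counter_values(5, True))
    print("fixed order:", counter_values(5, False))
```

With the buggy order the model yields 1, 3, 4, ... for pseudo 136, matching the DFT(,1), DFT(,3), DFT(,4) reads reported above; with the copy placed first it yields 1, 2, 3, ...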
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-660.html