Description of problem:
g77 generates code that produces different output at -O1 and -O2 when -fno-automatic is used.

Version-Release number of selected component (if applicable):
gcc-g77-3.2.3-49

How reproducible:
Every time

Steps to Reproduce:
1. Get all 3 files attached to this bug
2. Run "./cmc004.sh"
3. Check the diff output between the -O1 and -O2 settings

Actual results:
2,6c2,6
< 0.399976244 1.10568603 -1.5855838 -1.29522322 0.850277376 -1.5855838
< -1.29522322 1.67283722 0.587542755 -1.33928542 -1.29522322 -0.536572933
< 0.587542755 -0.287903309 -0.750987232 0.850277376 0.990607381 -1.33928542
< -0.750987232 0.912945271 3.02031678 1.71127798 -2.15531896 -0.49984765
< 1.87155229
---
> 0.399976244 1.10568603 -1.5855838 -1.29522322 0.850277376 1.10568603
> 0.989358246 -1.29522322 -0.536572933 0.990607381 -1.5855838 -1.29522322
> 1.67283722 0.587542755 -1.33928542 -1.29522322 -0.536572933 0.587542755
> -0.287903309 -0.750987232 0.850277376 0.990607381 -1.33928542 -0.750987232
> 0.912945271

Expected results:
On RHEL4 and RHL 7.1, the results are the same.

Additional info:
Remove "-fno-automatic" from the CFLAGS set in FO_1 in the "cmc004.sh" file, and the programs produce the same output under -O1 and -O2.
Created attachment 111265 [details] cmc004.sh Compilation script
Created attachment 111266 [details] fdkcovcal6.f Fortran source, subroutine
Created attachment 111267 [details] check.f Main routine
-O2 -fno-gcse cures this. Will debug.
The bug is in strength reduction, which causes the loop computing:

      DO I=1,5
        DO J=1,5
          DS=0.D0
          DO K=1,5
            DS=DS+DCOVMAT(I,K)*DFT(K,J)
          ENDDO
          DF(I,J)=DS
        ENDDO
      ENDDO

to read DFT array members as if J in the second and following iteration was one bigger than it actually should be. So first iteration reads DFT(,1) but second DFT(,3) and third DFT(,4).

As a workaround, -fno-automatic -fno-strength-reduce can be used.
Simplified testcase:

C Works with:
C -O1 -m32 -mcpu=i486 -fno-automatic
C -O2 -m32 -mcpu=i486
C -O2 -m32 -mcpu=i486 -fno-automatic -fno-strength-reduce
C Fails with:
C -O2 -m32 -mcpu=i486 -fno-automatic
      SUBROUTINE FOO(D)
      REAL*8 A,B,C,D,E
      DIMENSION A(5,5),B(5,5),C(5,5),D(5,5)
      DO I=1,5
        DO J=1,5
          A(I,J)=J
          B(I,J)=1
        ENDDO
      ENDDO
      DO I=1,5
        DO J=1,5
          E=0.D0
          DO K=1,5
            E=E+B(I,K)*A(K,J)
          ENDDO
          C(I,J)=E
        ENDDO
      ENDDO
      DO I=1,5
        DO J=1,5
          D(I,J)=C(I,J)
        ENDDO
      ENDDO
      END
      REAL*8 D
      DIMENSION D(5,5)
      CALL FOO(D)
      DO I=1,5
        DO J=1,5
          IF (D(I,J).NE.5*J) CALL ABORT
        ENDDO
      ENDDO
      END
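For readers who want to see why the testcase expects D(I,J) to equal 5*J, here is a minimal Python sketch of the same arithmetic. It is only a reference model of the Fortran above, not part of the bug's attachments; the function name `foo_reference` and the 0-based list layout are assumptions made for illustration.

```python
N = 5  # matrix dimension used by the testcase


def foo_reference():
    """Recompute what SUBROUTINE FOO stores in D, using 0-based lists."""
    # A(I,J) = J and B(I,J) = 1 in the Fortran (1-based), so column j holds j + 1.
    a = [[float(j + 1) for j in range(N)] for _ in range(N)]
    b = [[1.0 for _ in range(N)] for _ in range(N)]
    # C(I,J) = sum over K of B(I,K) * A(K,J); D is a plain copy of C.
    return [
        [sum(b[i][k] * a[k][j] for k in range(N)) for j in range(N)]
        for i in range(N)
    ]


if __name__ == "__main__":
    d = foo_reference()
    # Same check as the Fortran main program: D(I,J) must equal 5*J.
    for i in range(N):
        for j in range(N):
            assert d[i][j] == 5 * (j + 1)
    print(d[0])
```

Every row of B is all ones and column J of A is constant, so each sum adds J five times. A miscompiled loop that reads the wrong column of A therefore shows up immediately as a value other than 5*J.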
Looking at loop_givs_reduce changes in GCC 3.3 I found http://gcc.gnu.org/ml/gcc-patches/2002-09/msg00045.html that indeed fixes this testcase.

In pseudo patch, the change that loop_givs_reduce was doing in this case when reducing GIV 136, creating new pseudo 185, is:

 (note 134 484 298 NOTE_INSN_LOOP_BEG)
 (code_label 298 134 436 26 "" "" [1 uses])
 ...
 (insn 587 150 152 (set (reg:SI 136) (const_int 1 [0x1])) 45 {*movsi_1} (nil))
 ...
+(insn 876 853 881 (set (reg:SI 185) (const_int 2 [0x2])) -1 (nil) (nil))
 ...
 (note 153 492 279 NOTE_INSN_LOOP_BEG)
 (code_label 279 153 437 25 "" "" [1 uses])
 ...
 (insn 730 729 731 (parallel[(set (reg:SI 92) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
 (insn 731 730 734 (parallel[(set (reg:SI 93) (plus:SI (reg:SI 92) (reg:SI 136))) (clobber (reg:CC 17 flags))]) -1 (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
 ...
 (note 177 744 240 NOTE_INSN_LOOP_BEG)
 (code_label 240 177 438 24 "" "" [1 uses])
 ...
 (note 230 228 237 NOTE_INSN_LOOP_CONT)
 ...
 (note 506 239 180 NOTE_INSN_LOOP_VTOP)
 ...
 (note 245 182 738 NOTE_INSN_LOOP_END)
 ...
 (insn 256 443 258 (parallel[(set (reg:SI 106) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
 (insn 258 256 262 (parallel[(set (reg:SI 107) (plus:SI (reg:SI 106) (reg:SI 136))) (clobber (reg:CC 17 flags))]) 207 {*addsi_1} (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
 ...
 (note 269 267 276 NOTE_INSN_LOOP_CONT)
 (insn 276 269 584 (parallel[(set (reg:SI 113) (plus:SI (reg:SI 136) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
+(insn 873 276 879 (parallel[(set (reg:SI 185) (plus:SI (reg:SI 185) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
-(insn 584 276 278 (set (reg:SI 136) (reg:SI 113)) 45 {*movsi_1} (nil) (nil))
+(insn 584 914 278 (set (reg:SI 136) (reg:SI 185)) -1 (nil) (nil))
 ...
 (note 498 278 156 NOTE_INSN_LOOP_VTOP)
 ...
 (note 284 158 286 NOTE_INSN_LOOP_END)
 ...
 (note 288 286 449 NOTE_INSN_LOOP_CONT)
 ...
 (note 490 297 137 NOTE_INSN_LOOP_VTOP)
 ...
 (note 303 139 305 NOTE_INSN_LOOP_END)

Here, tv->insn is insn 584. Without Jan's patch the new increment insn (873) is inserted before that instruction, and as the initial value for pseudo 185 is 2, not 1 (pseudo 136's initial value (1) * mult_val (1) + add_val (1)), it means pseudo 136 has value 1 in the first iteration, but 3 in the second, 4 in the third etc.
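The effect of the insertion order can be checked with a small Python model of the two pseudos. This is a sketch of the arithmetic described above, not GCC code. The function name `biv_values` and its flag are hypothetical, and the "fixed" ordering (copy first, then increment) is an assumption about what the corrected placement amounts to.

```python
def biv_values(increment_before_copy, iterations=5):
    """Return the value pseudo 136 holds in each loop iteration.

    r185 models the new pseudo created by strength reduction; it starts at
    r136's initial value (1) * mult_val (1) + add_val (1) = 2.
    """
    r136 = 1
    r185 = r136 * 1 + 1
    seen = []
    for _ in range(iterations):
        seen.append(r136)      # value used by the loop body in this iteration
        if increment_before_copy:
            r185 += 1          # models insn 873 placed before insn 584 (the bug)
            r136 = r185        # models insn 584: (set (reg 136) (reg 185))
        else:
            r136 = r185        # assumed corrected order: copy first
            r185 += 1          # then advance the new pseudo
    return seen


if __name__ == "__main__":
    print("increment before copy:", biv_values(True))
    print("copy before increment:", biv_values(False))
```

With the increment placed before the copy, the model yields 1, 3, 4, 5, 6, which matches the observation that the loop skips J=2. With the copy first it yields 1, 2, 3, 4, 5.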
Created attachment 116917 [details] gcc32-rh149250.patch
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2005-660.html