149250 – (IT_66328) Different outputs with -O2 and -O1 when -fno-automatic is used

Summary: Different outputs with -O2 and -O1 when -fno-automatic is used

Keywords:
Status:	CLOSED ERRATA
Alias:	IT_66328
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	gcc
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	156320

Reported:	2005-02-21 17:33 UTC by Bastien Nocera
Modified:	2007-11-30 22:07 UTC (History)
CC List:	2 users (show)
Fixed In Version:	RHBA-2005-660
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-09-28 14:08:19 UTC
Target Upstream Version:
Embargoed:

Attachments	(Terms of Use)
cmc004.sh (476 bytes, text/plain) 2005-02-21 17:34 UTC, Bastien Nocera	no flags	Details
fdkcovcal6.f (1.28 KB, text/plain) 2005-02-21 17:34 UTC, Bastien Nocera	no flags	Details
check.f (431 bytes, text/plain) 2005-02-21 17:35 UTC, Bastien Nocera	no flags	Details
gcc32-rh149250.patch (1.26 KB, patch) 2005-07-19 10:34 UTC, Jakub Jelinek	no flags	Details \| Diff

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2005:660	0	qe-ready	SHIPPED_LIVE	gcc bug fix update	2005-09-28 04:00:00 UTC

Description Bastien Nocera 2005-02-21 17:33:51 UTC

Description of problem:
Different outputs with -O2 and -O1 when -fno-automatic is used

Version-Release number of selected component (if applicable):
gcc-g77-3.2.3-49

How reproducible:
Every time

Steps to Reproduce:
1. Get all 3 attached files in this bug
2. Run "./cmc004.sh"
3. Check diff output between -O1 and -O2 settings
  
Actual results:
2,6c2,6
<   0.399976244  1.10568603 -1.5855838 -1.29522322  0.850277376 -1.5855838
<  -1.29522322  1.67283722  0.587542755 -1.33928542 -1.29522322 -0.536572933
<   0.587542755 -0.287903309 -0.750987232  0.850277376  0.990607381 -1.33928542
<  -0.750987232  0.912945271  3.02031678  1.71127798 -2.15531896 -0.49984765
<   1.87155229
---
>   0.399976244  1.10568603 -1.5855838 -1.29522322  0.850277376  1.10568603
>   0.989358246 -1.29522322 -0.536572933  0.990607381 -1.5855838 -1.29522322
>   1.67283722  0.587542755 -1.33928542 -1.29522322 -0.536572933  0.587542755
>  -0.287903309 -0.750987232  0.850277376  0.990607381 -1.33928542 -0.750987232
>   0.912945271


Expected results:
On RHEL4 and RHL 7.1, the -O1 and -O2 builds produce the same output

Additional info:
Remove "-fno-automatic" from the CFLAGS set in FO_1 in the "cmc004.sh"
file, and the programs have the same output under -O1 and -O2

Comment 1 Bastien Nocera 2005-02-21 17:34:26 UTC

Created attachment 111265 [details]
cmc004.sh

Compilation script

Comment 2 Bastien Nocera 2005-02-21 17:34:45 UTC

Created attachment 111266 [details]
fdkcovcal6.f

Fortran source, subroutine

Comment 3 Bastien Nocera 2005-02-21 17:35:37 UTC

Created attachment 111267 [details]
check.f

Main routine

Comment 6 Jakub Jelinek 2005-02-25 18:09:16 UTC

-O2 -fno-gcse cures this.  Will debug.

Comment 7 Jakub Jelinek 2005-02-28 17:48:30 UTC

The bug is in strength reduction, which causes the loop computing:
      DO I=1,5
      DO J=1,5
      DS=0.D0
      DO K=1,5
      DS=DS+DCOVMAT(I,K)*DFT(K,J)
      ENDDO
      DF(I,J)=DS
      ENDDO
      ENDDO
to read DFT array members as if J, in the second and subsequent iterations, were
one larger than it actually is.  So the first iteration reads DFT(,1), but the
second reads DFT(,3) and the third DFT(,4).
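
To make the effect on the data concrete, here is a small Python model of the
inner product above. It is only a sketch: the function name product_columns,
the identity matrix used for DCOVMAT and the DFT values are made up for
illustration, and only the first three J iterations described above are
modelled.

```python
def product_columns(cov, dft, column_read_for):
    """Model DF(I,J) = sum over K of COV(I,K) * DFT(K, col).

    column_read_for maps each (1-based) J to the (1-based) DFT column
    that is actually read for it. Returns {J: [DF(1,J), ..., DF(n,J)]}.
    """
    n = len(cov)
    result = {}
    for j, col in column_read_for.items():
        result[j] = [
            sum(cov[i][k] * dft[k][col - 1] for k in range(n))
            for i in range(n)
        ]
    return result


N = 5
# Hypothetical inputs: identity for DCOVMAT, distinct values per DFT column.
cov = [[1.0 if i == k else 0.0 for k in range(N)] for i in range(N)]
dft = [[10.0 * (k + 1) + (j + 1) for j in range(N)] for k in range(N)]

# Intended behaviour: iteration J reads column J.
intended = product_columns(cov, dft, {1: 1, 2: 2, 3: 3, 4: 4})
# Behaviour described in this comment: J=2 reads column 3, J=3 reads column 4.
miscompiled = product_columns(cov, dft, {1: 1, 2: 3, 3: 4})

print(intended[2])     # column 2 of DFT, as intended
print(miscompiled[2])  # column 3 of DFT, written into DF(,2)
```

In this model the first column of DF is still right, while every later column
holds the result that belongs one column further along, consistent with the
description above.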

As a workaround, -fno-automatic -fno-strength-reduce can be used.

Comment 16 Jakub Jelinek 2005-07-18 14:41:21 UTC

Simplified testcase:

C Works with:
C -O1 -m32 -mcpu=i486 -fno-automatic
C -O2 -m32 -mcpu=i486
C -O2 -m32 -mcpu=i486 -fno-automatic -fno-strength-reduce
C Fails with:
C -O2 -m32 -mcpu=i486 -fno-automatic
        SUBROUTINE FOO(D)
        REAL*8 A,B,C,D,E
        DIMENSION A(5,5),B(5,5),C(5,5),D(5,5)
        DO I=1,5
          DO J=1,5
            A(I,J)=J
            B(I,J)=1
          ENDDO
        ENDDO
        DO I=1,5
          DO J=1,5
            E=0.D0
            DO K=1,5
              E=E+B(I,K)*A(K,J)
            ENDDO
            C(I,J)=E
          ENDDO
        ENDDO
        DO I=1,5
          DO J=1,5
            D(I,J)=C(I,J)
          ENDDO
        ENDDO
        END

        REAL*8 D
        DIMENSION D(5,5)
        CALL FOO(D)
        DO I=1,5
          DO J=1,5
            IF (D(I,J).NE.5*J) CALL ABORT
          ENDDO
        ENDDO
        END
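
For reference, the 5*J check in the main program follows directly from the
initialisation: B is all ones and A(K,J)=J, so each C(I,J) is the sum of five
copies of J. The following Python sketch (the name expected_d is made up for
illustration; indices are 0-based in the code) recomputes that by hand:

```python
def expected_d(n=5):
    """Recompute what FOO should store in D for an n x n problem."""
    # A(I,J) = J and B(I,J) = 1, using Fortran's 1-based J values.
    a = [[float(j) for j in range(1, n + 1)] for _ in range(n)]
    b = [[1.0] * n for _ in range(n)]
    # C(I,J) = sum over K of B(I,K) * A(K,J); D is a plain copy of C.
    return [
        [sum(b[i][k] * a[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


d = expected_d()
for row in d:
    print(row)
```

Every row comes out as 5, 10, 15, 20, 25, so any element that differs from 5*J
indicates that the inner loop read the wrong column of A.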

Comment 17 Jakub Jelinek 2005-07-19 10:32:31 UTC

Looking at loop_givs_reduce changes in GCC 3.3 I found
http://gcc.gnu.org/ml/gcc-patches/2002-09/msg00045.html
that indeed fixes this testcase.
In pseudo patch, the change that loop_givs_reduce was doing in this case when
reducing GIV 136, creating new pseudo 185, is:
 (note 134 484 298 NOTE_INSN_LOOP_BEG)
 (code_label 298 134 436 26 "" "" [1 uses])
 ...
 (insn 587 150 152 (set (reg:SI 136) (const_int 1 [0x1])) 45 {*movsi_1} (nil))
 ...
+(insn 876 853 881 (set (reg:SI 185) (const_int 2 [0x2])) -1 (nil) (nil))
 ...
 (note 153 492 279 NOTE_INSN_LOOP_BEG)
 (code_label 279 153 437 25 "" "" [1 uses])
 ...
 (insn 730 729 731 (parallel[(set (reg:SI 92) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
 (insn 731 730 734 (parallel[(set (reg:SI 93) (plus:SI (reg:SI 92) (reg:SI 136))) (clobber (reg:CC 17 flags))]) -1 (nil)
     (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
 ...
 (note 177 744 240 NOTE_INSN_LOOP_BEG)
 (code_label 240 177 438 24 "" "" [1 uses])
 ...
 (note 230 228 237 NOTE_INSN_LOOP_CONT)
 ...
 (note 506 239 180 NOTE_INSN_LOOP_VTOP)
 ...
 (note 245 182 738 NOTE_INSN_LOOP_END)
 ...
 (insn 256 443 258 (parallel[(set (reg:SI 106) (ashift:SI (reg:SI 136) (const_int 2 [0x2]))) (clobber (reg:CC 17 flags))]))
 (insn 258 256 262 (parallel[(set (reg:SI 107) (plus:SI (reg:SI 106) (reg:SI 136))) (clobber (reg:CC 17 flags))])
     207 {*addsi_1} (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 136) (const_int 5 [0x5])) (nil)))
 ...
 (note 269 267 276 NOTE_INSN_LOOP_CONT)
 (insn 276 269 584 (parallel[(set (reg:SI 113) (plus:SI (reg:SI 136) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
+(insn 873 276 879 (parallel[(set (reg:SI 185) (plus:SI (reg:SI 185) (const_int 1 [0x1]))) (clobber (reg:CC 17 flags))]))
-(insn 584 276 278 (set (reg:SI 136) (reg:SI 113)) 45 {*movsi_1} (nil) (nil))
+(insn 584 914 278 (set (reg:SI 136) (reg:SI 185)) -1 (nil) (nil))
 ...
 (note 498 278 156 NOTE_INSN_LOOP_VTOP)
 ...
 (note 284 158 286 NOTE_INSN_LOOP_END)
 ...
 (note 288 286 449 NOTE_INSN_LOOP_CONT)
 ...
 (note 490 297 137 NOTE_INSN_LOOP_VTOP)
 ...
 (note 303 139 305 NOTE_INSN_LOOP_END)

Here, tv->insn is insn 584.  Without Jan's patch the new increment insn (873)
is inserted before that instruction.  The initial value of pseudo 185 is
already 2, not 1 (pseudo 136's initial value (1) * mult_val (1) + add_val (1)),
so pseudo 136 has value 1 in the first iteration, but 3 in the second, 4 in the
third, etc.
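
The register arithmetic can be replayed with a few lines of Python. This is a
simplified model, not GCC code: r136 and r185 stand for the two pseudos, the
function name j_values and its flag are made up for illustration, and the
"copy first" ordering is only one ordering that yields the intended sequence.
It is an assumption, not a statement of what the upstream patch does.

```python
def j_values(increment_before_copy, iterations=5):
    """Return the values pseudo 136 holds at the top of each iteration."""
    r136 = 1             # insn 587: initial value of the original pseudo
    r185 = r136 * 1 + 1  # insn 876: initial * mult_val + add_val = 2
    seen = []
    for _ in range(iterations):
        seen.append(r136)    # loop body uses pseudo 136 as J
        if increment_before_copy:
            r185 += 1        # insn 873 placed before insn 584
            r136 = r185      # insn 584: pseudo 136 := pseudo 185
        else:
            r136 = r185      # hypothetical ordering: copy first
            r185 += 1        # then increment
    return seen


print(j_values(increment_before_copy=True))
print(j_values(increment_before_copy=False))
```

With the increment placed before the copy, the model gives 1, 3, 4, 5, 6, which
matches the values described above. With the copy first it gives 1, 2, 3, 4, 5.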

Comment 18 Jakub Jelinek 2005-07-19 10:34:08 UTC

Created attachment 116917 [details]
gcc32-rh149250.patch

Comment 24 Red Hat Bugzilla 2005-09-28 14:08:19 UTC

An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2005-660.html
