175793 – problem with -O2, but not with -O1

Summary: problem with -O2, but not with -O1

Keywords:
Status:	CLOSED NOTABUG
Alias:	None
Product:	Fedora
Classification:	Fedora
Component:	gcc
Sub Component:
Version:	rawhide
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Jakub Jelinek
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2005-12-15 00:55 UTC by John Ellson
Modified:	2007-11-30 22:11 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2005-12-16 12:51:49 UTC
Type:	---
Embargoed:
Dependent Products:

Attachments
C source of the problem function (not complete enough to be compiled alone) (1.33 KB, text/plain) 2005-12-15 00:55 UTC, John Ellson
gdb disassembly of mkspline() when compiled with -O2 (14.98 KB, text/plain) 2005-12-15 00:56 UTC, John Ellson
gdb disassembly of mkspline() when compiled with -O1 (14.62 KB, text/plain) 2005-12-15 00:58 UTC, John Ellson
testcase (3.57 KB, text/plain) 2005-12-16 02:13 UTC, John Ellson
Makefile for testcase (97 bytes, text/plain) 2005-12-16 02:14 UTC, John Ellson

Description John Ellson 2005-12-15 00:55:34 UTC

Description of problem:
I have a piece of code that has worked for perhaps five years with previous
versions of gcc and with other C compilers, and which now fails with gcc -O2,
but not with -O1.

The program dies with: SIGFPE, Arithmetic exception.

The cause of the SIGFPE is an attempt to divide by a NAN.  The problem is why
the NAN occurred.

The NAN occurs, only with -O2, from the subtraction in the statement:
    det01 = c[0][0] * c[1][1] - c[1][0] * c[0][1];
with the four c[][] values close to, but apparently not exactly zero.

Version-Release number of selected component (if applicable):
gcc-4.1.0-0.7

How reproducible:
100% (The code is in lib/pathplan/route.c in graphviz, which is in Fedora Extras)

Steps to Reproduce:
1. make with gcc -O2
  
Actual results:
ellson@ontap:dot> gdb ./dot_static
GNU gdb Red Hat Linux (6.3.0.0-1.81rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run hello.dot
Starting program: /home/ellson/FIX/Linux.x86_64/build/graphviz2/cmd/dot/dot_static hello.dot

Program received signal SIGFPE, Arithmetic exception.
0x000000000047b897 in mkspline (inps=0x6361f0, inpn=2, tnas=Variable "tnas" is not available.
) at route.c:246
246     {
(gdb) where
#0  0x000000000047b897 in mkspline (inps=0x6361f0, inpn=2, tnas=Variable "tnas" is not available.
) at route.c:246
#1  0x000000000047bd83 in reallyroutespline (edges=0x63cbf0, edgen=12, inps=0x6361f0, inpn=2,
    ev0={x = 0, y = 0}, ev1={x = 0, y = 0}) at route.c:220
#2  0x000000000047d07c in Proutespline (edges=0x63cbf0, edgen=12, input=
      {ps = 0x6361f0, pn = 2}, evs=0x7ffffff374a0, output=0x7ffffff374e0) at route.c:171
#3  0x000000000044b079 in routesplines (pp=0x635ab0, npoints=0x7ffffff3bc7c) at routespl.c:487
#4  0x000000000041bd12 in dot_splines (g=0x6189e0) at dotsplines.c:1263
#5  0x0000000000412161 in dot_layout (g=0x6189e0) at dotinit.c:230
#6  0x00000000004664d6 in gvLayoutJobs (gvc=0x60f260, g=0x6189e0) at gvlayout.c:62
#7  0x0000000000411b89 in main (argc=Variable "argc" is not available.
) at dot.c:170
(gdb)


Expected results:


Additional info:

Comment 1 John Ellson 2005-12-15 00:55:34 UTC

Created attachment 122254 [details]
C source of the problem function (not complete enough to be compiled alone)

Comment 2 John Ellson 2005-12-15 00:56:55 UTC

Created attachment 122256 [details]
gdb disassembly of mkspline() when compiled with -O2

Comment 3 John Ellson 2005-12-15 00:58:01 UTC

Created attachment 122257 [details]
gdb disassembly of mkspline() when compiled with -O1

Comment 4 John Ellson 2005-12-15 02:32:09 UTC

This change to the code of mkspline() is successfully suppressing the problem:

+#if 0
     det01 = c[0][0] * c[1][1] - c[1][0] * c[0][1];
+#else
+    /* workaround for problem with:
+     *         gcc (GCC) 4.1.0 20051212 (Red Hat 4.1.0-0.7)
+     *
+     * the problem is (I think) a NAN from the subtraction which
+     * shows up as a SIGFPE a few lines down when
+     * det01 is used as the denominator in a division
+     */
+    det01 = c[0][0] * c[1][1];
+    d01 = c[1][0] * c[0][1];
+    if (ABS(d01) > 1e-6)
+       det01 -= d01;
+#endif

Comment 5 Jakub Jelinek 2005-12-15 10:06:57 UTC

I really need a self-contained testcase to look into this.  Please gather
the arguments the routine is called with and supply a main that sets up those
arguments.  Also note what values the functions it calls return (and any other
side effects of theirs that are visible in this routine) and write stub
functions that do the same.

Comment 6 John Ellson 2005-12-15 10:14:41 UTC

gdb and fprintf(... %g ...) display the values as "0".  Can you recommend a way
to extract all the bits of the problem doubles?

Comment 7 Jakub Jelinek 2005-12-15 10:23:07 UTC

union { double d; long long l; } u;
u.d = the_double_you_want;
fprintf (stderr, "%016llx\n", u.l);
In gdb you can just:
p/x *(long long *)&the_double_you_want
(but not in C, because that's a strict aliasing violation).
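
For reference, a minimal sketch of the same bit-dumping idea using memcpy, which also avoids the aliasing problem. The names double_bits and dump_double are hypothetical helpers, not part of the testcase, and the code assumes a 64-bit IEEE 754 double:

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Return the raw bit pattern of a double.
 * memcpy copies the bytes instead of casting a pointer, so there is
 * no strict-aliasing violation. Assumes double is 64 bits wide. */
static uint64_t double_bits(double d)
{
    uint64_t bits;
    assert(sizeof d == sizeof bits);
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Print a label, the bit pattern as 16 hex digits, and the %g view. */
static void dump_double(const char *label, double d)
{
    fprintf(stderr, "%s = %016" PRIx64 " (%g)\n", label, double_bits(d), d);
}
```

A value that %g shows as "0" but that is not exactly zero will have a non-zero bit pattern here, which is enough to tell the two cases apart.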

Comment 8 John Ellson 2005-12-16 02:12:53 UTC

OK, following is a single-file testcase, plus Makefile.
The problem seems to be dependent
on -O2, -ffast-math, and having FPU error reporting enabled.
The testcase exhibits the problem on both i386 and x86_64.

Comment 9 John Ellson 2005-12-16 02:13:40 UTC

Created attachment 122313 [details]
testcase

Comment 10 John Ellson 2005-12-16 02:14:15 UTC

Created attachment 122314 [details]
Makefile for testcase

Comment 11 John Ellson 2005-12-16 02:16:56 UTC

The SIGFPE occurs on a divide by zero.  The problem is that the code should
never have reached that divide because it is protected by a test for small
denominators.  Seems to be some kind of optimistic execution problem.

Comment 12 Jakub Jelinek 2005-12-16 12:51:49 UTC

That's just a big misunderstanding of what -ffast-math does.
From info gcc:
`-ffast-math'
     Sets `-fno-math-errno', `-funsafe-math-optimizations',
     `-fno-trapping-math', `-ffinite-math-only', `-fno-rounding-math',
     `-fno-signaling-nans' and `-fcx-limited-range'.
...
`-fno-trapping-math'
     Compile code assuming that floating-point operations cannot
     generate user-visible traps.  These traps include division by
     zero, overflow, underflow, inexact result and invalid operation.
     This option implies `-fno-signaling-nans'.  Setting this option
     may allow faster code if one relies on "non-stop" IEEE arithmetic,
     for example.

Your program violates this, as it can generate user-visible traps
- you enabled that explicitly with the feenableexcept call.
Also, never ever #define __USE_GNU; that's a glibc-internal macro you should
never touch.  If you want the GNU namespace, #define _GNU_SOURCE before
including the first header or compile with -D_GNU_SOURCE.
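
The pattern at issue, a division protected by a test for a small denominator, can be sketched roughly as below. This is an illustration only, not the graphviz source: solve2x2 and EPS are hypothetical names, and the 1e-6 threshold simply mirrors the one used in the workaround from comment 4.

```c
#include <assert.h>

#define EPS 1e-6                        /* assumed threshold for "too small" */
#define ABS(a) ((a) >= 0 ? (a) : -(a))

/* Solve the 2x2 system c * s = x by Cramer's rule.
 * Returns 1 and fills s on success, 0 if the determinant is too small. */
static int solve2x2(double c[2][2], const double x[2], double s[2])
{
    double det01;

    assert(c && x && s);
    det01 = c[0][0] * c[1][1] - c[1][0] * c[0][1];

    if (ABS(det01) < EPS)
        return 0;                       /* guard: the divisions are skipped */

    s[0] = (x[0] * c[1][1] - x[1] * c[0][1]) / det01;
    s[1] = (c[0][0] * x[1] - c[1][0] * x[0]) / det01;
    return 1;
}
```

As written, the source never divides by a tiny det01. With -fno-trapping-math in effect, though, the compiler is allowed to assume the divisions cannot trap, so it may compute them before the test. That is harmless under non-stop IEEE arithmetic but raises SIGFPE once traps have been enabled.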
