Bug 175793 - problem with -O2, but not with -O1
Status: CLOSED NOTABUG
Product: Fedora
Classification: Fedora
Component: gcc
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Assigned To: Jakub Jelinek
Reported: 2005-12-14 19:55 EST by John Ellson
Modified: 2007-11-30 17:11 EST

Doc Type: Bug Fix
Last Closed: 2005-12-16 07:51:49 EST


Attachments
C source of the problem function (not complete enough to be compiled alone) (1.33 KB, text/plain)
2005-12-14 19:55 EST, John Ellson
gdb disassembly of mkspline() when compiled with -O2 (14.98 KB, text/plain)
2005-12-14 19:56 EST, John Ellson
gdb disassembly of mkspline() when compiled with -O1 (14.62 KB, text/plain)
2005-12-14 19:58 EST, John Ellson
testcase (3.57 KB, text/plain)
2005-12-15 21:13 EST, John Ellson
Makefile for testcase (97 bytes, text/plain)
2005-12-15 21:14 EST, John Ellson

Description John Ellson 2005-12-14 19:55:34 EST
Description of problem:
I have a piece of code that has worked for perhaps five years with previous
versions of gcc and with other C compilers, and that now fails with gcc -O2,
but not with -O1.

The program dies with: SIGFPE, Arithmetic exception.

The cause of the SIGFPE is an attempt to divide by a NaN.  The question is why
the NaN occurred.

The NaN arises, only with -O2, from the subtraction in the statement:
    det01 = c[0][0] * c[1][1] - c[1][0] * c[0][1];
with the four c[][] values close to, but apparently not exactly, zero.

Version-Release number of selected component (if applicable):
gcc-4.1.0-0.7

How reproducible:
100% (The code is in lib/pathplan/route.c in graphviz, which is in Fedora Extras)

Steps to Reproduce:
1. Build with gcc -O2.
2. Run the resulting dot_static on a graph (the session below uses hello.dot).

Actual results:
ellson@ontap:dot> gdb ./dot_static
GNU gdb Red Hat Linux (6.3.0.0-1.81rh)
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...Using host libthread_db library "/lib64/libthread_db.so.1".

(gdb) run hello.dot
Starting program: /home/ellson/FIX/Linux.x86_64/build/graphviz2/cmd/dot/dot_static hello.dot

Program received signal SIGFPE, Arithmetic exception.
0x000000000047b897 in mkspline (inps=0x6361f0, inpn=2, tnas=Variable "tnas" is not available.
) at route.c:246
246     {
(gdb) where
#0  0x000000000047b897 in mkspline (inps=0x6361f0, inpn=2, tnas=Variable "tnas" is not available.
) at route.c:246
#1  0x000000000047bd83 in reallyroutespline (edges=0x63cbf0, edgen=12, inps=0x6361f0, inpn=2,
    ev0={x = 0, y = 0}, ev1={x = 0, y = 0}) at route.c:220
#2  0x000000000047d07c in Proutespline (edges=0x63cbf0, edgen=12, input=
      {ps = 0x6361f0, pn = 2}, evs=0x7ffffff374a0, output=0x7ffffff374e0) at route.c:171
#3  0x000000000044b079 in routesplines (pp=0x635ab0, npoints=0x7ffffff3bc7c) at routespl.c:487
#4  0x000000000041bd12 in dot_splines (g=0x6189e0) at dotsplines.c:1263
#5  0x0000000000412161 in dot_layout (g=0x6189e0) at dotinit.c:230
#6  0x00000000004664d6 in gvLayoutJobs (gvc=0x60f260, g=0x6189e0) at gvlayout.c:62
#7  0x0000000000411b89 in main (argc=Variable "argc" is not available.
) at dot.c:170
(gdb)


Expected results:
The program runs without a SIGFPE, as it does when built with -O1.

Comment 1 John Ellson 2005-12-14 19:55:34 EST
Created attachment 122254
C source of the problem function (not complete enough to be compiled alone)
Comment 2 John Ellson 2005-12-14 19:56:55 EST
Created attachment 122256
gdb disassembly of mkspline() when compiled with -O2
Comment 3 John Ellson 2005-12-14 19:58:01 EST
Created attachment 122257
gdb disassembly of mkspline() when compiled with -O1
Comment 4 John Ellson 2005-12-14 21:32:09 EST
This change to mkspline() successfully suppresses the problem:

+#if 0
     det01 = c[0][0] * c[1][1] - c[1][0] * c[0][1];
+#else
+    /* workaround for problem with:
+     *         gcc (GCC) 4.1.0 20051212 (Red Hat 4.1.0-0.7)
+     *
+     * the problem is (I think) a NAN from the subtraction which
+     * shows up as a SIGFPE a few lines down when
+     * det01 is used as the denominator in a division
+     */
+    det01 = c[0][0] * c[1][1];
+    d01 = c[1][0] * c[0][1];
+    if (ABS(d01) > 1e-6)
+       det01 -= d01;
+#endif
Comment 5 Jakub Jelinek 2005-12-15 05:06:57 EST
I really need a self-contained testcase to look into this.  Please gather
the arguments the routine is called with and supply a main that sets up those
arguments.  Also record what the functions it calls return (and their
other side effects visible in this routine), and write stub functions that
do the same.
Comment 6 John Ellson 2005-12-15 05:14:41 EST
gdb and fprintf(... %g ...) display the values as "0".  Can you recommend a way
to extract all the bits of the problem doubles?
Comment 7 Jakub Jelinek 2005-12-15 05:23:07 EST
union { double d; unsigned long long l; } u;
u.d = the_double_you_want;
printf ("%016llx\n", u.l);
In gdb you can just:
p/x *(long long *)&the_double_you_want
(but not in C, because that's a strict aliasing violation).
Comment 8 John Ellson 2005-12-15 21:12:53 EST
OK, here is a single-file testcase, plus a Makefile.
The problem seems to depend on -O2, -ffast-math, and having FPU error
reporting enabled.
The testcase exhibits the problem on both i386 and x86_64.
Comment 9 John Ellson 2005-12-15 21:13:40 EST
Created attachment 122313
testcase
Comment 10 John Ellson 2005-12-15 21:14:15 EST
Created attachment 122314
Makefile for testcase
Comment 11 John Ellson 2005-12-15 21:16:56 EST
The SIGFPE occurs on a divide by zero.  The problem is that the code should
never have reached that divide, because it is protected by a test for small
denominators.  It seems to be some kind of speculative-execution problem.
Comment 12 Jakub Jelinek 2005-12-16 07:51:49 EST
That's just a big misunderstanding of what -ffast-math does.
From info gcc:
`-ffast-math'
     Sets `-fno-math-errno', `-funsafe-math-optimizations',
     `-fno-trapping-math', `-ffinite-math-only', `-fno-rounding-math',
     `-fno-signaling-nans' and `-fcx-limited-range'.
...
`-fno-trapping-math'
     Compile code assuming that floating-point operations cannot
     generate user-visible traps.  These traps include division by
     zero, overflow, underflow, inexact result and invalid operation.
     This option implies `-fno-signaling-nans'.  Setting this option
     may allow faster code if one relies on "non-stop" IEEE arithmetic,
     for example.

Your program violates this: it can generate user-visible traps, because
you enabled them explicitly with a feenableexcept call.
Also, never ever #define __USE_GNU; that's a glibc-internal macro you should
never touch.  If you want the GNU namespace, #define _GNU_SOURCE before the
first #include or compile with -D_GNU_SOURCE.
