Red Hat Bugzilla – Bug 24310
floating point problems with gcc 2.96 and gdb
Last modified: 2007-04-18 12:30:46 EDT
When I attempt to run mozilla under the gdb-5.0-11 from rawhide (the version
distributed with RH 7.0 had some other problems), Mozilla gives a lot of
assertions in layout code and then fails to paint a window. (Most of the
assertions come from nsBoxFrame.cpp and nsSprocketLayout.cpp in
layout/xul/base/src/.) These assertions don't happen when running without
the debugger or when running in older versions of gdb (or the current
version of gdb compiled with different compilers, see next paragraph).
I tried to use the gdb trunk snapshot from 2001-01-15 to see if it had
these problems. When I compiled that snapshot using gcc 2.96
(gcc-2.96-69), I saw the same problems. When I compiled it using gcc
2.91 (kgcc-1.1.2-40), these problems went away (but there are still some
other problems, so I still don't have a working debugger). So there
could be a gcc bug involved here, perhaps...
How to reproduce:
* Build Mozilla as described at http://mozilla.org/build/unix.html . FWIW, I use the following
* run "./mozilla -g -d <path to debugger>" in the dist/bin/ directory
(within the object directory)
* "run", or, if you have under 256MB RAM, "tbreak main", "run", "set
auto-solib-add 0", "c".
I've isolated the problem within mozilla to nsDeviceContextGTK::SetDPI, the line:
mPixelsToTwips = float(NSToIntRound(float(NSIntPointsToTwips(pt2t)) / float(aDpi)));
The calculation of float(NSIntPointsToTwips(pt2t)) / float(aDpi) is producing
the wrong result (nan instead of 11.52) the second and fourth times the code
runs, when run under gdb. I have a similar, much simpler, testcase that shows
a similar problem the first time it is run. I'm going to try to figure out
which compiler command line options are required to trigger the bug, and then
I'll attach the testcase.
The key option is '-pthread'. I'll attach a simple C++ testcase that
demonstrates the problem in gdb (it runs correctly on its own, but produces
incorrect results under gdb) when it is compiled with either of:
g++ -pthread -o gdbbug gdbbug.cpp
g++ -pthread -g -o gdbbug gdbbug.cpp
That is, it shows these problems under the gdb-5.0-11 package, but not the
trunk gdb compiled with gcc 2.91 (although I suspect it would show the
problems when run in the trunk gdb compiled with gcc 2.96, since I saw
the mozilla problems then).
Created attachment 7868
C++ testcase that behaves incorrectly when run under gdb
The output I see is:
> /usr/bin/gdb ./gdbbug
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
Starting program: /home/dbaron/gccbug/gdbbug/./gdbbug
[New Thread 1024 (LWP 2470)]
Program exited normally.
Current language: auto; currently c
Created attachment 7869
even simpler C++ testcase that shows incorrect behavior
It is the exact same incorrect behavior that I was seeing in mozilla. I thought
it was different because I was doing the printf at different times in mozilla
and the simple testcase. But anyway, all that's needed to show the bug is a
floating point division.
The output with the new testcase is:
when run on its own and
when run within gdb.
I'll leave it to the gdb wizards to simplify gdb to demonstrate the bug in gcc.
(Or maybe it's a feature, and the bug is in gdb...?)
Any chance of trying the cvs gdb with the test case?
Changing summary and adding people that might care. Oh, another note from dbaron:
[11:53:36] <dbaron> blizzard: Well, I think it's a gdb bug caused by a compiler
problem, since I see it in a gdb compiled with gcc 2.96 but not when the gdb is
compiled with gcc 2.91
[11:57:38] <blizzard> dbaron: what is gcc 2.91?
[11:57:45] <dbaron> kgcc
[11:57:54] <dbaron> egcs-2.91.66
[11:58:30] <blizzard> dbaron: oh, ok
So, using the combination gdb-5.0-10, glibc-2.2-8, gcc-2.96-68 and
kernel-2.4.0-0.43.12 I don't have this problem. I'm going to upgrade some stuff
and see if it explodes then.
When I compile the cvs gdb snapshot from 2001-01-15 with egcs 2.91.66 (the
kgcc-1.1.2-40 RPM) and run under that gdb, I get the correct output.
However, when I compile that same gdb snapshot with gcc-2.96-70, I get the same
problem that I see in the gdb-5.0-11 package.
> rpm -q gcc kgcc gcc-c++ gdb glibc glibc-devel libstdc++
> uname -a
Linux roam171-98.student.harvard.edu 2.4.0-test10 #2 SMP Fri Nov 3 21:31:39 EST 2000 i686 unknown
Machine is dual-CPU.
This smells like there could be an aliasing problem in gdb. Could you please
try building gdb with -fno-strict-aliasing with gcc-2.96-7?
If that does not help, I'll start looking into this after the weekend;
otherwise we'd need to find out where the bug in gdb is.
Compiling gdb with -fno-strict-aliasing does not help.
I upgraded to:
And I still don't see the problem.
Well, I'm not the only one who sees this:
<imoT> dbaron: i updated gdb-5.0-7 -> gdb-5.0-11 and your testcase stop working
<dbaron> imoT: what kernel do you have?
<imoT> dbaron: 2.4.0
<dbaron> Do you have RedHat 7?
<imoT> gcc-2.96-69 kgcc-1.1.2-40 gcc-c++-2.96-69 gdb-5.0-7 glibc-2.2-9
<dbaron> uname -a
<imoT> Linux rak046 2.4.0 #1 Fri Jan 5 17:58:51 EET 2001 i686 unknown
<dbaron> a kernel you compiled yourself?
<dbaron> oh, dual cpu?
<imoT> just single
<dbaron> just to confirm: you saw the same problem i did when you upgraded to
gdb-5.0-11, but not in gdb-5.0-7 ?
<imoT> with gdb-5.0-7 it worked ok, when i updated it say "nan,11.520000"
dbaron, did you compile your kernel yourself?
It happens for me on 2.2.17-11 and 2.4 kernels, with old and new gdb snapshots
compiled with different options (from standard to "-O0"). Argh.
Changing the compiler helps, though... reassigning
Doing a binary search for the file that makes the difference is hard... and you get
errors. Still a problem, though. Jakub, have you had a chance to look at it?
I don't think it's any of the files in the gdb directory (I ran standard gcc on
all the files in that directory, and the problem still didn't appear) - bfd,
I don't think this is a gcc issue. The difference is that if you run
kgcc -L/usr/lib (that's playing with fire btw, because you use glibc 2.1.3
includes and the glibc 2.2 library), it uses the glibc 2.1.3 sys/ptrace.h, which does
not define PTRACE_GETFPXREGS and stuff like that, so SSE support does not
get compiled in.
I've just tried undefining HAVE_PTRACE_GETFPXREGS by hand in config.h and rebuilding
the whole gdb subdirectory (in a gdb built with your export CC="kgcc -L/usr/lib"
hack commented out in the spec file), and suddenly it works well.
I suspect either gdb has issues in its fpxregs support, or the kernel does, or there are
some inconsistencies between what gdb expects, glibc declares, the kernel expects,
Just another data point: The latest gdb rpm (gdb-5.0rh-3) works fine for me on
kernel 2.2.16-22, but shows this problem with 2.2.17-14.
A gdb-5.0rh-4 is at http://people.redhat.com/teg/ (will show up in Rawhide
later), which tries to work around the problem a little bit.
Hey, Ben. Doesn't this bug look familiar (bug 31916)?
Created attachment 15004
Fixes gdb's handling of the fpu tag word
I've attached a patch from Kevin Buettner <email@example.com> that fixes gdb's
handling of the fpu tag word.
Verified that it fixes the problem. It'll be in gdb-5.0rh-7, which is coming to Rawhide;
in the meantime it's available from http://people.redhat.com/teg/gdb/
*** Bug 31916 has been marked as a duplicate of this bug. ***