Bug 24310
Summary: | floating point problems with gcc 2.96 and gdb
---|---
Product: | [Retired] Red Hat Linux
Component: | gdb
Version: | 7.0
Hardware: | i386
OS: | Linux
Status: | CLOSED RAWHIDE
Severity: | high
Priority: | medium
Reporter: | David Baron <dbaron>
Assignee: | Trond Eivind Glomsrød <teg>
QA Contact: | Aaron Brown <abrown>
CC: | bcrl, blizzard, bryner, dmose, jakub, msw, t8m, teg
Doc Type: | Bug Fix
Last Closed: | 2001-04-09 17:45:55 UTC
Bug Blocks: | 24445
Description
David Baron
2001-01-18 20:34:48 UTC
I've isolated the problem within mozilla to nsDeviceContextGTK::SetDPI, the line:

    mPixelsToTwips = float(NSToIntRound(float(NSIntPointsToTwips(pt2t)) / float(aDpi)));

The calculation of float(NSIntPointsToTwips(pt2t)) / float(aDpi) is producing the wrong result (nan instead of 11.52) the second and fourth times the code runs, when run under gdb. I have a similar, much simpler, testcase that shows a similar problem the first time it is run. I'm going to try to figure out which compiler command line options are required to trigger the bug, and then I'll attach the testcase.

The key option is '-pthread'. I'll attach a simple C++ testcase that demonstrates the problem in gdb (it runs correctly on its own, but produces incorrect results under gdb) when it is compiled with:

    g++ -pthread -o gdbbug gdbbug.cpp

or

    g++ -pthread -g -o gdbbug gdbbug.cpp

That is, it shows these problems under the gdb-5.0-11 package, but not the trunk gdb compiled with gcc 2.91 (although I suspect it would show the problems when run in the trunk gdb compiled with gcc 2.96, since I saw the mozilla problems then).

Created attachment 7868
C++ testcase that behaves incorrectly when run under gdb
The output I see is:

    > ./gdbbug
    12.000000
    12.000000
    > /usr/bin/gdb ./gdbbug
    GNU gdb 5.0
    Copyright 2000 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "i386-redhat-linux"...
    (gdb) run
    Starting program: /home/dbaron/gccbug/gdbbug/./gdbbug
    [New Thread 1024 (LWP 2470)]
    -2147483648.000000
    12.000000

    Program exited normally.
    Current language: auto; currently c
    (gdb) q

Created attachment 7869
even simpler C++ testcase that shows incorrect behavior
It is the exact same incorrect behavior that I was seeing in mozilla. I thought it was different because I was doing the printf at different times in mozilla and the simple testcase. But anyway, all that's needed to show the bug is a floating point division. The output with the new testcase is:

    11.520000
    11.520000

when run on its own and

    nan
    11.520000

when run within gdb. I'll leave it to the gdb wizards to simplify gdb to demonstrate the bug in gcc. (Or maybe it's a feature, and the bug is in gdb...?)

Any chance of trying the cvs gdb with the test case?

Changing summary and adding people that might care.

Oh, another note from dbaron:

    [11:53:36] <dbaron> blizzard: Well, I think it's a gdb bug caused by a compiler problem, since I see it in a gdb compiled with gcc 2.96 but not when the gdb is compiled with gcc 2.91
    [11:57:38] <blizzard> dbaron: what is gcc 2.91?
    [11:57:45] <dbaron> kgcc
    [11:57:54] <dbaron> egcs-2.91.66
    [11:58:30] <blizzard> dbaron: oh, ok

So, using the combination gdb-5.0-10, glibc-2.2-8, gcc-2.96-68 and kernel-2.4.0-0.43.12 I don't have this problem. I'm going to upgrade some stuff and see if it explodes then.

When I compiled the cvs gdb snapshot from 2001-01-15 with egcs 2.91.66 (the kgcc-1.1.2-40 RPM) and run under that gdb, I get the correct output. However, when I compile that same gdb snapshot with gcc-2.96-70, I get the same problem that I see in the gdb-5.0-11 package.

FWIW:

    > rpm -q gcc kgcc gcc-c++ gdb glibc glibc-devel libstdc++
    gcc-2.96-70
    kgcc-1.1.2-40
    gcc-c++-2.96-70
    gdb-5.0-11
    glibc-2.2-12
    glibc-devel-2.2-12
    libstdc++-2.96-70
    > uname -a
    Linux roam171-98.student.harvard.edu 2.4.0-test10 #2 SMP Fri Nov 3 21:31:39 EST 2000 i686 unknown

Machine is dual-CPU.

This smells like there could be an aliasing problem in gdb; could you please try building gdb with -fno-strict-aliasing with gcc-2.96-7[01]? If that does not help, I'll start looking into this after the weekend, otherwise we'd need to find out where the bug in gdb is.
Compiling gdb with -fno-strict-aliasing does not help.

I upgraded to:

    gdb-5.0-11
    gcc-2.96-70
    glibc-2.2-9
    kernel-2.4.0-0.43.12

And I still don't see the problem.

Well, I'm not the only one who sees this:

    <imoT> dbaron: i updated gdb-5.0-7 -> gdb-5.0-11 and your testcase stop working
    <dbaron> imoT: what kernel do you have?
    <imoT> dbaron: 2.4.0
    <dbaron> Do you have RedHat 7?
    <imoT> yes
    <imoT> gcc-2.96-69 kgcc-1.1.2-40 gcc-c++-2.96-69 gdb-5.0-7 glibc-2.2-9 glibc-devel-2.2-9 libstdc++-2.96-69
    <dbaron> uname -a
    <imoT> Linux rak046 2.4.0 #1 Fri Jan 5 17:58:51 EET 2001 i686 unknown
    <dbaron> a kernel you compiled yourself?
    <imoT> yes
    <dbaron> oh, dual cpu?
    <imoT> just single
    <dbaron> just to confirm: you saw the same problem i did when you upgraded to gdb-5.0-11, but not in gdb-5.0-7 ?
    <imoT> yes
    <imoT> with gdb-5.0-7 it worked ok, when i updated it say "nan,11.520000"

dbaron, did you compile your kernel yourself?

It happens for me on 2.2.17-11 and 2.4 kernels, with old and new gdb snapshots compiled with different options (from standard to "-O0"). Argh. Changing the compiler helps, though... reassigning

Doing a binary search of which file made the difference is hard... and you get errors.

Still a problem, though. Jakub, have you had any chance to look at it? I don't think it's any of the files in the gdb directory (I ran standard gcc on all the files in that directory, and the problem still didn't appear) - bfd, perhaps.

I don't think this is a gcc issue. The difference is that if you run kgcc -L/usr/lib (that's playing with fire btw, because you use glibc 2.1.3 includes and glibc 2.2 library), it uses the glibc 2.1.3 sys/ptrace.h, which does not define PTRACE_GETFPXREGS and stuff like that, so SSE support does not get compiled in. I've just tried to undef HAVE_PTRACE_GETFPXREGS by hand in config.h and rebuilt the whole of the gdb subdirectory (in a gdb built with your export CC="kgcc -L/usr/lib" hack commented out in the spec file) and suddenly it works well.
I suspect either gdb has issues in fpxregs support, or the kernel does, or there are some inconsistencies between what gdb expects, glibc declares, kernel expects, whatever.

Just another data point: The latest gdb rpm (gdb-5.0rh-3) works fine for me on kernel 2.2.16-22, but shows this problem with 2.2.17-14.

A gdb-5.0rh-4 is at http://people.redhat.com/teg/ (will show up in Rawhide later), which tries to work around the problem a little bit.

Hey, Ben. Doesn't this bug look familiar (bug 31916)?

Created attachment 15004
Fixes gdb's handling of the fpu tag word
I've attached a patch from Kevin Buettner <kevinb> that fixes gdb's handling of the fpu tag word.

Verified that it fixes the problem - it'll be in gdb-5.0rh-7, coming to Rawhide and in the meantime available from http://people.redhat.com/teg/gdb/

*** Bug 31916 has been marked as a duplicate of this bug. ***