Bug 24310

Summary:

floating point problems with gcc 2.96 and gdb

Product:

[Retired] Red Hat Linux

Reporter:

David Baron <dbaron>

Component:

gdb

Assignee:

Trond Eivind Glomsrød <teg>

Status:

CLOSED RAWHIDE

QA Contact:

Aaron Brown <abrown>

Severity:

high

Priority:

medium

Version:

7.0

CC:

bcrl, blizzard, bryner, dmose, jakub, msw, t8m, teg

Hardware:

i386

OS:

Linux

Doc Type:

Bug Fix

Last Closed:

2001-04-09 17:45:55 UTC

Bug Blocks:

24445

Attachments:

Description	Flags
C++ testcase that behaves incorrectly when run under gdb	none
even simpler C++ testcase that shows incorrect behavior	none
Fixes gdb's handling of the fpu tag word	none

Description David Baron 2001-01-18 20:34:48 UTC

When I attempt to run mozilla under the gdb-5.0-11 from rawhide (the version
distributed with RH 7.0 had some other problems), Mozilla gives a lot of
assertions in layout code and then fails to paint a window.  (Most of the
assertions come from nsBoxFrame.cpp and nsSprocketLayout.cpp in
layout/xul/base/src/.)  These assertions don't happen when running without
the debugger or when running in older versions of gdb (or the current
version of gdb compiled with different compilers, see next paragraph).

I tried to use the gdb trunk snapshot from 2001-01-15 to see if it had
these problems.  When I compiled that snapshot using gcc 2.96
(gcc-2.96-69), I saw the same problems.  When I compiled it using gcc
2.91 (kgcc-1.1.2-40), these problems went away (but there are still some
other problems, so I still don't have a working debugger).  So there
could be a gcc bug involved here, perhaps...

How to reproduce:
 * Build mozilla following http://mozilla.org/build/unix.html .  FWIW, I use
the following .mozconfig file:
----
mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../obj-debug/
mk_add_options MOZ_MAKE_FLAGS=-j3
mk_add_options RUN_AUTOCONF_LOCALLY=1
ac_add_options --enable-mathml
ac_add_options --enable-svg
ac_add_options --disable-double-buffer
ac_add_options --enable-nspr-autoconf
ac_add_options --with-extensions=default,irc
ac_add_options --enable-jprof
----
 * run "./mozilla -g -d <path to debugger>" in the dist/bin/ directory
(within the object directory)
 * "run", or, if you have under 256MB RAM, "tbreak main", "run", "set
auto-solib-add 0", "c".

Comment 1 David Baron 2001-01-19 17:16:39 UTC

I've isolated the problem within mozilla to nsDeviceContextGTK::SetDPI, the line:

mPixelsToTwips = float(NSToIntRound(float(NSIntPointsToTwips(pt2t)) / float(aDpi)));

The calculation of float(NSIntPointsToTwips(pt2t)) / float(aDpi) is producing
the wrong result (nan instead of 11.52) the second and fourth times the code
runs, when run under gdb.  I have a similar, much simpler, testcase that shows
a similar problem the first time it is run.  I'm going to try to figure out
which compiler command line options are required to trigger the bug, and then
I'll attach the testcase.

Comment 2 David Baron 2001-01-19 17:23:38 UTC

The key option is '-pthread'.  I'll attach a simple C++ testcase that
demonstrates the problem in gdb (it runs correctly on its own, but produces
incorrect results under gdb) when it is compiled with:

g++ -pthread -o gdbbug gdbbug.cpp
or
g++ -pthread -g -o gdbbug gdbbug.cpp

That is, it shows these problems under the gdb-5.0-11 package, but not the
trunk gdb compiled with gcc 2.91 (although I suspect it would show the
problems when run in the trunk gdb compiled with gcc 2.96, since I saw
the mozilla problems then).

Comment 3 David Baron 2001-01-19 17:24:52 UTC

Created attachment 7868
C++ testcase that behaves incorrectly when run under gdb

Comment 4 David Baron 2001-01-19 17:26:50 UTC

The output I see is:

> ./gdbbug 
12.000000
12.000000

> /usr/bin/gdb ./gdbbug
GNU gdb 5.0
Copyright 2000 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(gdb) run
Starting program: /home/dbaron/gccbug/gdbbug/./gdbbug 
[New Thread 1024 (LWP 2470)]
-2147483648.000000
12.000000

Program exited normally.
Current language:  auto; currently c
(gdb) q

Comment 5 David Baron 2001-01-19 17:36:49 UTC

Created attachment 7869
even simpler C++ testcase that shows incorrect behavior

Comment 6 David Baron 2001-01-19 17:40:22 UTC

It is the exact same incorrect behavior that I was seeing in mozilla.  I thought
it was different because I was doing the printf at different times in mozilla
and the simple testcase.  But anyway, all that's needed to show the bug is a
floating point division.

The output with the new testcase is:

11.520000
11.520000

when run on its own and

nan
11.520000

when run within gdb.

I'll leave it to the gdb wizards to simplify gdb to demonstrate the bug in gcc.
 (Or maybe it's a feature, and the bug is in gdb...?)

Comment 7 Trond Eivind Glomsrød 2001-01-19 17:42:56 UTC

Any chance of trying the cvs gdb with the test case?

Comment 8 Christopher Blizzard 2001-01-19 17:51:22 UTC

Changing summary and adding people that might care.  Oh, another note from dbaron:

[11:53:36] <dbaron> blizzard: Well, I think it's a gdb bug caused by a compiler
problem, since I see it in a gdb compiled with gcc 2.96 but not when the gdb is
compiled with gcc 2.91
[11:57:38] <blizzard> dbaron: what is gcc 2.91?
[11:57:45] <dbaron> kgcc
[11:57:54] <dbaron> egcs-2.91.66
[11:58:30] <blizzard> dbaron: oh, ok

Comment 9 Christopher Blizzard 2001-01-19 17:55:47 UTC

So, using the combination gdb-5.0-10, glibc-2.2-8, gcc-2.96-68 and
kernel-2.4.0-0.43.12 I don't have this problem.  I'm going to upgrade some stuff
and see if it explodes then.

Comment 10 David Baron 2001-01-19 18:00:27 UTC

When I compile the cvs gdb snapshot from 2001-01-15 with egcs 2.91.66 (the
kgcc-1.1.2-40 RPM) and run the testcase under that gdb, I get the correct output.

However, when I compile that same gdb snapshot with gcc-2.96-70, I get the same
problem that I see in the gdb-5.0-11 package.

FWIW:

> rpm -q gcc kgcc gcc-c++ gdb glibc glibc-devel libstdc++
gcc-2.96-70
kgcc-1.1.2-40
gcc-c++-2.96-70
gdb-5.0-11
glibc-2.2-12
glibc-devel-2.2-12
libstdc++-2.96-70

> uname -a
Linux roam171-98.student.harvard.edu 2.4.0-test10 #2 SMP Fri Nov 3 21:31:39 EST
2000 i686 unknown

Machine is dual-CPU.

Comment 11 Jakub Jelinek 2001-01-19 18:09:25 UTC

This smells like there could be an aliasing problem in gdb, could you please
try building gdb with -fno-strict-aliasing with gcc-2.96-7[01]?
If that does not help, I'll start looking into this after the weekend,
otherwise we'd need to find out where the bug in gdb is.

Comment 12 David Baron 2001-01-19 18:45:38 UTC

Compiling gdb with -fno-strict-aliasing does not help.

Comment 13 Christopher Blizzard 2001-01-19 18:46:54 UTC

I upgraded to:

gdb-5.0-11
gcc-2.96-70
glibc-2.2-9
kernel-2.4.0-0.43.12

And I still don't see the problem.

Comment 14 David Baron 2001-01-19 18:54:36 UTC

Well, I'm not the only one who sees this:

<imoT> dbaron: i updated gdb-5.0-7 -> gdb-5.0-11 and your testcase stop working
<dbaron> imoT: what kernel do you have?
<imoT> dbaron: 2.4.0
<dbaron> Do you have RedHat 7?
<imoT> yes
<imoT> gcc-2.96-69 kgcc-1.1.2-40 gcc-c++-2.96-69 gdb-5.0-7 glibc-2.2-9
glibc-devel-2.2-9 libstdc++-2.96-69
<dbaron> uname -a
<imoT> Linux rak046 2.4.0 #1 Fri Jan 5 17:58:51 EET 2001 i686 unknown
<dbaron> a kernel you compiled yourself?
<imoT> yes
<dbaron> oh, dual cpu?
<imoT> just single
<dbaron> just to confirm:  you saw the same problem i did when you upgraded to
gdb-5.0-11, but not in gdb-5.0-7 ?
<imoT> yes
<imoT> with gdb-5.0-7 it worked ok, when i updated it say "nan,11.520000"

Comment 15 Christopher Blizzard 2001-01-19 19:00:43 UTC

dbaron, did you compile your kernel yourself?

Comment 16 Trond Eivind Glomsrød 2001-01-19 19:04:27 UTC

It happens for me on 2.2.17-11 and 2.4 kernels, with old and new gdb snapshots
compiled with different options (from standard to "-O0"). Argh.

Comment 17 Trond Eivind Glomsrød 2001-01-19 19:12:01 UTC

Changing the compiler helps, though... reassigning

Comment 18 Trond Eivind Glomsrød 2001-02-05 21:33:30 UTC

Doing a binary search of which file made the difference is hard... and you get
errors. Still a problem, though. Jakub, have you had any chance to look at it?

Comment 19 Trond Eivind Glomsrød 2001-02-05 22:15:01 UTC

I don't think it's any of the files in the gdb directory (I ran standard gcc on
all the files in that directory, and the problem still didn't appear) - bfd,
perhaps.

Comment 20 Jakub Jelinek 2001-02-23 12:20:11 UTC

I don't think this is a gcc issue, the difference is that if you run
kgcc -L/usr/lib (that's playing with fire btw, because you use glibc 2.1.3
includes and glibc 2.2 library), it uses glibc 2.1.3 sys/ptrace.h which does
not define PTRACE_GETFPXREGS and stuff like that, so SSE support does not
get compiled in.
I've just tried to undef by hand HAVE_PTRACE_GETFPXREGS in config.h and rebuilt
the whole of gdb subdirectory (in gdb built with your export CC="kgcc -L/usr/lib"
hack commented out in the spec file) and suddenly it works well.
I suspect either gdb has issues in fpxregs support, or kernel, or there are
some inconsistencies between what gdb expects, glibc declares, kernel expects,
whatever.

Comment 21 Brian Ryner 2001-02-28 05:58:55 UTC

Just another data point:  The latest gdb rpm (gdb-5.0rh-3) works fine for me on
kernel 2.2.16-22, but shows this problem with 2.2.17-14.

Comment 22 Trond Eivind Glomsrød 2001-03-16 21:26:27 UTC

A gdb-5.0rh-4 is at http://people.redhat.com/teg/ (will show up in Rawhide
later), which tries to work around the problem a little bit.

Comment 23 Christopher Blizzard 2001-03-16 22:41:03 UTC

Hey, Ben. Doesn't this bug look familiar (bug 31916)?

Comment 24 Don Howard 2001-04-09 17:40:13 UTC

Created attachment 15004
Fixes gdb's handling of the fpu tag word

Comment 25 Don Howard 2001-04-09 17:45:51 UTC

I've attached a patch from Kevin Buettner <kevinb> that fixes gdb's
handling of the fpu tag word.

Comment 26 Trond Eivind Glomsrød 2001-04-10 01:13:43 UTC

Verified that it fixes the problem - it'll be in gdb-5.0rh-7, coming to Rawhide
and in the meantime available from http://people.redhat.com/teg/gdb/

Comment 27 Trond Eivind Glomsrød 2001-04-10 01:15:42 UTC

*** Bug 31916 has been marked as a duplicate of this bug. ***