104763 – gdb can't get a stack

Summary: gdb can't get a stack

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Raw Hide
Classification:	Retired
Component:	gdb
Sub Component:
Version:	1.0
Hardware:	i686
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Assignee:	Elena Zannoni
QA Contact:	Jay Turner
Docs Contact:
URL:	http://bugzilla.mozilla.org/show_bug....
Whiteboard:
Depends On:	104781 108892
Blocks:

Reported:	2003-09-20 17:54 UTC by Andrew Schultz
Modified:	2015-01-08 00:06 UTC
CC List:	4 users
Fixed In Version:
Clone Of:
Environment:
Last Closed:	2004-12-21 19:36:55 UTC
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2004:561	0	normal	SHIPPED_LIVE	Updated gdb and libunwind packages	2004-12-21 05:00:00 UTC

Description Andrew Schultz 2003-09-20 17:54:08 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030918

Description of problem:
When attempting to grab a stack from a hung Mozilla process (apparently thread
deadlock), gdb from rawhide was only able to pull the following stack:

#0  0xffffe002 in ?? ()
#1  0x400f341a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2  0x087abcdc in ?? ()
#3  0x0876419c in ?? ()
#4  0x00000002 in ?? ()

The actual stack from RH9's gdb (5.3post-0.20021129.18) for the same build of
Mozilla is here:
http://bugzilla.mozilla.org/attachment.cgi?id=131807&action=view

Version-Release number of selected component (if applicable):
gdb-5.3.90-0.20030710.21

How reproducible:
Always

Steps to Reproduce:
1. fire up Mozilla (1.4 or later), go to
http://openmathtag.sourceforge.net/OMT_demo_interactive.php
2. attach gdb to Mozilla
3. grab the stack
    
Actual Results:  incorrect stack

Expected Results:  stack as obtained from earlier version of gdb

Additional info:
I poked through all the threads to make sure I was on the main thread.  None of
them had the real stack for the main thread.

Comment 1 Andrew Cagney 2003-09-20 18:18:32 UTC

What happens if you try:

(gdb) x/i 0xffffe002

Comment 2 Andrew Schultz 2003-09-21 03:08:29 UTC

both gdb (RH9 and rawhide) say:
(gdb) x/i 0xffffe002
0xffffe002:     Cannot access memory at address 0xffffe002
(gdb) x/i 0x400c7a30
0x400c7a30 <PR_Wait+160>:       mov    0x8(%ebp),%edx

Comment 3 Andrew Schultz 2003-09-21 03:37:02 UTC

oops

Comment 4 Andrew Cagney 2003-09-21 20:14:57 UTC

> both gdb (RH9 and rawhide) say:
> (gdb) x/i 0xffffe002
> 0xffffe002:     Cannot access memory at address 0xffffe002

GDB isn't able to read the "vsyscall" page at/around 0xffffe002, so it isn't able
to figure out how to backtrace out of that code :-(  There isn't much GDB can do
here; think of trying to fly blind.

Recommend downgrading to a kernel that doesn't include the "vsyscall" page, or
disabling the vsyscall page (not sure how this is done).
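One way to check whether an address such as 0xffffe002 falls inside a kernel-provided page rather than a shared library is to look it up in the process's /proc/<pid>/maps. The following is a minimal Python sketch, assuming the standard Linux maps line format (`start-end perms offset dev inode [path]`) and assuming the running kernel lists the page there at all. The helper name `find_mapping` is hypothetical, and the sample lines are invented for illustration; they are not taken from the reporter's system.

```python
def find_mapping(maps_text: str, addr: int):
    """Return the /proc/<pid>/maps line whose range contains addr, or None."""
    for line in maps_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        start_s, end_s = fields[0].split("-")
        start, end = int(start_s, 16), int(end_s, 16)
        if start <= addr < end:  # the end address is exclusive
            return line
    return None


# Invented sample in maps format; on a live system you would instead read
# open(f"/proc/{pid}/maps").read() for the hung process.
SAMPLE = """\
08048000-0804c000 r-xp 00000000 03:01 12345      /usr/bin/example
400e9000-400f6000 r-xp 00000000 03:01 67890      /lib/tls/libpthread.so.0
ffffe000-fffff000 ---p 00000000 00:00 0
"""

print(find_mapping(SAMPLE, 0xFFFFE002))  # the anonymous page at ffffe000
print(find_mapping(SAMPLE, 0x00000002))  # None: not a mapped address
```

An address that resolves to no mapping at all (like frame #4 in the trace above) is a strong hint that the unwinder has already gone off the rails by that point.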

Comment 6 Joel Brobecker 2003-10-23 19:09:14 UTC

Actually, I think that 0xffffe002 was a red herring. See:
http://sources.redhat.com/ml/gdb-patches/2003-10/msg00440.html
(note that the message above also provides a simpler reproducer)

While it is true that the debugger cannot yet read the memory at that address,
and therefore can only guess how to unwind from it, it seems to me that it
is doing a better job now (6.0 and head) than 5.3 did.

The real problem, I think, is that GDB is unable to unwind correctly past
the pthread_cond_wait() function: The function is frameless, and to make
things more difficult, the stack space allocated during the lifetime of
that function is adjusted twice. The unwinder only sees the first allocation,
and therefore miscalculates the "virtual frame" size, and hence fetches the
value of the saved registers from the wrong location.
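The failure described above can be illustrated with a toy Python model: a frameless function lowers %esp twice, and an unwinder that only accounts for the first adjustment reads the "return address" from a slot inside the function's local area. The helper names, addresses, allocation sizes, and filler value are all invented for this sketch; it is not GDB's actual i386 prologue analyzer.

```python
WORD = 4  # i386 word size in bytes; the stack grows toward lower addresses


def build_frameless_frame(entry_sp, first_alloc, second_alloc, return_addr, filler):
    """Model a call into a frameless function that lowers %esp twice.

    At entry, %esp points at the return address pushed by `call`. The
    function then does the equivalent of `sub $first_alloc, %esp` and later
    `sub $second_alloc, %esp`, never setting up %ebp.
    """
    memory = {entry_sp: return_addr}
    sp = entry_sp - first_alloc - second_alloc
    for addr in range(sp, entry_sp, WORD):  # locals / spilled registers
        memory[addr] = filler
    return memory, sp


def fetch_return_address(memory, sp, frame_size):
    """Read the word an unwinder would treat as the return address,
    given the frame size it believes the function allocated."""
    return memory.get(sp + frame_size)


ENTRY_SP = 0xBFFF0000
FIRST, SECOND = 0x20, 0x10
RET = 0x08048ABC
mem, sp = build_frameless_frame(ENTRY_SP, FIRST, SECOND, RET, filler=0x2)

correct = fetch_return_address(mem, sp, FIRST + SECOND)  # both adjustments seen
wrong = fetch_return_address(mem, sp, FIRST)             # only the first one seen
print(hex(correct), hex(wrong))  # 0x8048abc 0x2
```

Because the assumed frame size is short by the second allocation, any caller stack pointer derived from it is off by the same amount, which is one plausible way every frame after the first bad one also comes out as garbage.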

The only remedy proposed in the discussion at the URL above is to add
DWARF2 CFI info to NPTL.
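CFI helps because it records, for each range of PCs, how to compute the Canonical Frame Address (CFA) from %esp, so every stack adjustment gets its own row and the unwinder no longer has to infer the frame size from the prologue. Below is a simplified Python sketch of that lookup for i386, where the CFA is %esp + 4 at function entry and the return address sits at CFA - 4. The table rows, offsets, and helper names are hypothetical; real CFI is encoded as DW_CFA_* instructions in .eh_frame/.debug_frame, not as a Python list.

```python
from bisect import bisect_right

# Hypothetical CFI rows for a frameless function:
# (pc offset from function start, CFA offset from %esp valid from that pc on).
CFI_ROWS = [
    (0x00, 0x04),  # entry: CFA = esp + 4, return address at esp
    (0x03, 0x24),  # after a first `sub $0x20, %esp`
    (0x1A, 0x34),  # after a second `sub $0x10, %esp`
]


def cfa_offset_at(rows, pc_offset):
    """Return the CFA offset from the last row starting at or before pc_offset."""
    starts = [start for start, _ in rows]
    i = bisect_right(starts, pc_offset) - 1
    if i < 0:
        raise ValueError("pc is before the function start")
    return rows[i][1]


def return_address_slot(esp, rows, pc_offset):
    """On i386 the return address is stored one word below the CFA."""
    cfa = esp + cfa_offset_at(rows, pc_offset)
    return cfa - 4


# After both adjustments %esp is 0x30 below the return-address slot, and the
# table still points the unwinder at the right slot.
print(hex(return_address_slot(0xBFFEFFD0, CFI_ROWS, 0x20)))  # 0xbfff0000
```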

Comment 7 Elena Zannoni 2004-08-27 19:54:19 UTC

I think this is now fixed. I'll put it in MODIFIED state. I'll close this
in a week if I don't hear back.

Comment 8 John Flanagan 2004-12-21 19:36:55 UTC

An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2004-561.html
