From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.5b) Gecko/20030918

Description of problem:
When attempting to grab a stack from a hung Mozilla process (apparently thread deadlock), gdb from rawhide was only able to pull the following stack:

#0 0xffffe002 in ?? ()
#1 0x400f341a in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/tls/libpthread.so.0
#2 0x087abcdc in ?? ()
#3 0x0876419c in ?? ()
#4 0x00000002 in ?? ()

The actual stack from RH9's gdb (5.3post-0.20021129.18) from the same build of Mozilla is here:
http://bugzilla.mozilla.org/attachment.cgi?id=131807&action=view

Version-Release number of selected component (if applicable):
gdb-5.3.90-0.20030710.21

How reproducible:
Always

Steps to Reproduce:
1. fire up Mozilla (1.4 or later), go to http://openmathtag.sourceforge.net/OMT_demo_interactive.php
2. attach gdb to Mozilla
3. grab the stack

Actual Results:
incorrect stack

Expected Results:
stack as obtained from earlier version of gdb

Additional info:
I poked through all the threads to make sure I was on the main thread. None of them had the real stack for the main thread.
What happens if you try: (gdb) x/i 0xffffe002
both gdb (RH9 and rawhide) say:

(gdb) x/i 0xffffe002
0xffffe002: Cannot access memory at address 0xffffe002
(gdb) x/i 0x400c7a30
0x400c7a30 <PR_Wait+160>: mov 0x8(%ebp),%edx
oops
> both gdb (RH9 and rawhide) say:
> (gdb) x/i 0xffffe002
> 0xffffe002: Cannot access memory at address 0xffffe002

GDB isn't able to read the "vsyscall" page at/around 0xffffe002, so it isn't able to figure out how to backtrace out of that code :-(

There isn't much GDB can do here; think of trying to fly blind.

Recommend downgrading to a kernel that doesn't include the "vsyscall" page, or disabling the vsyscall page (not sure how this is done).
Actually, I think that 0xffffe002 was a red herring. See:
http://sources.redhat.com/ml/gdb-patches/2003-10/msg00440.html
(note that the message above also provides a simpler reproducer)

While it is true that the debugger cannot yet read the memory at that address, and therefore can only guess how to unwind from it, it seems to me that it is doing a better job now (6.0 and head) than 5.3 did.

The real problem, I think, is that GDB is unable to unwind correctly past the pthread_cond_wait() function. The function is frameless and, to make things more difficult, the stack space allocated during the lifetime of that function is adjusted twice. The unwinder only sees the first allocation, so it miscalculates the "virtual frame" size and hence fetches the saved registers from the wrong location.

The only hope expressed in the discussion linked above is to add DWARF2 CFI info to NPTL.
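To make the miscalculation concrete, here is a toy model in Python. It is only a sketch of the idea, not GDB's unwinder: the function names, the allocation sizes (16 and 8 bytes), the stack addresses and the filler value are all made up for illustration, and 0x400c7a30 is borrowed from the earlier comment merely as a plausible return address.

```python
# Toy model of unwinding out of a frameless i386 function.
# Everything here is hypothetical illustration; it is not GDB's code.
WORD = 4            # bytes per stack slot on i386
JUNK = 0xDEADBEEF   # made-up filler standing in for the function's locals


def build_stack(return_address, first_alloc, second_alloc, top=0x1000):
    """Return (memory, esp) after the function has lowered esp twice."""
    memory = {}
    esp = top - WORD
    memory[esp] = return_address   # pushed by the caller's `call`
    esp -= first_alloc             # first stack adjustment
    esp -= second_alloc            # second adjustment, later in the body
    # Fill the allocated area so a wrong read still returns *something*.
    for addr in range(esp, top - WORD, WORD):
        memory[addr] = JUNK
    return memory, esp


def unwind_return_address(memory, esp, known_allocs):
    """Read the return address, assuming the frame is sum(known_allocs) bytes."""
    return memory[esp + sum(known_allocs)]


memory, esp = build_stack(0x400C7A30, first_alloc=16, second_alloc=8)

# An unwinder that only noticed the first allocation reads a local instead.
naive = unwind_return_address(memory, esp, [16])
# Accounting for both allocations finds the real return address.
correct = unwind_return_address(memory, esp, [16, 8])

print(hex(naive), hex(correct))
```

With only the first allocation known, the computed frame is 8 bytes too small, so the "return address" is read from the middle of the function's own locals, and every frame above it is then garbage. CFI would give the unwinder the correct offset at each instruction instead of making it guess from the code.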
I think this is now fixed. I'll put it in the MODIFIED state, and I'll close it in a week if I don't hear back.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2004-561.html