Bug 216711 - gdb reporting erroneous thread stack entries
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: gdb
Version: rawhide
Hardware: x86_64 Linux
Priority: medium Severity: medium
Assigned To: Alexandre Oliva
Duplicates: 216701
Depends On: 216506
Reported: 2006-11-21 12:29 EST by Jan Kratochvil
Modified: 2007-11-30 17:11 EST
Doc Type: Bug Fix
Last Closed: 2006-12-09 05:29:33 EST

Attachments:
_modified_ BEA's testsuite (3.88 KB, application/octet-stream)
2006-11-21 12:29 EST, Jan Kratochvil

Description Jan Kratochvil 2006-11-21 12:29:53 EST
+++ This bug was initially created as a clone of Bug #216506 +++

Description of problem:

The provided test harness checks that gdb has no difficulty retrieving
information after a crash.

The BEA testsuite has been declared public in Bug 179399.

Threads created with pthread_create() show an additional stack frame that
gdb cannot resolve.  Below is a sample run; the issue is frame #5.
That frame confuses the test suite, because it expects every frame to
resolve to a symbol.

Can someone take a look at this?  I tried sending this directly to the gdb
folks, then to tools-gdb, with no response in weeks.


(gdb) ta a bt

...

Thread 2 (Thread 1084229984 (LWP 8699)):
#0  0x00000036dd08f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x00000036dd08f640 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000400b65 in makeSyscall (ignored=0x0) at threadcrash.c:142
#3  0x00000036ddb0610a in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000036dd0c68c3 in clone () from /lib64/tls/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 1 (Thread 182894231200 (LWP 8696)):
#0  main (argc=1, argv=0x7fbffff9a8) at threadcrash.c:297

The question is: why is there a frame #5 on thread 2?  Their tests
complain about any ?? in the stacks.  Is this a case of walking too far up the
stack?

Thanks in advance,

Barry




Version-Release number of selected component (if applicable):


How reproducible:

always

Steps to Reproduce:
1. Need test harness or at least the test program and a little assistance
  
Actual results:

...

Thread 2 (Thread 1084229984 (LWP 8699)):
#0  0x00000036dd08f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x00000036dd08f640 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000400b65 in makeSyscall (ignored=0x0) at threadcrash.c:142
#3  0x00000036ddb0610a in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000036dd0c68c3 in clone () from /lib64/tls/libc.so.6
#5  0x0000000000000000 in ?? ()

...

Expected results:

...

Thread 2 (Thread 1084229984 (LWP 8699)):
#0  0x00000036dd08f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x00000036dd08f640 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000400b65 in makeSyscall (ignored=0x0) at threadcrash.c:142
#3  0x00000036ddb0610a in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000036dd0c68c3 in clone () from /lib64/tls/libc.so.6

...

Additional info:

-- Additional comment from bmarson@redhat.com on 2006-11-21 10:45 EST --
This is a functional test suite.  BEA is just asking that there be no ambiguity
in gdb's reporting.  Their script analyzes the output to determine whether they
are getting all the data.  If any of the checks fails (such as the search for
??), they consider the test a failure.  This specific test was designed to
reduce the difficulty of analyzing Java VM crash dumps.

I was thinking of trying RHEL5 beta 2 before rawhide.  Any thoughts on that?

Thanks,

Barry
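
For illustration only, the kind of "?? search" described above could look
roughly like the sketch below.  This is not BEA's actual script, which is not
shown in this report; the function name find_unresolved_frames and both
regular expressions are assumptions made up for this example.  It scans
"thread apply all bt" output and reports every frame whose function gdb
printed as ??.

```python
import re

# Hypothetical patterns, written for this example only.
# Frame line, e.g. "#5  0x0000000000000000 in ?? ()" or "#0  main (argc=1, ...)".
FRAME_RE = re.compile(r"^#(\d+)\s+(?:(0x[0-9a-fA-F]+) in )?(\S+) \(")
# Thread header, e.g. "Thread 2 (Thread 1084229984 (LWP 8699)):".
THREAD_RE = re.compile(r"^Thread (\d+) \(")


def find_unresolved_frames(backtrace_text):
    """Return (thread, frame, address) for each frame printed as '??'."""
    unresolved = []
    current_thread = None
    for line in backtrace_text.splitlines():
        thread_match = THREAD_RE.match(line)
        if thread_match:
            current_thread = int(thread_match.group(1))
            continue
        frame_match = FRAME_RE.match(line)
        if frame_match and frame_match.group(3) == "??":
            unresolved.append(
                (current_thread, int(frame_match.group(1)), frame_match.group(2))
            )
    return unresolved


# Backtrace copied from the description of this bug.
SAMPLE = """\
Thread 2 (Thread 1084229984 (LWP 8699)):
#0  0x00000036dd08f7d5 in __nanosleep_nocancel () from /lib64/tls/libc.so.6
#1  0x00000036dd08f640 in sleep () from /lib64/tls/libc.so.6
#2  0x0000000000400b65 in makeSyscall (ignored=0x0) at threadcrash.c:142
#3  0x00000036ddb0610a in start_thread () from /lib64/tls/libpthread.so.0
#4  0x00000036dd0c68c3 in clone () from /lib64/tls/libc.so.6
#5  0x0000000000000000 in ?? ()

Thread 1 (Thread 182894231200 (LWP 8696)):
#0  main (argc=1, argv=0x7fbffff9a8) at threadcrash.c:297
"""

if __name__ == "__main__":
    for thread, frame, address in find_unresolved_frames(SAMPLE):
        print(f"thread {thread}: frame #{frame} at {address} is unresolved")
```

With the backtrace from the description, only frame #5 of thread 2 is
reported, which is the entry the test suite treats as a failure.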

-- Additional comment from jkratoch@redhat.com on 2006-11-21 11:45 EST --
You are right, it occurs on x86_64 even with RawHide gdb-6.5-15.fc7.x86_64:
#3  0x000000369ce061b5 in start_thread () from /lib64/libpthread.so.0
#4  0x000000369c2cd31d in clone () from /lib64/libc.so.6
#5  0x0000000000000000 in ?? ()
This problem is x86_64-specific; it is not present on i386.

This issue was not fixed in RawHide gdb-6.5-14.
RawHide:
* Thu Nov 02 2006 Jan Kratochvil <jan.kratochvil@redhat.com> - 6.5-14
- Fix "??" resolving of symbols from (non-prelinked) debuginfo packages.
- Fix "??" resolving of symbols from overlapping functions (nanosleep(3)).

It is going to be fixed in RawHide.
Comment 1 Jan Kratochvil 2006-11-21 12:29:53 EST
Created attachment 141797
_modified_ BEA's testsuite
Comment 2 Jan Kratochvil 2006-11-21 12:32:52 EST
*** Bug 216701 has been marked as a duplicate of this bug. ***
Comment 3 Jan Kratochvil 2006-12-09 05:29:33 EST
It is now fixed in RawHide glibc-2.5.90-11.x86_64 due to the patch imported by
Jakub from upstream:

2006-11-30  Jan Kratochvil  <jan.kratochvil@redhat.com>

        * sysdeps/unix/sysv/linux/x86_64/clone.S: Provide CFI for the outermost
        `clone' function to ensure proper unwinding stop of gdb.
Comment 4 Jan Kratochvil 2006-12-19 11:18:21 EST
Posted a `clone'-specific gdb patch, backward compatible with existing glibc,
upstream:
http://sources.redhat.com/ml/gdb-patches/2006-12/msg00223.html

glibc-2.5.90-13.x86_64 has the CFI patch reverted because of regressions caused
by missing `.cfi_undefined' support in libgcc. Jakub's libgcc fix is in:
http://sources.redhat.com/ml/gdb/2006-12/msg00100.html

Once libgcc is updated upstream and in RawHide, the glibc CFI patch should be
reinstated.
Comment 5 Jan Kratochvil 2006-12-19 16:38:49 EST
Committed the glibc-backward-compatible gdb patch to RawHide:
* Tue Dec 19 2006 Jan Kratochvil <jan.kratochvil@redhat.com> - 6.5-20
- Fix bogus 0x0 unwind of the thread's topmost function clone(3) (BZ 216711).
