116583 – gdb gets confused with multithreaded app

Summary: gdb gets confused with multithreaded app

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux 3
Classification:	Red Hat
Component:	gdb
Sub Component:
Version:	3.0
Hardware:	All
OS:	Linux
Priority:	medium
Severity:	high
Target Milestone:	---
Assignee:	Jeff Johnston
QA Contact:
Docs Contact:
URL:
Whiteboard:
Depends On:	110848
Blocks:

Reported:	2004-02-23 15:12 UTC by Johan Walles
Modified:	2007-11-30 22:07 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2004-04-06 21:35:40 UTC
Target Upstream Version:
Embargoed:

Attachments
Reproducer. Build w. "gcc -g -lpthread lphello.c -o lphello", run in gdb (2.90 KB, text/plain) 2004-02-23 15:16 UTC, Johan Walles

Description Johan Walles 2004-02-23 15:12:48 UTC

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Description of problem:
I'll attach a reproducer for this that confuses gdb every time.  It
runs fine inside gdb on all non-NPTL platforms I have tried, and it
confuses gdb on all NPTL platforms I have tried.

Works: RHAS21/ia32, GNU gdb 5.3
Broken: RHEL3/ia32, GNU gdb Red Hat Linux (6.0post-0.20031117.6rh)
Broken: RHEL3/ia64, GNU gdb Red Hat Linux (5.3.90-0.20030710.40rh)

The exact failure mode differs a bit between platforms, but it has
reproducibly and obviously failed on all NPTL platforms I have tried.

The program starts 100 threads and waits for them to finish.  It does
this over and over again.
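
The attached lphello.c is not reproduced in this report. A minimal sketch
of a program with the behaviour described above might look like the
following. The names `worker`, `run_round` and `run_rounds`, and the empty
thread body, are assumptions made for illustration and are not taken from
the attachment.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thread body: the real one in lphello.c is not shown here. */
static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Start `nthreads` threads, then wait for every one of them to finish.
 * Returns the number of threads that were successfully joined. */
static int run_round(int nthreads)
{
    pthread_t *tids = malloc(sizeof(*tids) * (size_t)nthreads);
    int started = 0;
    int joined = 0;

    if (tids == NULL)
        return 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, NULL) == 0)
            started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }

    free(tids);
    return joined;
}

/* Repeat the create-and-join cycle `rounds` times.
 * Returns the total number of threads joined across all rounds. */
int run_rounds(int rounds, int nthreads)
{
    int total = 0;

    for (int r = 0; r < rounds; r++)
        total += run_round(nthreads);
    return total;
}
```

To approximate the reported scenario, call `run_rounds` from `main` with
100 threads per round and enough rounds to keep the program busy for longer
than the 10 seconds mentioned in the steps to reproduce.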


Version-Release number of selected component (if applicable):
gdb-6.0post-0.20031117.6

How reproducible:
Always

Steps to Reproduce:
1. Build the to-be-attached lphello using "gcc -g -Wall -lpthread -o
lphello lphello.c".
2. Launch "gdb lphello".
3. "run"
4. Wait for a while (10 secs).
5. Interrupt the program using Ctrl-C.
6. Do "info threads".


Actual Results:  Varies.  Here are some examples:
"
..Cannot get thread event message: generic error
(gdb) bt
#0  0xb75cf6a1 in __nptl_create_event () from /lib/tls/libpthread.so.0
Error accessing memory address 0xb75cf6a0: No such process.
"

"
ptrace: No such process.
thread_db_get_info: cannot get thread info: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) bt
#0  0x20000000000470b0 in __nptl_create_event () from
/lib/tls/libpthread.so.0
Segmentation fault
"



Expected Results:  The program should have run inside of gdb the same
as it does outside of gdb.  "info threads" should have given me a
listing of all active threads.  "bt" should give me a stack trace for
the current thread.

Additional info:

Comment 1 Johan Walles 2004-02-23 15:16:26 UTC

Created attachment 97947
Reproducer.  Build w. "gcc -g -lpthread lphello.c -o lphello", run in gdb

Comment 2 Andrew Cagney 2004-02-23 18:53:40 UTC

Suspect kernel bug.  Threads spontaneously disappearing.

Comment 5 Roland McGrath 2004-03-03 02:27:41 UTC

I've done some investigation with mainline gdb and Linux 2.6,
which (some of the time) behaves consistently with what was reported here.
I believe I know what's going on, though not precisely why.

The process is dying because there is a thread that gdb has not
attached to.  This is always a potential race condition with
process-wide signals such as those generated from the terminal or by
`kill'.  That is not due to a kernel bug, but rather is a limitation
of gdb's support for NPTL-style threads.  The only way to avoid that
race condition ever coming up is to use the new 2.6 ptrace feature
PTRACE_O_TRACECLONE instead of relying on libthread_db to tell you
about new threads.
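
A minimal sketch of the tracer side of the PTRACE_O_TRACECLONE approach on
Linux follows. It is not gdb's code. The helper names `worker`, `tracee`
and `count_clone_events` are made up for illustration, and error handling
is reduced to the bare minimum. The point it shows is that the kernel
reports each new thread to the tracer and attaches it automatically, so no
thread can run untraced.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *worker(void *arg)
{
    (void)arg;
    return NULL;
}

/* Tracee: request tracing, stop so the tracer can set its options,
 * then create and join `nthreads` threads one at a time. */
static void tracee(int nthreads)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
        _exit(1);
    raise(SIGSTOP);
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, NULL) == 0)
            pthread_join(tid, NULL);
    }
    _exit(0);
}

/* Tracer: fork a tracee, enable PTRACE_O_TRACECLONE on it and count the
 * clone events reported. Returns the count, or -1 if setup failed. */
int count_clone_events(int nthreads)
{
    int status;
    int events = 0;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        tracee(nthreads);

    /* Wait for the tracee's initial SIGSTOP. */
    if (waitpid(child, &status, 0) != child || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACECLONE) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    ptrace(PTRACE_CONT, child, NULL, NULL);

    for (;;) {
        pid_t pid = waitpid(-1, &status, __WALL);
        if (pid < 0)
            break;                      /* no traced tasks left */
        if (!WIFSTOPPED(status)) {
            if (pid == child)
                break;                  /* thread group leader has exited */
            continue;                   /* a worker thread has exited */
        }

        long sig = WSTOPSIG(status);
        if (sig == SIGTRAP && (status >> 16) == PTRACE_EVENT_CLONE) {
            /* The kernel tells the tracer which thread was just created. */
            unsigned long new_tid = 0;
            ptrace(PTRACE_GETEVENTMSG, pid, NULL, &new_tid);
            printf("new thread %lu is already traced\n", new_tid);
            events++;
            sig = 0;
        } else if (sig == SIGSTOP) {
            sig = 0;                    /* initial stop of an auto-attached thread */
        }
        /* Resume the task, forwarding any other signal unchanged. */
        ptrace(PTRACE_CONT, pid, NULL, (void *)sig);
    }
    return events;
}
```

Because every new thread starts out attached and stopped, a terminal-generated
signal cannot be delivered to a thread the tracer has never seen, which is the
race described above.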

However, in this case this failure mode is arising without a race.
When I run the test case under gdb, I see exactly 100 "New Thread"
messages, and then no more, while the program goes on to create many
more threads (it does many iterations of creating 100 threads, then
waiting for those 100 threads to finish).  Then the terminal-generated
SIGINT is taken by one of these later threads to which gdb never
attached, and it kills the whole process (attached threads included).

Please look on the gdb end as to why the threads after the 100th are
not getting attached.

Comment 6 Jeff Johnston 2004-04-06 21:35:40 UTC

A patch has been put in place in the next RHEL3 update.
