From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

Description of problem:
I'll attach a reproducer for this that confuses gdb every time. It runs fine inside gdb on all non-NPTL platforms I have tried, and it confuses gdb on all NPTL platforms I have tried.

Works:  RHAS21/ia32, GNU gdb 5.3
Broken: RHEL3/ia32, GNU gdb Red Hat Linux (6.0post-0.20031117.6rh)
Broken: RHEL3/ia64, GNU gdb Red Hat Linux (5.3.90-0.20030710.40rh)

The exact failure mode differs somewhat between platforms, but it has failed reproducibly and obviously on all NPTL platforms I have tried.

The program starts 100 threads and waits for them to finish. It does this over and over again.

Version-Release number of selected component (if applicable):
gdb-6.0post-0.20031117.6

How reproducible:
Always

Steps to Reproduce:
1. Build the to-be-attached lphello using "gcc -g -Wall -lpthread -o lphello lphello.c".
2. Launch "gdb lphello".
3. "run"
4. Wait for a while (10 secs).
5. Interrupt the program using Ctrl-C.
6. Do "info threads".

Actual Results:
Varies. Here are some examples:

"
..Cannot get thread event message: generic error
(gdb) bt
#0  0xb75cf6a1 in __nptl_create_event () from /lib/tls/libpthread.so.0
Error accessing memory address 0xb75cf6a0: Processen finns inte. [Swedish: "No such process."]
"

"
ptrace: No such process.
thread_db_get_info: cannot get thread info: generic error
(gdb) info threads
Cannot find new threads: generic error
(gdb) bt
#0  0x20000000000470b0 in __nptl_create_event () from /lib/tls/libpthread.so.0
Segmentation fault
"

Expected Results:
The program should have run inside of gdb the same as it does outside of gdb. "info threads" should have given me a listing of all active threads. "bt" should give me a stack trace for the current thread.

Additional info:
Created attachment 97947
Reproducer. Build with "gcc -g -lpthread lphello.c -o lphello", then run it in gdb.
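The attachment itself is not reproduced in this report. As a rough sketch only, a program matching the description above (start 100 threads, wait for them to finish, repeat) might look something like the following. The names worker, run_round and run_rounds are hypothetical, and the empty thread body is an assumption; the real lphello.c may differ in every detail.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical thread body: the report does not say what each thread does,
 * so this one simply returns. */
static void *worker(void *arg)
{
    return arg;
}

/* Start nthreads threads and wait for all of them to finish.
 * Returns the number of threads that were successfully joined. */
int run_round(int nthreads)
{
    pthread_t *tids = malloc(sizeof(*tids) * (size_t)nthreads);
    int started = 0, joined = 0;

    if (tids == NULL)
        return 0;

    for (int i = 0; i < nthreads; i++) {
        if (pthread_create(&tids[started], NULL, worker, NULL) != 0)
            break;                      /* stop creating on the first failure */
        started++;
    }
    for (int i = 0; i < started; i++) {
        if (pthread_join(tids[i], NULL) == 0)
            joined++;
    }
    free(tids);
    return joined;
}

/* Repeat the create-and-join round; returns the total number of joins. */
int run_rounds(int rounds, int nthreads)
{
    int total = 0;

    for (int r = 0; r < rounds; r++)
        total += run_round(nthreads);
    return total;
}
```

A driver whose main calls run_round(100) in an endless loop would give the "over and over again" behaviour described in the report.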
Suspect kernel bug. Threads spontaneously disappearing.
I've done some investigation with mainline gdb and Linux 2.6, which (some of the time) behaves consistently with what was reported here. I believe I know what's going on, though not precisely why.

The process is dying because there is a thread that gdb has not attached to. This is always a potential race condition with process-wide signals such as those generated from the terminal or by `kill'. That is not due to a kernel bug, but rather is a limitation of gdb's support for NPTL-style threads. The only way to avoid that race condition ever coming up is to use the new 2.6 ptrace feature PTRACE_O_TRACECLONE instead of relying on libthread_db to tell you about new threads.

However, in this case the failure mode arises without a race. When I run the test case under gdb, I see exactly 100 "New Thread" messages, and then no more, while the program goes on to create many more threads (it does many iterations of creating 100 threads, then waiting for those 100 threads to finish). Then the terminal-generated SIGINT is taken by one of these later threads to which gdb never attached, and it kills the whole process (attached threads included).

Please look on the gdb end as to why the threads after the 100th are not getting attached.
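For illustration, the PTRACE_O_TRACECLONE approach mentioned above can be sketched roughly as follows. This is not gdb's code; it is a minimal standalone tracer, assuming a Linux system whose <sys/ptrace.h> defines PTRACE_O_TRACECLONE, PTRACE_EVENT_CLONE and PTRACE_GETEVENTMSG. The names noop, tracee and count_clone_events are hypothetical. With the option set, the kernel attaches every new thread to the tracer automatically and reports a clone event, so no thread can be missed.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static void *noop(void *arg)
{
    return arg;
}

/* Tracee side: request tracing, stop so the tracer can set its options,
 * then create and join nthreads threads one at a time. */
static void tracee(int nthreads)
{
    if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
        _exit(1);
    raise(SIGSTOP);
    for (int i = 0; i < nthreads; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, noop, NULL) == 0)
            pthread_join(t, NULL);
    }
    _exit(0);
}

/* Fork a tracee and count the clone events the kernel reports for it.
 * Returns the number of events, or -1 if tracing could not be set up. */
int count_clone_events(int nthreads)
{
    int status, events = 0;
    pid_t pid, child = fork();

    if (child < 0)
        return -1;
    if (child == 0)
        tracee(nthreads);

    /* Wait for the tracee's initial SIGSTOP. */
    if (waitpid(child, &status, WUNTRACED) < 0 || !WIFSTOPPED(status))
        return -1;
    if (ptrace(PTRACE_SETOPTIONS, child, NULL,
               (void *)(long)PTRACE_O_TRACECLONE) < 0) {
        kill(child, SIGKILL);
        waitpid(child, &status, 0);
        return -1;
    }
    ptrace(PTRACE_CONT, child, NULL, NULL);

    /* __WALL makes waitpid report threads as well as processes.
     * Loop until no tracees remain. */
    while ((pid = waitpid(-1, &status, __WALL)) > 0) {
        if (!WIFSTOPPED(status))
            continue;                   /* a tracee exited */

        int sig = WSTOPSIG(status);
        if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_CLONE << 8))) {
            unsigned long new_tid = 0;  /* id of the thread just created */
            ptrace(PTRACE_GETEVENTMSG, pid, NULL, &new_tid);
            events++;
            sig = 0;
        } else if (sig == SIGSTOP || sig == SIGTRAP) {
            sig = 0;                    /* tracing stop: do not deliver */
        }
        /* Resume the stopped thread, passing through any real signal. */
        ptrace(PTRACE_CONT, pid, NULL, (void *)(long)sig);
    }
    return events;
}
```

Each new thread arrives already attached and stopped with SIGSTOP, which is why the loop resumes every stopped tracee it sees rather than only the original child. A debugger would record new_tid in its thread list at that point instead of just counting.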
A patch has been included in the next RHEL3 update.