Red Hat Bugzilla – Bug 592031
gdb gets stuck on multi-threaded program which calls setuid() frequently
Last modified: 2011-05-13 08:58:39 EDT
Created attachment 413842
Description of problem:
When running a multi-threaded application which calls setuid() frequently, gdb gets stuck. This happens only on machines with 2 or more cpu cores and only under gdb.
Version-Release number of selected component (if applicable):
Latest gdb in RHEL 5.5 (7.x branch) and earlier (also the 6.x branch).
Also reproducible on Fedora 12.
Steps to Reproduce:
1. Compile attached reproducer:
$ gcc -g -pthread -o reproducer reproducer.c
2. Run it under gdb and wait a while:
$ gdb ./reproducer
j[Thread 0x41401940 (LWP 4090) exited]
c[New Thread 0x41401940 (LWP 4091)]
juc[Thread 0x41401940 (LWP 4091) exited]
[New Thread 0x41401940 (LWP 4092)]
j[Thread 0x41401940 (LWP 4092) exited]
c[New Thread 0x41401940 (LWP 4093)]
juc[Thread 0x41401940 (LWP 4093) exited]
[New Thread 0x41401940 (LWP 4094)]
jc[Thread 0x41401940 (LWP 4094) exited]
u[New Thread 0x41401940 (LWP 4095)]
j[Thread 0x41401940 (LWP 4095) exited]
c[New Thread 0x41401940 (LWP 4096)]
juc[Thread 0x41401940 (LWP 4096) exited]
[New Thread 0x41401940 (LWP 4097)]
ju[Thread 0x41401940 (LWP 4097) exited]
Program received signal SIGINT, Interrupt.
0x00000037dfc0613e in __nptl_setxid () from /lib64/libpthread.so.0
(gdb) thread apply all bt
Thread 2 (Thread 0x40a00940 (LWP 3933)):
#0 0x00000037dfc0d2ae in __lll_lock_wait_private () from /lib64/libpthread.so.0
#1 0x00000037dfc0757e in _L_lock_2370 () from /lib64/libpthread.so.0
#2 0x00000037dfc063ab in __deallocate_stack () from /lib64/libpthread.so.0
#3 0x00000037dfc0791a in pthread_join () from /lib64/libpthread.so.0
#4 0x00000000004007a2 in spawner (arg=0x0) at reproducer.c:18
#5 0x00000037dfc06617 in start_thread () from /lib64/libpthread.so.0
#6 0x00000037df0d3c2d in clone () from /lib64/libc.so.6
Thread 1 (Thread 0x2aaaaaac18a0 (LWP 3930)):
#0 0x00000037dfc0613e in __nptl_setxid () from /lib64/libpthread.so.0
#1 0x00000037df09aefd in setuid () from /lib64/libc.so.6
#2 0x00000000004007e1 in main () at reproducer.c:28
Expected results:
gdb shouldn't get stuck
This could be related to a race condition with setuid() in glibc, which was fixed recently; see the following BZ:
I cannot reproduce it on x86-64-5s-3-m1.ss.eng.bos.redhat.com where I could reproduce the glibc Bug 491995.
Also, the dump above does not show GDB being stuck; only the inferior got stuck, and GDB was able to interrupt it.
Please reopen this Bug if you do not find this problem fixed by the glibc fix.
While it is not reproducible on RHEL-5, I was able to reproduce it on F-13.
Created attachment 430504
Proof of concept FSF GDB HEAD fix.
Confirming it is fixable in GDB alone. RHEL will need a different form of the fix.
Created attachment 441807
Updated fix on top of FSF GDB HEAD.
This functionality requires a kernel backport of rt_tgsigqueueinfo; filing a bug for it.
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
GDB could lose important debugging information carried in the siginfo_t part of a POSIX signal during the debugging process. This update ensures that GDB preserves the associated siginfo_t information, and that debugging is transparent to the application, even in multithreaded programs that call the setuid() function.
An advisory has been issued which should help resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.