Description of problem:
If a thread calls setuid() and then terminates, and another thread then calls pthread_join() on it, a deadlock occurs.

Version-Release number of selected component (if applicable):
Not sure whether this is glibc or the kernel. I tested these two combinations:
2.6.9-42.0.10.ELsmp and glibc-2.3.4-2.25
2.6.9-55.ELsmp and glibc-2.3.2-95.30
This also seems to affect RedHat 5, but I don't have the exact system configs right now (can verify if needed).

How reproducible:
I have a small (53 line) demo program.

Steps to Reproduce:
1. gcc -pthreads ths.c
2. ./a.out
3. profit!!!

Actual results:
launched threads, iter 0

Expected results:
launched threads, iter 0
joined threads
launched threads, iter 1
joined threads
... many times

Additional info:
I'm not sure why anybody would call setuid() in a thread, but hey -- it's a bug.
Created attachment 159495 [details] sample program
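The attachment itself is not reproduced in this thread. A minimal sketch of what such a reproducer might look like is below. It is not the attached ths.c: the names setuid_worker, run_iterations and MAX_THREADS, the thread count, and the use of setuid(getuid()) are all assumptions made for illustration.

```c
/* Hypothetical sketch of the launch/join loop; NOT the attached ths.c. */
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_THREADS 64  /* assumed upper bound, chosen for this sketch */

/* Thread body: make one setuid() call and return immediately. */
static void *setuid_worker(void *arg)
{
    (void)arg;
    /* Assumption: set the UID to its current value so no privilege is
       needed. The return value is deliberately ignored in this sketch. */
    (void)setuid(getuid());
    return NULL;
}

/* Launch nthreads workers and join them, repeated `iterations` times.
   Returns the number of iterations that launched and joined every thread.
   On an affected glibc the pthread_join() loop may never return. */
int run_iterations(int iterations, int nthreads)
{
    pthread_t tids[MAX_THREADS];
    int completed = 0;

    assert(nthreads > 0 && nthreads <= MAX_THREADS);

    for (int iter = 0; iter < iterations; iter++) {
        int created = 0;

        for (int i = 0; i < nthreads; i++) {
            if (pthread_create(&tids[i], NULL, setuid_worker, NULL) != 0)
                break;
            created++;
        }
        printf("launched threads, iter %d\n", iter);
        fflush(stdout);

        /* Join only the threads that were actually created. */
        for (int i = 0; i < created; i++)
            pthread_join(tids[i], NULL);
        printf("joined threads\n");
        fflush(stdout);

        if (created != nthreads)
            break;  /* thread creation failed; stop early */
        completed++;
    }
    return completed;
}
```

Calling run_iterations() from main() with a large iteration count (for example run_iterations(1000, 8)) and linking with the thread library would give the launch/join cycle described in the report. On an unaffected system it should print both lines for every iteration.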
I've reproduced this problem with stock RHEL4.5 and also with a recent interim build (2.6.9-55.14.EL) of the RHEL4.6-under-development kernel (since a futex fix was applied earlier in U6). I think the problem lies in the thread clean-up handling in glibc, so I'm reassigning this BZ accordingly.

The setuid() calls in each thread all succeed (if run as root) or all fail (if run as non-root), but one (or sometimes two or three) of the threads never fully exit. They complete execution of the thread function, but pthread_kill() called from the parent still finds them (i.e., the call returns 0 instead of ESRCH), and a subsequent pthread_join() would wait indefinitely. Note that pthread_kill() is called with a signal argument of 0, which performs error checking only and does not actually signal the thread; this is intentional.

I suspect that a pthread_kill() racing with a setuid() might be at the heart of this problem, but only because changing the setuid() to several other syscalls makes the problem unreproducible. I will attach a modified version of the reproducer, which contains some added debugging logic.
Created attachment 159623 [details] modified version of reproducer referred to above
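For readers without access to the attachment, the signal-0 probe described above can be sketched roughly as follows. This is not the attached debugging code: thread_still_reported, gated_worker, probe_live_thread and the mutex "gate" are hypothetical names. The sketch only probes a thread that is known to be alive, because the result for a thread that has finished but has not yet been joined may vary between C library versions.

```c
/* Hypothetical sketch of probing a thread with pthread_kill(tid, 0). */
#include <pthread.h>
#include <signal.h>
#include <stdbool.h>

/* Signal 0 asks pthread_kill() to do its error checking without
   delivering any signal. Returns true if the call reports success.
   Must not be used on a thread ID that has already been joined. */
bool thread_still_reported(pthread_t tid)
{
    return pthread_kill(tid, 0) == 0;
}

/* Gate that keeps the worker alive while it is being probed. */
static pthread_mutex_t gate = PTHREAD_MUTEX_INITIALIZER;

static void *gated_worker(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&gate);    /* blocks until the gate is released */
    pthread_mutex_unlock(&gate);
    return NULL;
}

/* Start a worker held behind the gate, probe it, then release and join.
   Returns 1 if the probe reported the thread, 0 if it did not,
   and -1 if the thread could not be created. */
int probe_live_thread(void)
{
    pthread_t tid;

    pthread_mutex_lock(&gate);
    if (pthread_create(&tid, NULL, gated_worker, NULL) != 0) {
        pthread_mutex_unlock(&gate);
        return -1;
    }

    int reported = thread_still_reported(tid) ? 1 : 0;

    pthread_mutex_unlock(&gate);  /* let the worker finish */
    pthread_join(tid, NULL);
    return reported;
}
```

In the scenario from this report, the interesting case is the opposite one: the probe keeps returning 0 for a thread whose function has already returned, which is what suggested that the thread never fully exits.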
Found what sounds like a dup of this, BZ#3270.
I forgot to mention that the thread(s) that get stuck (following execution of their setuid() syscall) are in a futex() syscall for a FUTEX_WAIT op. They are interruptible, i.e., a signal will effectively kill all threads of the process. Jakub, it seems that a couple of digits are missing from the BZ listed in your prior comment.
No, BZ#3270 in sourceware bugzilla, see External Bugzilla References.
Ah, got it, thanks. In case anyone else has trouble finding the External Bugzilla References section below :-), you can use this link: http://sources.redhat.com/bugzilla/show_bug.cgi?id=3270
We are not planning to fix this problem for Red Hat Enterprise Linux 4. There have been numerous fixes for setxid and pthread_join in Red Hat Enterprise Linux 5 and 6. However, #769852 is still open for Red Hat Enterprise Linux 5 (a race condition can lead to a hang in pthread_join after a thread has called setuid). I expect this will be fixed in Red Hat Enterprise Linux 5.9.