Description of problem:
If a thread makes a setuid() call and then terminates, and another thread
attempts to call pthread_join() on it, a deadlock occurs.
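The scenario can be sketched roughly as follows. This is not the attached ths.c; it is a minimal illustration under assumptions: the thread count (NTHREADS), the helper names worker() and run_iterations(), and the setuid(getuid()) argument are all hypothetical, since the original reproducer's details are not shown here.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define NTHREADS 4  /* hypothetical thread count, not taken from ths.c */

/* Each worker calls setuid() and then returns, i.e. terminates. */
static void *worker(void *arg)
{
    (void)arg;
    /* Assumption: re-set our own real UID; the original argument is unknown. */
    (void)setuid(getuid());
    return NULL;
}

/* Launch NTHREADS workers and join them all, `iters` times.
 * Returns the number of successful joins, or -1 if thread creation fails. */
static int run_iterations(int iters)
{
    int joined = 0;
    for (int i = 0; i < iters; i++) {
        pthread_t tids[NTHREADS];
        for (int t = 0; t < NTHREADS; t++)
            if (pthread_create(&tids[t], NULL, worker, NULL) != 0)
                return -1;
        printf("launched threads, iter %d\n", i);
        for (int t = 0; t < NTHREADS; t++)
            if (pthread_join(tids[t], NULL) == 0)  /* the reported hang is here */
                joined++;
    }
    return joined;
}
```

Calling run_iterations() from main() with a large iteration count should, on an affected system, eventually stop printing and hang in pthread_join(); on an unaffected system it returns iters * NTHREADS.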
Version-Release number of selected component (if applicable):
Not sure whether this is glibc or kernel, I tested on these two combinations:
2.6.9-42.0.10.ELsmp and glibc-2.3.4-2.25
This also seems to affect Red Hat 5, but I don't have the exact system configs
right now (can verify if needed).
I have a small (53 line) demo program.
Steps to Reproduce:
1. gcc -pthread ths.c
launched threads, iter 0
launched threads, iter 0
launched threads, iter 1
I'm not sure why anybody would call setuid() in a thread, but hey -- it's a bug.
Created attachment 159495
I've reproduced this problem with stock RHEL4.5 and also with a recent
interim build (2.6.9-55.14.EL) of the RHEL4.6-under-development kernel
(which I tried because a futex fix was applied earlier in U6).
I think the problem lies in the thread clean-up handling in glibc, and
thus I'm reassigning this BZ appropriately. The setuid() calls in each
thread all succeed (if run as root) or all fail (if run as non-root),
but it seems that one (or sometimes two or three) thread(s) never fully
exit. They complete execution of the thread function, but the pthread_kill()
function from the parent still finds them (i.e., the call returns 0 instead
of ESRCH) and a subsequent pthread_join() would wait indefinitely. Note that
the call to pthread_kill() is made with a signal arg of 0, which does not
actually kill the thread (which is intentional).
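As a minimal sketch of the signal-0 probe described above: pthread_kill() with a signal argument of 0 performs only error checking and delivers nothing. The helper names (blocked_worker, probe_live_thread) and the semaphore gate are hypothetical, introduced only for illustration. The sketch probes a thread that is known to be alive, because what pthread_kill() returns for a thread that has already finished varies between implementations.

```c
#include <errno.h>
#include <pthread.h>
#include <semaphore.h>
#include <signal.h>

static sem_t release_gate;  /* hypothetical gate that keeps the worker alive */

static void *blocked_worker(void *arg)
{
    (void)arg;
    /* Wait until the parent posts the semaphore; retry if interrupted. */
    while (sem_wait(&release_gate) == -1 && errno == EINTR)
        ;
    return NULL;
}

/* Probe a live thread with signal 0.
 * Returns pthread_kill()'s result, or -1 if setup fails. */
static int probe_live_thread(void)
{
    pthread_t tid;

    if (sem_init(&release_gate, 0, 0) != 0)
        return -1;
    if (pthread_create(&tid, NULL, blocked_worker, NULL) != 0) {
        sem_destroy(&release_gate);
        return -1;
    }

    /* Signal 0: validity check only, the thread is not killed. */
    int rc = pthread_kill(tid, 0);

    sem_post(&release_gate);   /* let the worker finish */
    pthread_join(tid, NULL);
    sem_destroy(&release_gate);
    return rc;
}
```

For a thread that is still running, the probe returns 0, which is the same value the parent keeps seeing for the stuck threads in this report.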
I suspect that a pthread_kill() racing with a setuid() might be at the heart
of this problem, only because changing the setuid() to several other syscalls
makes the problem unreproducible.
I will attach a modified version of the reproducer, which contains some added
Created attachment 159623
modified version of reproducer referred to above
Found what sounds like a dup of this, BZ#3270.
I forgot to mention that the thread(s) that get stuck (following execution
of their setuid() syscall) are in a futex() syscall for a FUTEX_WAIT op.
They are interruptible, i.e., a signal will effectively kill all threads
of the process.
Jakub, it seems that a couple of digits are missing from the BZ listed
in your prior comment.
No, BZ#3270 in sourceware bugzilla, see External Bugzilla References.
Ah, got it, thanks.
In case anyone else has trouble finding the External Bugzilla References
section below :-), you can use this link:
We are not planning to fix this problem for Red Hat Enterprise Linux 4.
There have been numerous fixes for setxid & pthread_join in Red Hat Enterprise Linux 5 & 6. However, #769852 is still open for Red Hat Enterprise Linux 5 (race condition can lead to hang in pthread_join after thread has called setuid). I expect this will be fixed in Red Hat Enterprise Linux 5.9.