Bug 241665
Summary: | mtrace() deadlock in a multi-threaded environment | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Olivier Baudron <olivier.baudron> |
Component: | glibc | Assignee: | Jakub Jelinek <jakub> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | Brian Brock <bbrock> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 6 | ||
Hardware: | All | ||
OS: | Linux | ||
Fixed In Version: | F8 | Doc Type: | Bug Fix |
Last Closed: | 2007-06-22 17:12:15 UTC | Type: | --- |
Description
Olivier Baudron
2007-05-29 13:41:19 UTC
Can't reproduce this with the same glibc on a quad-core x86_64. I tried both -m32 and -m64 builds, running the proglet both on all CPUs and with taskset -c 0,2 ./testcase. In each case it ran several thousand invocations without a hang. Can you please install glibc-debuginfo for your arch, attach to the hung process with gdb, and get all backtraces?

I forgot to mention that you need to set MALLOC_TRACE:

    $ export MALLOC_TRACE=trace.txt
    $ ./testcase

Two processes hang:

    $ ps x
    23752 pts/4 Sl+ 0:00 ./testcase
    23849 pts/4 S+ 0:00 ./testcase

For process 23752:

    (gdb) info threads
      2 Thread -1669952624 (LWP 23841) 0x00672402 in __kernel_vsyscall ()
      1 Thread -1208396096 (LWP 23752) 0x00672402 in __kernel_vsyscall ()
    (gdb) thread 1
    (gdb) bt
    #0 0x00672402 in __kernel_vsyscall ()
    #1 0x00da84b7 in pthread_join (threadid=2625014672, thread_return=0x0) at pthread_join.c:89
    #2 0x0804874c in main () at testcase.c:52
    (gdb) thread 2
    (gdb) bt
    #0 0x00672402 in __kernel_vsyscall ()
    #1 0x00daecbb in ?? () from /lib/libpthread.so.0
    #2 0x08048687 in new_thread (param=0x0) at testcase.c:29
    #3 0x00da73db in start_thread (arg=0x9c768b90) at pthread_create.c:296
    #4 0x001ce26e in ?? () from /lib/libc.so.6

For process 23849:

    (gdb) bt
    #0 0x00672402 in __kernel_vsyscall ()
    #1 0x001dab6e in ?? () from /lib/libc.so.6
    #2 0x0016c717 in ?? () at mtrace.c:188 from /lib/libc.so.6
    Previous frame identical to this frame (corrupt stack?)

Is someone able to reproduce the bug?

I think I have found the problem. mtrace() uses a mutex, "lock", to serialize writes to the output text file. If one thread holds "lock" while another thread calls fork(), the child inherits a locked (or inconsistent) mutex that it can never unlock, because the thread that held it does not exist in the child. The child therefore waits forever when execvp() calls malloc(): the malloc hook cannot acquire the lock to write to the output text file. One solution is to use pthread_atfork().

Also, I retried the testcase on a single CPU, and this time the stack is not corrupted. Each child blocks with the following stack:

    (gdb) bt
    #0 0xb7fcc410 in __kernel_vsyscall ()
    #1 0x41686b6e in __lll_mutex_lock_wait () from /lib/libc.so.6
    #2 0x41618717 in _L_lock_858 () at mtrace.c:188
    #3 0x416185fc in tr_mallochook (size=137, caller=0x4163b5b5) at mtrace.c:182
    #4 0x41614e85 in *__GI___libc_malloc (bytes=137) at malloc.c:3463
    #5 0x4163b5b5 in *__GI_execvp (file=0x804885f "echo", argv=0x7f745388) at execvp.c:108
    #6 0x4163b8be in *__GI_execlp (file=0x804885f "echo", arg=0x804885f "echo") at execlp.c:67
    #7 0x08048657 in new_thread (param=0x0) at foo.c:25
    #8 0x417203db in start_thread (arg=0x7f746b90) at pthread_create.c:296
    #9 0x4167a26e in clone () from /lib/libc.so.6

I hope this helps. In any case, any advice is welcome.

mtrace is inherently thread-unsafe because it relies on malloc hooks. In addition, many of the __libc_freeres hooks aren't thread-safe either. valgrind stops all threads before calling __libc_freeres; mtrace doesn't. In summary, for multithreaded programs use valgrind instead of mtrace.

Fixed.
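For readers unfamiliar with the pthread_atfork() approach suggested above, here is a minimal sketch of the idea. It is not the glibc change that closed this bug, and every name in it (trace_lock, trace_install_atfork, fork_child_can_lock) is hypothetical; trace_lock merely stands in for the mutex that mtrace.c calls "lock". The prepare handler takes the mutex before fork(), so no other thread can hold it at the moment the child is created, and both parent and child release it afterwards.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for the mutex guarding the trace output file. */
static pthread_mutex_t trace_lock = PTHREAD_MUTEX_INITIALIZER;
static atomic_int stop_worker;

/* Runs in the forking thread just before fork(): wait until the lock is free. */
static void trace_prepare(void) { pthread_mutex_lock(&trace_lock); }

/* Runs in the parent after fork(): release the lock taken in trace_prepare(). */
static void trace_parent(void) { pthread_mutex_unlock(&trace_lock); }

/* Runs in the child after fork(): its only thread is the copy of the owner. */
static void trace_child(void) { pthread_mutex_unlock(&trace_lock); }

/* Register the handlers once at start-up. Returns 0 on success. */
int trace_install_atfork(void)
{
    return pthread_atfork(trace_prepare, trace_parent, trace_child);
}

/* Worker that keeps taking the lock, like threads calling a hooked malloc(). */
static void *worker(void *arg)
{
    (void)arg;
    while (!atomic_load(&stop_worker)) {
        pthread_mutex_lock(&trace_lock);
        pthread_mutex_unlock(&trace_lock);
        usleep(100); /* give the forking thread a chance to take the lock */
    }
    return NULL;
}

/* Fork while the worker contends for the lock, then check that the child
 * can still acquire it. Returns 0 if the child exited cleanly, -1 otherwise. */
int fork_child_can_lock(void)
{
    pthread_t tid;
    int status = -1;

    atomic_store(&stop_worker, 0);
    if (pthread_create(&tid, NULL, worker, NULL) != 0)
        return -1;

    pid_t pid = fork();
    if (pid == 0) {
        /* Child: without the handlers, this lock could block forever. */
        pthread_mutex_lock(&trace_lock);
        pthread_mutex_unlock(&trace_lock);
        _exit(0);
    }
    if (pid > 0)
        waitpid(pid, &status, 0);

    atomic_store(&stop_worker, 1);
    pthread_join(tid, NULL);
    return (pid > 0 && WIFEXITED(status) && WEXITSTATUS(status) == 0) ? 0 : -1;
}
```

A program would call trace_install_atfork() once before creating threads and build with -pthread. Note that this only addresses the inherited-lock hang described in this report; it does not make malloc hooks themselves thread-safe.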