Description of problem:
strace sometimes hangs while detaching from a multithreaded application when strace itself is interrupted with CTRL-C. As a side effect, in some cases the multithreaded application ends up Stopped (T, by SIGSTOP) and has to be resumed with `kill -CONT`. The shell prints:
[1]+ Stopped appname args

Version-Release number of selected component (if applicable):
strace-4.5.15-1.el5.ia64
kernel-2.6.18-8.el5.ia64

How reproducible:
The application's Stopped (T) state is always reproducible; see also Bug 240961. The strace hang is best (or only) reproduced on ia64, in about 10% of test runs.

Steps to Reproduce:
1. gcc -o mt3-tkill mt3-tkill.c -Wall -ggdb2 -pthread
2. ./mt3-tkill
3. On another console: strace -o /tmp/x -f -p `pidof mt3-tkill`

Actual results:
Process 13968 attached with 64 threads - interrupt to quit
Process 13907 detached
...
Process 13905 detached
[HANG]

Expected results:
Process 13968 attached with 64 threads - interrupt to quit
Process 13907 detached
...
Process 13900 detached
[EXIT]

Additional info:
The traced process gets into this state:
/proc/12664/task/12664/status:State: S (sleeping)
/proc/12664/task/12665/status:State: T (tracing stop)
...
/proc/12664/task/12730/status:State: T (tracing stop)

with strace in this state:
#0  0xa000000000010641 in __kernel_syscall_via_break ()
#1  0x2000000000162fe0 in wait4 () from /lib/tls/libc.so.6.1
#2  0x4000000000008420 in detach (tcp=0x600000000001c050, sig=0) at strace.c:1337
1337            if (wait4(tcp->pid, &status, __WALL, NULL) < 0) {
#3  0x40000000000093b0 in cleanup () at strace.c:1516
#4  0x4000000000006ea0 in main (argc=6, argv=0x60000fffffff9f18) at strace.c:803

strace sends SIGSTOP to the thread group leader of the process but never gets it reported back through wait4().

The testcase also contains a workaround for a Linux kernel bug that leaks ERESTARTNOINTR to userland; that kernel bug is present in some older kernel variants around 2.6.9. It does not otherwise affect this bug.
Created attachment 155244 Testcase.
Created attachment 155246 Bugfix.
The problem occurs because kill() may arbitrarily choose which task of the thread group receives the signal, while we later wait on just one specific TID. For a process under ptrace(2), a wait on the PID becomes a wait specific to the task with that TID. [Roland McGrath originally provided this useful info.]

Unfortunately, the POSIX specification does not seem to mention this behavior:
http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
The relevant paragraph talks only about kill(getpid(), ...):

  If the value of pid causes sig to be generated for the sending process, and if sig is not blocked for the calling thread and if no other thread has sig unblocked or is waiting in a sigwait() function for sig, either sig or at least one pending unblocked signal shall be delivered to the sending thread before kill() returns.
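To illustrate the distinction: kill(pid, sig) generates a process-directed signal that the kernel may deliver to any thread of the group not blocking it, whereas tgkill(2) queues the signal for one specific TID. The sketch below is a minimal illustration, assuming Linux with glibc; it calls tgkill and gettid through syscall(2) because older glibc versions have no wrappers for them. It uses SIGUSR1 instead of SIGSTOP so that it runs without ptrace, and all function names are hypothetical, not taken from strace.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* TID of the thread that ran the handler; written from signal context. */
static volatile sig_atomic_t handled_tid;
static pid_t worker_tid;
static pthread_barrier_t ready;

static pid_t current_tid(void)
{
    return (pid_t) syscall(SYS_gettid);   /* Linux-specific */
}

static void on_sigusr1(int sig)
{
    (void) sig;
    handled_tid = current_tid();
}

static void *worker(void *arg)
{
    sigset_t block, wait_mask;
    (void) arg;

    /* Keep SIGUSR1 blocked until sigsuspend() so it cannot arrive early. */
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &wait_mask);
    sigdelset(&wait_mask, SIGUSR1);

    worker_tid = current_tid();
    pthread_barrier_wait(&ready);

    /* Atomically unblock SIGUSR1 and sleep until it is delivered here. */
    sigsuspend(&wait_mask);
    return NULL;
}

/* Hypothetical helper: sends SIGUSR1 with tgkill() to a worker thread,
 * stores the worker's TID in *target and returns the TID that ran the
 * handler. */
static pid_t tid_handling_tgkill(pid_t *target)
{
    struct sigaction sa;
    pthread_t th;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGUSR1, &sa, NULL);

    pthread_barrier_init(&ready, NULL, 2);
    pthread_create(&th, NULL, worker, NULL);
    pthread_barrier_wait(&ready);

    /* Thread-directed: only worker_tid may take this signal. */
    syscall(SYS_tgkill, getpid(), worker_tid, SIGUSR1);

    pthread_join(th, NULL);
    pthread_barrier_destroy(&ready);
    *target = worker_tid;
    return handled_tid;
}
```

If the tgkill call were replaced with kill(getpid(), SIGUSR1), the signal would become process-directed and, since the main thread does not block SIGUSR1, it could be handled there instead, leaving the worker waiting in sigsuspend(). That is the same kind of mismatch as strace waiting on one TID for a SIGSTOP the kernel routed to another thread.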
Created attachment 155329 Bugfix updated according to Roland's comments.
Created attachment 155354 Bugfix update #2 according to Roland's comments.
Fixed in Rawhide strace-4.5.16-1.fc8:

* Fri Aug 3 2007 Roland McGrath <roland> - 4.5.16-1
- fix multithread issues (#240962, [...])

and upstream:

2007-05-24  Jan Kratochvil  <jan.kratochvil>

	* strace.c [LINUX] (my_tgkill): New macro.
	[LINUX] (detach): Use my_tgkill () instead of kill(2).
	Fixes RH#240962.
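The changelog names a my_tgkill macro but does not show its body. A minimal sketch of what such a macro might look like follows, assuming Linux and glibc's syscall(2); it is not the upstream strace definition, and the tkill/kill fallbacks for systems without SYS_tgkill are assumptions. The helper signal_one_task is hypothetical.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical sketch of a my_tgkill() macro; NOT the upstream strace code.
 * Prefer tgkill(2), which also checks that tid belongs to thread group pid;
 * fall back to tkill(2) (tid only) and finally to kill(2) on systems
 * that provide neither syscall number. */
#if defined SYS_tgkill
# define my_tgkill(pid, tid, sig) syscall(SYS_tgkill, (pid), (tid), (sig))
#elif defined SYS_tkill
# define my_tgkill(pid, tid, sig) syscall(SYS_tkill, (tid), (sig))
#else
# define my_tgkill(pid, tid, sig) kill((tid), (sig))
#endif

/* Hypothetical helper: send sig to exactly one task of a thread group,
 * instead of letting kill(tgid, sig) pick an arbitrary thread. */
static int signal_one_task(pid_t tgid, pid_t tid, int sig)
{
    return (int) my_tgkill(tgid, tid, sig);
}
```

Signal 0 sends nothing and only checks that the target task exists and may be signalled, so signal_one_task(getpid(), tid, 0) is a harmless way to exercise the macro.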
strace-4.5.16-1.fc7 has been pushed to the Fedora 7 stable repository. If problems still persist, please make note of it in this bug report.