When traced threaded programs exit, the tracing program doesn't seem to notice.

20103 pts/18 S+ 0:00 | \_ strace -f /usr/sbin/openpbx -vvvvdnc
20128 pts/18 Zl+ 0:00 |     \_ [openpbx] <defunct>
29765 pts/4 S 0:00 | \_ gdb --args /usr/sbin/openpbx -vvvvdnc
29766 pts/4 Zl+ 0:00 |     \_ [openpbx] <defunct>
[New Thread 844313792 (LWP 9809)]
-- Executing V110("mISDN/1-u1", "") in new stack
[Thread 844313792 (LWP 9809) exited]
reading register pc (#64): No such process.
(gdb) c
Continuing.
reading register pc (#64): No such process.

WTF?
Report omits kernel info.
Current FC6 kernel on the architecture indicated above:

Linux pegasos.infradead.org 2.6.18-1.2741.fc6 #1 Wed Oct 4 20:18:10 EDT 2006 ppc ppc ppc GNU/Linux

Also on ppc64 kernel. Here's what I see when 'strace -f' observes a threaded program exiting...

[pid 9385] write(1, "Setting timer 268505240 for 5-se"..., 51Setting timer 268505240 for 5-second expiration...
) = 51
[pid 9385] timer_settime(0, 0, {it_interval={5, 0}, it_value={5, 0}}, NULL) = 0
[pid 9385] exit_group(1) = ?
[pid 9386] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid 9386] SYS_300(0x30033510, 0xc, 0x30033508, 0x3003a940, 0x30033508) = 0
[pid 9386] rt_sigtimedwait([RTMIN], <unfinished ...>
Process 9386 detached

[1]+ Stopped strace -f ./sigev_thread
[root@pegasos dwmw2]# kill -9 %1
[1]+ Killed strace -f ./sigev_thread
I can only test ppc64 kernels. I tried my vanilla 2.6.18+utrace ppc64 on an otherwise fc5 ppc/ppc64 installation, and strace -f worked fine. I still don't have your actual test case, so I used a trivial multithreaded program of my own. Please attach your test program source so I can try what you tried. I have a lot of downloading and installing to do before I can test that fc6 kernel, or test any kernel in an fc6 environment.
Created attachment 138320: test program

This is sufficient.
Ok, I did reproduce some weirdness with 2.6.18+utrace on ppc64 using your test. It looks like the bug is specifically with a group exit that should be killing many live threads, which did not happen in my trivial test.
Same problem on i386, so it's not machine-specific; it's just this test case. I'll figure it out.
Created attachment 138377: utrace fix

This fixes the test case. I am also looking into other utrace interactions with SIGKILL.