Bug 209873 - broken strace/gdb of threaded programs
Summary: broken strace/gdb of threaded programs
Keywords:
Status: CLOSED RAWHIDE
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2006-10-07 09:42 UTC by David Woodhouse
Modified: 2007-11-30 22:11 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2006-10-28 05:40:03 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
test program (2.94 KB, text/x-csrc)
2006-10-12 10:12 UTC, David Woodhouse
utrace fix (340 bytes, patch)
2006-10-12 21:01 UTC, Roland McGrath

Description David Woodhouse 2006-10-07 09:42:56 UTC
When traced threaded programs exit, the tracing program doesn't seem to notice.


20103 pts/18   S+     0:00  |           \_ strace -f /usr/sbin/openpbx -vvvvdnc
20128 pts/18   Zl+    0:00  |               \_ [openpbx] <defunct>

29765 pts/4    S      0:00  |           \_ gdb --args /usr/sbin/openpbx -vvvvdnc
29766 pts/4    Zl+    0:00  |               \_ [openpbx] <defunct>

Comment 1 David Woodhouse 2006-10-11 14:03:56 UTC
[New Thread 844313792 (LWP 9809)]
    -- Executing V110("mISDN/1-u1", "") in new stack
[Thread 844313792 (LWP 9809) exited]
reading register pc (#64): No such process.
(gdb) c
Continuing.
reading register pc (#64): No such process.

WTF?

Comment 2 Roland McGrath 2006-10-12 03:11:40 UTC
Report omits kernel info.

Comment 3 David Woodhouse 2006-10-12 06:12:43 UTC
Current FC6 kernel on the architecture indicated above:

Linux pegasos.infradead.org 2.6.18-1.2741.fc6 #1 Wed Oct 4 20:18:10 EDT 2006 ppc ppc ppc GNU/Linux

The same happens with a ppc64 kernel. Here's what I see when 'strace -f' observes a threaded
program exiting...

[pid  9385] write(1, "Setting timer 268505240 for 5-se"..., 51Setting timer 268505240 for 5-second expiration...
) = 51
[pid  9385] timer_settime(0, 0, {it_interval={5, 0}, it_value={5, 0}}, NULL) = 0
[pid  9385] exit_group(1)               = ?
[pid  9386] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  9386] SYS_300(0x30033510, 0xc, 0x30033508, 0x3003a940, 0x30033508) = 0
[pid  9386] rt_sigtimedwait([RTMIN], 


 <unfinished ...>
Process 9386 detached

[1]+  Stopped                 strace -f ./sigev_thread
[root@pegasos dwmw2]# kill -9 %1
[1]+  Killed                  strace -f ./sigev_thread




Comment 4 Roland McGrath 2006-10-12 09:22:04 UTC
I can only test ppc64 kernels.  I tried my vanilla 2.6.18+utrace ppc64 kernel on an
otherwise fc5 ppc/ppc64 installation, and strace -f worked fine.  I still don't
have your actual test case, so I used a trivial multithreaded program of my own.
Please attach your test program source so I can try what you tried.

I have a lot of downloading and installing to do before I can test that fc6
kernel, or test any kernel in an fc6 environment.

Comment 5 David Woodhouse 2006-10-12 10:12:46 UTC
Created attachment 138320 [details]
test program

This is sufficient.

Comment 6 Roland McGrath 2006-10-12 10:23:35 UTC
OK, I did reproduce some weirdness with 2.6.18+utrace on ppc64 using your test.
It looks like the bug is specifically with a group exit that should be killing
many live threads, which did not happen in my trivial test.

Comment 7 Roland McGrath 2006-10-12 10:45:16 UTC
Same problem on i386; it's not machine-specific, just this test case.
I'll figure it out.

Comment 8 Roland McGrath 2006-10-12 21:01:10 UTC
Created attachment 138377 [details]
utrace fix

This fixes the test case.  I am also looking into other utrace interactions
with SIGKILL.

