Bug 209873 - broken strace/gdb of threaded programs
Status: CLOSED RAWHIDE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All  OS: Linux
Priority: medium  Severity: medium
Assigned To: Roland McGrath
QA Contact: Brian Brock
Reported: 2006-10-07 05:42 EDT by David Woodhouse
Modified: 2007-11-30 17:11 EST
CC List: 2 users

Doc Type: Bug Fix
Last Closed: 2006-10-28 01:40:03 EDT


Attachments
test program (2.94 KB, text/x-csrc)
2006-10-12 06:12 EDT, David Woodhouse
utrace fix (340 bytes, patch)
2006-10-12 17:01 EDT, Roland McGrath

Description David Woodhouse 2006-10-07 05:42:56 EDT
When a traced multithreaded program exits, the tracer (strace or gdb) doesn't seem to notice, and the traced process is left defunct:


20103 pts/18   S+     0:00  |           \_ strace -f /usr/sbin/openpbx -vvvvdnc
20128 pts/18   Zl+    0:00  |               \_ [openpbx] <defunct>

29765 pts/4    S      0:00  |           \_ gdb --args /usr/sbin/openpbx -vvvvdnc
29766 pts/4    Zl+    0:00  |               \_ [openpbx] <defunct>
Comment 1 David Woodhouse 2006-10-11 10:03:56 EDT
[New Thread 844313792 (LWP 9809)]
    -- Executing V110("mISDN/1-u1", "") in new stack
[Thread 844313792 (LWP 9809) exited]
reading register pc (#64): No such process.
(gdb) c
Continuing.
reading register pc (#64): No such process.

WTF?
Comment 2 Roland McGrath 2006-10-11 23:11:40 EDT
Report omits kernel info.
Comment 3 David Woodhouse 2006-10-12 02:12:43 EDT
Current FC6 kernel on the architecture indicated above:

Linux pegasos.infradead.org 2.6.18-1.2741.fc6 #1 Wed Oct 4 20:18:10 EDT 2006 ppc ppc ppc GNU/Linux

It also happens with a ppc64 kernel. Here's what I see when 'strace -f' observes
a threaded program exiting:

[pid  9385] write(1, "Setting timer 268505240 for 5-se"..., 51Setting timer 268505240 for 5-second expiration...
) = 51
[pid  9385] timer_settime(0, 0, {it_interval={5, 0}, it_value={5, 0}}, NULL) = 0
[pid  9385] exit_group(1)               = ?
[pid  9386] --- SIGSTOP (Stopped (signal)) @ 0 (0) ---
[pid  9386] SYS_300(0x30033510, 0xc, 0x30033508, 0x3003a940, 0x30033508) = 0
[pid  9386] rt_sigtimedwait([RTMIN], 


 <unfinished ...>
Process 9386 detached

[1]+  Stopped                 strace -f ./sigev_thread
[root@pegasos dwmw2]# kill -9 %1
[1]+  Killed                  strace -f ./sigev_thread


Comment 4 Roland McGrath 2006-10-12 05:22:04 EDT
I can only test ppc64 kernels.  I tried my vanilla 2.6.18+utrace ppc64 on an
otherwise fc5 ppc/ppc64 installation, and strace -f worked fine.  I still don't
have your actual test case, so I used a trivial multithreaded program of my own.
Please attach your test program source so I can try what you tried.

I have a lot of downloading and installing to do before I can test that fc6
kernel, or test any kernel in an fc6 environment.
Comment 5 David Woodhouse 2006-10-12 06:12:46 EDT
Created attachment 138320 [details]
test program

This is sufficient.
Comment 6 Roland McGrath 2006-10-12 06:23:35 EDT
Ok, I did reproduce some weirdness with 2.6.18+utrace on ppc64 using your test.
It looks like the bug is specifically with a group exit that should be killing
many live threads, which did not happen in my trivial test.
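The attached test program is not reproduced in this report. As a hypothetical stand-in, assuming (per the comment above) that a group exit while several threads are still live is what triggers the problem, here is a minimal C sketch of that scenario. The helper names (worker, group_exit_with_live_threads, run_group_exit_demo) are illustrative only and are not taken from the attachment; the workers park in sigwait() to loosely mimic the thread seen blocked in rt_sigtimedwait() in the strace output.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Worker thread: park forever in sigwait() on a signal nobody sends, so the
   thread is still alive when the main thread calls exit(). */
static void *worker(void *arg)
{
    const sigset_t *set = arg;
    int sig;
    for (;;)
        sigwait(set, &sig);
    return NULL; /* not reached */
}

/* Child body (hypothetical helper): start nthreads live workers, then exit.
   glibc's exit() ends in the exit_group system call, which must kill every
   remaining thread: the "group exit with many live threads" case. */
static void group_exit_with_live_threads(int nthreads, int status)
{
    static sigset_t set;
    sigemptyset(&set);
    sigaddset(&set, SIGRTMIN);
    /* Block the signal before creating threads so the workers inherit it. */
    pthread_sigmask(SIG_BLOCK, &set, NULL);
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        if (pthread_create(&tid, NULL, worker, &set) != 0)
            _exit(127);
    }
    sleep(1); /* crude: gives the workers time to reach sigwait() */
    exit(status);
}

/* Run the scenario in a forked child and return its exit status, or -1. */
static int run_group_exit_demo(int nthreads)
{
    assert(nthreads > 0);
    fflush(NULL); /* avoid duplicating buffered stdio output in the child */
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        group_exit_with_live_threads(nthreads, 1);
    int wstatus;
    if (waitpid(pid, &wstatus, 0) != pid)
        return -1;
    return WIFEXITED(wstatus) ? WEXITSTATUS(wstatus) : -1;
}
```

Forking keeps the group exit in a child so the caller can collect its exit status. A main() that calls run_group_exit_demo(4) is enough to drive it; the interesting case is running that binary under 'strace -f' or gdb, where on an affected kernel one would expect the tracer to miss the exit as described above.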
Comment 7 Roland McGrath 2006-10-12 06:45:16 EDT
Same problem on i386; it's not machine-specific, just specific to this test case.
I'll figure it out.
Comment 8 Roland McGrath 2006-10-12 17:01:10 EDT
Created attachment 138377 [details]
utrace fix

This fixes the test case.  I am also looking into other utrace interactions
with SIGKILL.
