Bug 240962 - Hangs and/or multithreaded process left Stopped (T) on CTRL-C of strace
Summary: Hangs and/or multithreaded process left Stopped (T) on CTRL-C of strace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: strace
Version: rawhide
Hardware: ia64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 222053
Reported: 2007-05-23 14:29 UTC by Jan Kratochvil
Modified: 2007-11-30 22:12 UTC
0 users

Fixed In Version: 4.5.16-1.fc7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2007-08-06 17:59:30 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
Testcase. (1016 bytes, text/plain)
2007-05-23 14:29 UTC, Jan Kratochvil
Bugfix. (1.20 KB, patch)
2007-05-23 14:40 UTC, Jan Kratochvil
Bugfix updated according to Roland's comments. (1.69 KB, patch)
2007-05-24 09:40 UTC, Jan Kratochvil
Bugfix update #2 according to Roland's comments. (1.67 KB, patch)
2007-05-24 15:08 UTC, Jan Kratochvil

Description Jan Kratochvil 2007-05-23 14:29:15 UTC
Description of problem:
strace sometimes hangs while detaching from a multithreaded application when
strace itself is interrupted with CTRL-C.
As a side effect, in some cases the multithreaded application is left Stopped
(T, by SIGSTOP) and must be sent `kill -CONT'.  The shell prints:
[1]+  Stopped                 appname args

Version-Release number of selected component (if applicable):
strace-4.5.15-1.el5.ia64
kernel-2.6.18-8.el5.ia64

How reproducible:
The application Stopped (T) state: always; see also Bug 240961.
The strace hang: reproduced best (possibly only) on ia64, in about 10% of test runs.

Steps to Reproduce:
1. gcc -o mt3-tkill mt3-tkill.c -Wall -ggdb2 -pthread
2. ./mt3-tkill
3. On another console: strace -o /tmp/x -f -p `pidof mt3-tkill`
4. Press CTRL-C in the strace console.

Actual results:
Process 13968 attached with 64 threads - interrupt to quit
Process 13907 detached
...
Process 13905 detached
[HANG]

Expected results:
Process 13968 attached with 64 threads - interrupt to quit
Process 13907 detached
...
Process 13900 detached
[EXIT]

Additional info:
The traced process is left in this state:
/proc/12664/task/12664/status:State:	S (sleeping)
/proc/12664/task/12665/status:State:	T (tracing stop)
...
/proc/12664/task/12730/status:State:	T (tracing stop)
while strace is blocked in:
#0  0xa000000000010641 in __kernel_syscall_via_break ()
#1  0x2000000000162fe0 in wait4 () from /lib/tls/libc.so.6.1
#2  0x4000000000008420 in detach (tcp=0x600000000001c050, sig=0) at strace.c:1337
1337	  if (wait4(tcp->pid, &status, __WALL, NULL) < 0) {
#3  0x40000000000093b0 in cleanup () at strace.c:1516
#4  0x4000000000006ea0 in main (argc=6, argv=0x60000fffffff9f18) at strace.c:803

- strace sends SIGSTOP to the thread group leader of the process but never
  receives the resulting stop back through wait4().
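The per-task states listed above come from /proc/<pid>/task/<tid>/status. The helper below is a minimal sketch for producing the same listing for a given PID, which makes it easy to spot threads left in T state after strace exits. The function name print_task_states is hypothetical; it is not part of strace or of the attached testcase.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Print the "State:" line of every task (thread) of process PID in the
   "<status path>:State:..." form used in this report.  Returns the number
   of tasks printed, or -1 if /proc/PID/task cannot be opened. */
static int print_task_states(pid_t pid, FILE *out)
{
    char path[64];
    assert(out != NULL);
    snprintf(path, sizeof path, "/proc/%d/task", (int) pid);

    DIR *dir = opendir(path);
    if (dir == NULL)
        return -1;

    int count = 0;
    struct dirent *de;
    while ((de = readdir(dir)) != NULL) {
        if (de->d_name[0] == '.')
            continue;                 /* skip "." and ".." */

        char status_path[512];
        snprintf(status_path, sizeof status_path, "%s/%s/status",
                 path, de->d_name);
        FILE *f = fopen(status_path, "r");
        if (f == NULL)
            continue;                 /* the task may have exited meanwhile */

        char line[256];
        while (fgets(line, sizeof line, f) != NULL) {
            if (strncmp(line, "State:", 6) == 0) {
                fprintf(out, "%s:%s", status_path, line);
                count++;
                break;
            }
        }
        fclose(f);
    }
    closedir(dir);
    return count;
}
```

For example, print_task_states(12664, stdout) would print one line per thread in the form shown above; a return value of -1 means the process is gone or /proc is not readable.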

The testcase contains a workaround for a Linux kernel bug that leaks
ERESTARTNOINTR to userland; that bug is present in some older Linux kernel
variants around 2.6.9.  It does not otherwise affect this bug.

Comment 1 Jan Kratochvil 2007-05-23 14:29:15 UTC
Created attachment 155244
Testcase.

Comment 2 Jan Kratochvil 2007-05-23 14:40:51 UTC
Created attachment 155246
Bugfix.

Comment 3 Jan Kratochvil 2007-05-23 16:26:41 UTC
The problem occurs because kill() may deliver the signal to an arbitrary task
(thread) of the thread group, while we later wait on one specific TID only.
For a process under ptrace(2), waits on a PID become waits specific to that
TID's task.
[ Roland McGrath originally provided this useful information. ]
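The targeting difference can be observed without ptrace: kill(2) produces a process-directed signal that the kernel may hand to any thread not blocking it, whereas tgkill(2) queues the signal for one named task. The sketch below is an illustration under stated assumptions, not the strace fix: the function names are hypothetical, and it uses SIGUSR1 instead of SIGSTOP so the calling process is not stopped. It sends the signal to one worker thread with tgkill and reports which TID ran the handler.

```c
#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static atomic_int worker_tid;   /* TID published by the worker thread */
static atomic_int handled_by;   /* TID of the thread that ran the handler */

static pid_t current_tid(void)
{
    return (pid_t) syscall(SYS_gettid);
}

static void on_usr1(int sig)
{
    (void) sig;
    handled_by = current_tid();   /* record which thread took the signal */
}

static void *worker(void *arg)
{
    sigset_t block, wait_mask;
    (void) arg;
    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    /* Block SIGUSR1 before publishing our TID, so a signal aimed at us
       stays pending until sigsuspend() unblocks it. */
    pthread_sigmask(SIG_BLOCK, &block, &wait_mask);
    sigdelset(&wait_mask, SIGUSR1);
    worker_tid = current_tid();
    while (handled_by == 0)
        sigsuspend(&wait_mask);   /* atomically unblock SIGUSR1 and wait */
    return NULL;
}

/* Hypothetical demo: direct SIGUSR1 at one worker thread with tgkill(2).
   Stores the intended TID in *target and the TID that ran the handler in
   *got; returns 0 on success, -1 on a setup failure. */
static int tgkill_reaches_target(pid_t *target, pid_t *got)
{
    struct sigaction sa;
    pthread_t th;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_usr1;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    worker_tid = 0;
    handled_by = 0;
    if (pthread_create(&th, NULL, worker, NULL) != 0)
        return -1;
    while (worker_tid == 0)
        sched_yield();            /* wait until the worker published its TID */

    /* Thread-directed: only the task named by the TID argument can take it. */
    if (syscall(SYS_tgkill, getpid(), (pid_t) worker_tid, SIGUSR1) != 0) {
        pthread_cancel(th);       /* sigsuspend() is a cancellation point */
        pthread_join(th, NULL);
        return -1;
    }
    pthread_join(th, NULL);
    *target = worker_tid;
    *got = handled_by;
    return 0;
}
```

With kill(getpid(), SIGUSR1) instead, any thread that does not block the signal could run the handler. Under ptrace the analogous effect is that the SIGSTOP stop may be reported for a thread other than the TID strace then waits on.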

Unfortunately, the POSIX specification does not seem to mention this behavior:
        http://www.opengroup.org/onlinepubs/009695399/functions/kill.html
The following paragraph covers only the kill (getpid (), ...) case:
        If the value of pid causes sig to be generated for the sending process,
        and if sig is not blocked for the calling thread and if no other thread
        has sig unblocked or is waiting in a sigwait() function for sig, either
        sig or at least one pending unblocked signal shall be delivered to the
        sending thread before kill() returns.


Comment 4 Jan Kratochvil 2007-05-24 09:40:30 UTC
Created attachment 155329
Bugfix updated according to Roland's comments.

Comment 5 Jan Kratochvil 2007-05-24 15:08:16 UTC
Created attachment 155354
Bugfix update #2 according to Roland's comments.

Comment 6 Jan Kratochvil 2007-08-03 11:59:35 UTC
Fixed in Rawhide strace-4.5.16-1.fc8:
* Fri Aug  3 2007 Roland McGrath <roland> - 4.5.16-1
- fix multithread issues (#240962, [...])

and upstream:

2007-05-24  Jan Kratochvil  <jan.kratochvil>

        * strace.c [LINUX] (my_tgkill): New macro.
        [LINUX] (detach): Use my_tgkill () instead of kill(2).
        Fixes RH#240962.
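The patch itself is not reproduced in this report, so the macro below is a hypothetical stand-in for my_tgkill, not the actual strace code. It assumes the Linux syscall numbers from <sys/syscall.h>: it prefers tgkill(2), which names both the thread group and the thread, falls back to tkill(2), and finally to kill(2) when neither is available.

```c
#define _GNU_SOURCE
#include <signal.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical my_tgkill(): send SIG to thread TID of thread group PID.
   Like syscall() and kill(), it returns 0 on success, -1 with errno set. */
#if defined SYS_tgkill
# define my_tgkill(pid, tid, sig) syscall(SYS_tgkill, (pid), (tid), (sig))
#elif defined SYS_tkill
/* Older kernels: tkill(2) targets the TID but cannot check the group. */
# define my_tgkill(pid, tid, sig) syscall(SYS_tkill, (tid), (sig))
#else
/* Last resort: process-directed, which reintroduces the original race. */
# define my_tgkill(pid, tid, sig) kill((tid), (sig))
#endif
```

A detach path along these lines would pass the thread group ID and the same TID it later waits on with wait4(..., __WALL, ...). Signal 0 performs only the existence and permission checks, which makes it a safe way to try the macro without stopping anything.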


Comment 7 Fedora Update System 2007-08-06 17:58:57 UTC
strace-4.5.16-1.fc7 has been pushed to the Fedora 7 stable repository.  If problems still persist, please make note of it in this bug report.

