Bug 481753

Summary: ptraced processes fail to deliver exit notification to parent
Product: [Fedora] Fedora Reporter: Denys Vlasenko <dvlasenk>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: high    
Version: 11CC: anton, dzickus, kernel-maint, onestero, quintela, roland
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.29.5-191.fc11 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-06-24 19:23:30 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
Test program
none
Example of (an older) testcase run under strace.
none
debug kernel patch: ptrace_detach() can leak a zombie? none

Description Denys Vlasenko 2009-01-27 13:50:40 UTC
This bug was found while working on strace bug 478419.

Build attached test program with

gcc -Os test_fork.c -otest_fork -lpthread

and run it like this:

#!/bin/sh
while sleep 0.1; do
        strace -f -o log ./test_fork
        echo strace finished
done

When run under strace -f, it sometimes hangs in waitpid() call, despite the fact that child it is waiting for has exited.

Sometimes, the same happens in strace. When traced program exits via group_exit() (you will see "EXIT_FLAG" printed - see source), strace would normally detect it by getting exit notification via wait4(). In this case, sometimes this exit notification is lost and strace hangs in wait4(), waiting for an event which will never appear - exit notification is lost, and there are no other sources of events left.

This happens on Fedora kernel 2.6.29-0.28.rc1.fc11.x86_64 too, but not on vanilla 2.6.28.2.

Comment 1 Denys Vlasenko 2009-01-27 13:51:21 UTC
Created attachment 330087 [details]
Test program

Comment 2 Oleg Nesterov 2009-01-27 16:52:41 UTC
Can't reproduce on rhel 5.3

Denys, could you verify that all sub-threads have exited?

It could be the bug in strace (or in kernel, of course).

> EXIT_FLAG
>
> so, obviously, program reached exit(). But strace did not exit!

Perhaps not. Yes, the process exited, but the thread group can't
be reaped until all sub-threads have exited, and the traced
sub-thread can't "disappear" automatically (as it normally
happens) because it is traced.

Comment 3 Denys Vlasenko 2009-01-27 18:25:45 UTC
Ok, I was wrong. RHEL does not have this bug, but Fedora does, specifically, kernel-2.6.29-0.28.rc1.fc11.

I will correct Product/version fields now.

Comment 4 Denys Vlasenko 2009-01-27 18:33:17 UTC
Created attachment 330135 [details]
Example of (an older) testcase run under strace.

Strace itself was straced too. You can see how strace PTRACE_CONTed the tracee which was stopped at the entry to exit_group(), but subsequent wait4() nevertheless does not pick this process:


wait4(4294967295, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 4376
write(2, " [wait(0x857f) = 4376]\n", 23) = 23
wait4(4294967295, 0x7fff77bf5c9c, WNOHANG|__WALL, NULL) = 0
write(2, "pid 4376 stopped, [133]\n", 24) = 24
ptrace(PTRACE_PEEKUSER, 4376, 8*ORIG_RAX, [0xe7]) = 0
ptrace(PTRACE_PEEKUSER, 4376, 8*CS, [0x33]) = 0
ptrace(PTRACE_PEEKUSER, 4376, 8*RAX, [0xffffffffffffffda]) = 0
ptrace(PTRACE_PEEKUSER, 4376, 8*RDI, [0]) = 0
write(3, "4376  exit_group(0)             "..., 44) = 44
ptrace(PTRACE_CONT, 4376, 0x1, SIG_0)   = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
write(2, "PTRACE_CONTed 4376\n", 19)    = 19
wait4(4294967295,

Comment 5 Denys Vlasenko 2009-01-27 18:38:20 UTC
Task 4376 has two threads:

# ls -1 /proc/4376/task/
4376
4434

and looking into strace's strace, latest events with thread 4434 are:

wait4(4294967295, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 4434
write(2, " [wait(0x857f) = 4434]\n", 23) = 23
wait4(4294967295, 0x7fff77bf5c9c, WNOHANG|__WALL, NULL) = 0
write(2, "pid 4434 stopped, [133]\n", 24) = 24
ptrace(PTRACE_PEEKUSER, 4434, 8*ORIG_RAX, [0x3c]) = 0
ptrace(PTRACE_PEEKUSER, 4434, 8*CS, [0x33]) = 0
ptrace(PTRACE_PEEKUSER, 4434, 8*RAX, [0xffffffffffffffda]) = 0
ptrace(PTRACE_PEEKUSER, 4434, 8*RDI, [0]) = 0
write(3, "4434  _exit(0)                  "..., 44) = 44
ptrace(PTRACE_DETACH, 4434, 0x1, SIG_0) = 0
--- SIGCHLD (Child exited) @ 0 (0) ---
wait4(4294967295, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 4436

After this, we never see 4434 mentioned again. So it should disappear (since it's a thread and threads do not become zombies). But it didn't.

Comment 6 Denys Vlasenko 2009-01-27 19:46:42 UTC
Oleg, per our discussion:

Having no timestamps in the log, I can't prove that _exit() in thread 4434 couldn't race with exit_group() in thread 4376, but it's very unlikely since
_exit is on line 9913 and
exit_group is on line 35149 of the log.

Looks like _exit() in a thread sometimes doesn't work properly when it is launched by PTRACE_DETACH.

Comment 7 Oleg Nesterov 2009-01-27 19:54:18 UTC
(In reply to comment #5)
>
> Task 4376 has two threads:
>
> # ls -1 /proc/4376/task/
> 4376
> 4434

OK, so the hang in do_wait() is correct.

> and looking into strace's strace, latest events with thread 4434 are:
>
> wait4(4294967295, [{WIFSTOPPED(s) && WSTOPSIG(s) == 133}], __WALL, NULL) = 434
> write(2, " [wait(0x857f) = 4434]\n", 23) = 23
> wait4(4294967295, 0x7fff77bf5c9c, WNOHANG|__WALL, NULL) = 0
> write(2, "pid 4434 stopped, [133]\n", 24) = 24
> ptrace(PTRACE_PEEKUSER, 4434, 8*ORIG_RAX, [0x3c]) = 0
> ptrace(PTRACE_PEEKUSER, 4434, 8*CS, [0x33]) = 0
> ptrace(PTRACE_PEEKUSER, 4434, 8*RAX, [0xffffffffffffffda]) = 0
> ptrace(PTRACE_PEEKUSER, 4434, 8*RDI, [0]) = 0
> write(3, "4434  _exit(0)                  "..., 44) = 44
> ptrace(PTRACE_DETACH, 4434, 0x1, SIG_0) = 0

... and note that PTRACE_DETACH returns success, this sub-thread
was really stopped when strace entered sys_ptrace()

> After this, we never see 4434 mentioned again. So it should disappear (since
> it's a thread and threads do not become zombies). But it didn't.

Yep. Looks like we have the kernel bug, and now I don't understand
why it is not reproducible on upstream.

Currently I am not sure I understand what happens, just a guess.
We can race with the main thread which does group_exit() and sends
SIGKILL to sub-thread.

The sub-thread T is stopped, the tracer S does PTRACE_DETACH, and
T's group leader L does group_exit().

S enters sys_ptrace(PTRACE_DETACH), sees T->state == TASK_TRACED
and calls ptrace_detach().

L does group_exit(), sends SIGKILL to T and wakes up it.

T does do_exit()->exit_notify(). It notices ->ptrace != 0, so it
doesn't autoreap itself, but sets ->exit_state = TASK_ZOMBIE.

S does __ptrace_detach() and finally untraces this thread.


Now, nobody can reap T: it is not visible to do_wait() because
its ->exit_signal == -1. And L is not eligible_child() because
thread_group_empty() == F.

More. Even if we kill the original parent (strace in this case),
/sbin/init can't reap this thread group too for the same reason.

Comment 8 Oleg Nesterov 2009-01-27 20:22:17 UTC
(In reply to comment #6)
> 
> Having no timestamps in the log, I can't prove that _exit() in thread 4434
> couldn't race with exit_group() in thread 4376, but it's very unlikely since
> _exit is on line 9913 and
> exit_group is on line 35149 of the log.

Agreed.

> Looks like _exit() in a thread sometimes doesn't work properly when it is
> launched by PTRACE_DETACH.

I have to re-check this, but looks like utrace make things a bit
worse in this respect. We do not need to race with SIGKILL.

This tracee does exit(), yes? Then, unless I missed something,
ptrace_detach_task() may cause utrace_wakeup(). In that case
T proceeds and passes exit_notify() before we untrace it.

So we have the same race in effect.

But again, unless I missed something.

Comment 9 Oleg Nesterov 2009-01-27 21:23:13 UTC
Created attachment 330158 [details]
debug kernel patch: ptrace_detach() can leak a zombie?

I tried to install kernel-2.6.29-0.28.rc1.fc11, but failed.
The rhts machine is lost after reboot.

Until I find the way to do something else, perhaps someone
can re-test with this patch. The patch is not complete and
untested, just to check the theory.

Comment 11 Oleg Nesterov 2009-02-02 20:35:21 UTC
To summarize.

This is upstream bug, but it is more "visible" with redhat kernels.

The patches were sent upstream, currently there are in -mm tree.
Denys confirms (thanks!) they really fix the problem.

Comment 12 Denys Vlasenko 2009-02-03 14:34:49 UTC
Created a testcase for it, detach-sigkill-race:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/detach-sigkill-race.c?cvsroot=systemtap

Comment 13 Chuck Ebbert 2009-02-10 07:16:21 UTC
Oleg, do you expect those patches to be merged in 2.6.29?

Comment 14 Oleg Nesterov 2009-02-10 17:37:19 UTC
(In reply to comment #13)
>
> Oleg, do you expect those patches to be merged in 2.6.29?

No, there are targeted for 2.6.30. Too late for 2.6.29.

Comment 15 Chuck Ebbert 2009-04-11 00:51:50 UTC
(In reply to comment #14)
> (In reply to comment #13)
> >
> > Oleg, do you expect those patches to be merged in 2.6.29?
> 
> No, there are targeted for 2.6.30. Too late for 2.6.29.

Are they merged now? If so can you get the commit IDs?

Comment 16 Oleg Nesterov 2009-04-11 11:51:55 UTC
(in reply to comment #15)
>
> Are they merged now?

Yes.

> If so can you get the commit IDs?

6d69cb87f05eef3b02370b2f7bae608ad2301a00
b1b4c6799fb59e710454bfe0ab477cb8523a8667
95c3eb76dc07fd81289888ffc42948196b34b444
4576145c1ecdaaea9ef8976a48335206aa1ebf91

Comment 17 Chuck Ebbert 2009-06-01 17:55:53 UTC
This bug is really hard to reproduce -- I got one failure in 7000 test runs on one machine so far.

Comment 18 Roland McGrath 2009-06-01 18:32:01 UTC
Do you mean 7000 runs of detach-sigkill-race?  It runs 2000 iterations.
Denys, can you tune detach-sigkill-race so it is more likely to hit the problem on an affected kernel?

Comment 19 Chuck Ebbert 2009-06-04 06:52:57 UTC
(In reply to comment #18)
> Do you mean 7000 runs of detach-sigkill-race?  It runs 2000 iterations.

On a quad-core machine I was only able to get one failure in thousands of runs of detach-sigkill-race. On a dual-core it happened after ~100 runs.

Comment 20 Chuck Ebbert 2009-06-09 03:27:00 UTC
fixed in 2.6.29.4-173

Comment 21 Fedora Update System 2009-06-17 11:53:29 UTC
kernel-2.6.29.5-191.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/kernel-2.6.29.5-191.fc11

Comment 22 Fedora Update System 2009-06-19 13:44:03 UTC
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 testing repository.  If problems still persist, please make note of it in this bug report.
 If you want to test the update, you can install it with 
 su -c 'yum --enablerepo=updates-testing update kernel'.  You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-6768

Comment 23 Fedora Update System 2009-06-24 19:22:37 UTC
kernel-2.6.29.5-191.fc11 has been pushed to the Fedora 11 stable repository.  If problems still persist, please make note of it in this bug report.