Bug 177240

Summary: ptraced multithreaded exec dies with spurious SIGKILL
Product: [Fedora] Fedora
Reporter: Roland McGrath <roland>
Component: kernel
Assignee: Dave Jones <davej>
Status: CLOSED CURRENTRELEASE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: 5
CC: cagney, kernel-maint, pfrields, scox, wtogami, zhouwu
Target Milestone: ---   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 2.6.16-1.2096_FC5
Doc Type: Bug Fix
Last Closed: 2006-10-12 10:34:47 UTC
Bug Depends On:    
Bug Blocks: 150222, 173278, 180484    
Attachments:
threaded execer program for test case (flags: none)

Description Roland McGrath 2006-01-08 03:19:51 UTC
Description of problem:


Version-Release number of selected component (if applicable):
2.6.14-1.1773_FC5smp, 2.6.15-1.1826.2.4_FC5

How reproducible:
100%

Steps to Reproduce:
1. gcc -o threadexec -g threadexec.c -lpthread
2. ./threadexec /bin/echo hi => see it work
3. strace -f -o log ./threadexec /bin/echo hi => see it fail
  
Actual results:

In the traced run, the process dies by SIGKILL immediately after execing /bin/echo.

Expected results:

After the exec, /bin/echo should run normally (and be traced).
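
The difference between the actual and expected results shows up in the wait status the parent collects. As a rough sketch only (the helper names `termination_signal` and `spawn_and_wait` are made up for illustration and are not part of the attached test case), a parent could tell a SIGKILL death from a normal exit like this:

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: classify a wait status.
 * Returns the terminating signal number, 0 for a normal exit,
 * and -1 for anything else (e.g. a stop notification).
 */
static int termination_signal(int status)
{
    if (WIFSIGNALED(status))
        return WTERMSIG(status);
    if (WIFEXITED(status))
        return 0;
    return -1;
}

/*
 * Hypothetical helper: fork a child that either kills itself with
 * SIGKILL or exits normally, and return its wait status (-1 on error).
 */
static int spawn_and_wait(int kill_self)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (kill_self)
            kill(getpid(), SIGKILL);   /* child dies here */
        _exit(0);                      /* normal exit otherwise */
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

For the failing traced run described above, a check of this kind would be expected to report SIGKILL; for the untraced run, a normal exit.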

Additional info:

Comment 1 Roland McGrath 2006-01-08 03:19:52 UTC
Created attachment 122915 [details]
threaded execer program for test case
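
The attachment itself is not reproduced in this report. As a minimal sketch of the pattern it exercises (an exec issued from a thread other than the initial one), and assuming nothing about the real program beyond its description, something like the following could be used. The names `exec_in_thread` and `run_threaded_exec` are hypothetical, and the fork wrapper exists only so that the caller can inspect the result.

```c
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Thread body: replace the process image from a non-initial thread. */
static void *exec_in_thread(void *arg)
{
    char **argv = arg;

    execv(argv[0], argv);
    perror("execv");   /* reached only if the exec failed */
    _exit(127);
}

/*
 * Hypothetical helper (not taken from the attachment): fork a child,
 * let a second thread in that child exec argv[0], and return the
 * child's wait status, or -1 on error.
 */
static int run_threaded_exec(char **argv)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        pthread_t t;

        if (pthread_create(&t, NULL, exec_in_thread, argv) != 0)
            _exit(126);
        pthread_join(t, NULL);   /* does not return if the exec succeeds */
        _exit(125);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return status;
}
```

Calling `run_threaded_exec(&argv[1])` from a `main()` and running the result under `strace -f`, as in the steps to reproduce, should follow the same exec-from-a-thread path, though whether it triggers the bug exactly as the attached program does is an assumption.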

Comment 2 Roland McGrath 2006-01-08 09:25:03 UTC
Verified this bug is in the current upstream kernel.

Comment 3 wzhou 2006-03-02 09:11:15 UTC
I believe I have found the code that triggers this test case failure, but I don't
know why yet.  It is triggered by the following code in kernel/ptrace.c:

+	if (child->signal->flags & SIGNAL_GROUP_EXIT) {
+		sigaddset(&child->pending.signal, SIGKILL);
+		signal_wake_up(child, 1);
+	}

This was checked into 2.6.15 by Andrea Arcangeli to fix a gdb deadlock problem
that he didn't elaborate on.  Here is the link:
http://marc.theaimsgroup.com/?l=linux-kernel&m=112833915827432&w=2
After reverting this patch, the above test case works on both x86 and ppc64.

Comment 4 wzhou 2006-03-02 09:17:25 UTC
BTW, may I post the above test case to the public mailing list?  I believe it is
public, but I am not sure, so I just want to ask.  :-)

Comment 5 Roland McGrath 2006-03-02 09:25:18 UTC
Everything about this bug is public.  The attached test program is GPL'd code
already publicly available from the Frysk project.

Thanks for looking into this.  Andrea's change is one of many recent kernel
changes on my review backlog.  I have been dubious about them but have not yet
had the time to investigate fully and follow up to get them changed.
Unfortunately, other things are still monopolizing my time, and that backlog is
growing rather than shrinking.

Comment 6 Andrew Cagney 2006-03-02 19:55:47 UTC
Can the above code be removed for the FC-5 kernel?


Comment 7 wzhou 2006-03-08 04:09:33 UTC
I sent an email to Andrea and cc'd the linux-kernel mailing list about this
problem.  Here is the link:

http://marc.theaimsgroup.com/?l=linux-kernel&m=114178963330524&w=2

Just FYI.

Comment 8 Roland McGrath 2006-04-17 21:25:43 UTC
2.6.16.6 has fixed this upstream.
FC-5's next rebase should get it.

Comment 9 Roland McGrath 2006-10-12 10:34:47 UTC
2.6.17-1.2174_FC5 tests fine, closing.