Bug 456333

Summary: ptrace: PTRACE_DETACH(..., SIGSTOP) does not stop
Product: Red Hat Enterprise Linux 5
Reporter: Jan Kratochvil <jan.kratochvil>
Component: kernel
Assignee: Oleg Nesterov <onestero>
Status: CLOSED WONTFIX
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium
Priority: low
Version: 5.2
CC: ebachalo, kernel-maint, mgahagan, riek
Target Milestone: rc
Keywords: Regression
Target Release: ---   
Hardware: All   
OS: Linux   
URL: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/detach-stopped.c?cvsroot=systemtap
Doc Type: Bug Fix
Last Closed: 2011-02-04 18:36:05 UTC
Bug Depends On: 454404, 674640, 674764    
Bug Blocks: 498595, 525215, 533192    
Attachments:
[patch] fix ptrace(PTRACE_DETACH, SIGSTOP) (flags: none)

Description Jan Kratochvil 2008-07-22 22:32:48 UTC
+++ This bug was initially created as a clone of Bug #454404 +++

Description of problem:
When detaching from a multithreaded program with PTRACE_DETACH and SIGSTOP so
that it remains in state T (stopped), some tasks are left unstopped.
A single-threaded program is reliably left in state T (stopped).
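
For orientation, here is a minimal sketch of the single-threaded case described above: attach to a child, detach with SIGSTOP as the data argument, and check whether the child is then reported as stopped. This is not detach-stopped.c and it does not exercise the failing multithreaded case; the function name detach_leaves_child_stopped is hypothetical and used only for illustration.

```c
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper, not taken from detach-stopped.c.
 * Returns 1 if the child is reported stopped after
 * PTRACE_DETACH(..., SIGSTOP), 0 if no stop was seen within about
 * two seconds, and -1 if the setup itself failed. */
int detach_leaves_child_stopped(void)
{
    struct timespec tick = { 0, 10 * 1000 * 1000 };   /* 10 ms */
    int status, result = -1;
    pid_t child = fork();

    if (child < 0)
        return -1;
    if (child == 0) {
        for (;;)
            pause();            /* tracee: idle until killed */
    }

    /* Attach, wait for the attach stop, then detach while injecting
     * SIGSTOP so the tracee should enter a normal job-control stop. */
    if (ptrace(PTRACE_ATTACH, child, NULL, NULL) == 0 &&
        waitpid(child, &status, 0) == child && WIFSTOPPED(status) &&
        ptrace(PTRACE_DETACH, child, NULL, (void *)(long)SIGSTOP) == 0) {
        result = 0;
        /* Poll for the stop notification instead of blocking forever. */
        for (int i = 0; i < 200 && result == 0; i++) {
            pid_t r = waitpid(child, &status, WUNTRACED | WNOHANG);
            if (r == child && WIFSTOPPED(status))
                result = 1;
            else if (r != 0)
                result = -1;    /* wait error or the child went away */
            else
                nanosleep(&tick, NULL);
        }
    }

    kill(child, SIGKILL);       /* clean up whatever state the child is in */
    waitpid(child, NULL, 0);
    return result;
}
```

A caller would treat a return value of 1 as the expected behaviour. The bug in this report concerns threads of a multithreaded tracee, which this sketch deliberately leaves out.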

Version-Release number of selected component (if applicable):
FAIL RHEL-5 kernel-2.6.18-92.1.6.el5.x86_64
FAIL RHEL-5 kernel-2.6.18-92.1.6.el5.i686
FAIL RHEL-5 kernel-2.6.18-53.el5.s390x

PASS RHEL-4 kernel-smp-2.6.9-67.0.20.EL.x86_64

How reproducible:
With #define THREADS set to 3 or more, in practice always.

Steps to Reproduce:
wget -O detach-stopped.c 'http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/detach-stopped.c?cvsroot=systemtap'
gcc -o detach-stopped detach-stopped.c -Wall -ggdb2 -pthread -D_GNU_SOURCE
./detach-stopped; echo $?
 
Actual results:
1

Expected results:
0

Additional info:
The test case has a DEBUG option that dumps the task states.
It is a regression against RHEL-4.
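
A per-task state check of the kind mentioned above can be sketched as follows, assuming the state is read from the "State:" line of /proc/<pid>/task/<tid>/status. The helper names parse_task_state and read_task_state are hypothetical and are not taken from the test case.

```c
#include <stdio.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical helper: return the state letter (for example 'T' or 'S')
 * from the text of a /proc status file, or '?' if there is no
 * "State:" line. */
char parse_task_state(const char *status_text)
{
    const char *p = status_text;

    while (p && *p) {
        if (strncmp(p, "State:", 6) == 0) {
            p += 6;
            while (*p == ' ' || *p == '\t')
                p++;                      /* skip the separator */
            return (*p && *p != '\n') ? *p : '?';
        }
        p = strchr(p, '\n');              /* advance to the next line */
        if (p)
            p++;
    }
    return '?';
}

/* Hypothetical helper: read the state letter of one task (thread) of a
 * process; returns '?' if the status file cannot be read. */
char read_task_state(pid_t pid, pid_t tid)
{
    char path[64], buf[1024];
    size_t n;
    FILE *f;

    snprintf(path, sizeof path, "/proc/%ld/task/%ld/status",
             (long)pid, (long)tid);
    f = fopen(path, "r");
    if (!f)
        return '?';
    n = fread(buf, 1, sizeof buf - 1, f); /* "State:" is near the top */
    fclose(f);
    buf[n] = '\0';
    return parse_task_state(buf);
}
```

After a detach with SIGSTOP, every task of the tracee would be expected to report 'T'; any task reporting another letter corresponds to the "left unstopped" case in the description.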

Comment 1 RHEL Program Management 2008-07-25 17:01:15 UTC
This request was evaluated by Red Hat Product Management for
inclusion, but this component is not scheduled to be updated in
the current Red Hat Enterprise Linux release. If you would like
this request to be reviewed for the next minor release, ask your
support representative to set the next rhel-x.y flag to "?".

Comment 2 Ludek Smid 2008-07-25 21:55:16 UTC
Unfortunately the previous automated notification about the
non-inclusion of this request in Red Hat Enterprise Linux 5.3 used
the wrong text template. It should have read: this request has been
reviewed by Product Management and is not planned for inclusion
in the current minor release of Red Hat Enterprise Linux.

If you would like this request to be reviewed for the next minor
release, ask your support representative to set the next rhel-x.y
flag to "?" or raise an exception.

Comment 3 Jan Kratochvil 2008-07-29 08:50:38 UTC
It is a regression against RHEL-4.
It regresses issue 78487.


Comment 4 RHEL Program Management 2008-07-29 08:57:04 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 6 Eric Bachalo 2008-11-25 19:12:53 UTC
Pushing to RHEL 5.4, as this problem was not fixed for the 5.3 release because it had lower priority than other issues.

Comment 7 RHEL Program Management 2008-11-25 19:42:07 UTC
This bugzilla has Keywords: Regression.  

Since no regressions are allowed between releases, 
it is also being proposed as a blocker for this release.  

Please resolve ASAP.

Comment 10 RHEL Program Management 2009-02-16 15:41:32 UTC
Updating PM score.

Comment 16 Oleg Nesterov 2011-02-01 18:05:36 UTC
Jan, Roland,

Do we really want to fix this? This matches upstream.

Otoh, we are going to remove this extra wakeup sooner or later
(this is discussed on lkml right now), and rhel6 already differs
here.

I never understood rhel5's utrace code in detail but at first
glance everything is clear and this behaviour is intentional,
ptrace_detach() has a huge comment before it clears
SIGNAL_STOP_STOPPED.

Confused.

Comment 17 Oleg Nesterov 2011-02-02 20:03:12 UTC
(In reply to comment #16)
>
> Jan, Roland,
> 
> Do we really want to fix this? This matches upstream.

Yes, but my initial analysis was wrong.
 
> Otoh, we are going to remove this extra wakeup sooner or later
> (this is discussed on lkml right now),

yes, this wrong wakeup can abort the group-stop, but this case
is unlikely, while the test-case always fails.

> I never understood rhel5's utrace code in detail but at first
> glance everything is clear and this behaviour is intentional,
> ptrace_detach() has a huge comment before it clears
> SIGNAL_STOP_STOPPED.

No, I misread detach-stopped.c, there is something else. Still
investigating...

Comment 18 Oleg Nesterov 2011-02-02 22:28:46 UTC
Created attachment 476671 [details]
[patch] fix ptrace(PTRACE_DETACH, SIGSTOP)

Seems to fix the problem, but I'll try to think a bit more.

The problem is, ptrace_detach()->ptrace_induce_signal() does
utrace_inject_signal(action => UTRACE_ACTION_RESUME) and this
means that "add SIGNAL_STOP_DEQUEUED" logic never works.

I think it is safer to change utrace_get_signal() like this
patch does, if we want to fix this bug.

Comment 19 Oleg Nesterov 2011-02-03 22:59:36 UTC
[RHEL5 PATCH 1/1] bz456333: ptrace(PTRACE_DETACH, SIGSTOP) does not stop
http://post-office.corp.redhat.com/archives/rhkernel-list/2011-February/msg00225.html

Comment 20 Oleg Nesterov 2011-02-04 18:36:05 UTC
(In reply to comment #19)
> [RHEL5 PATCH 1/1] bz456333: ptrace(PTRACE_DETACH, SIGSTOP) does not stop
> http://post-office.corp.redhat.com/archives/rhkernel-list/2011-February/msg00225.html

It was decided we do not want to fix this:

   - it is not a regression

   - even today's kernel still behaves this way