Bug 216020 - waitpid returns WIFSIGNALED instead of WIFEXITED
Summary: waitpid returns WIFSIGNALED instead of WIFEXITED
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 6
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks: 173278 216149 216150
Reported: 2006-11-16 19:47 UTC by Andrew Cagney
Modified: 2007-11-30 22:11 UTC
CC List: 5 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2007-09-04 20:32:21 UTC
Type: ---
Embargoed:


Attachments
test case (6.87 KB, text/plain)
2006-11-16 19:48 UTC, Andrew Cagney


Links
Sourceware 3525 (Private: 0; Priority: None; Status: None; Summary: None; Last Updated: Never)

Description Andrew Cagney 2006-11-16 19:47:57 UTC
Description of problem:

When a detached daemon process that has been PTRACE_ATTACHed exits, waitpid
reports WIFSIGNALED instead of WIFEXITED (actually it also generates a WIFEXITED).

Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:

Run the attached test case.
  
Actual results:

24081 installing signal handler and mask
24081 forking
24083 signalling 24081 with User defined signal 1
24083 waiting for signals
24081 calling waitpid 24082 for << child for daemon exits >> returns 0x0 WIFEXITED 0 -- ok
24081 waiting for signals
24081 received User defined signal 1
24081 calling ptrace 16 (ATTACH) 24083 0
24081 calling waitpid 24083 for << daemon attached >> returns 0x137f WIFSTOPPED 19 -- ok
24081 calling ptrace 21 (SETOPTIONS) 24083 80
24081 calling ptrace 7 (CONT) 24083 10
24081 calling waitpid 24083 for << daemon stops at exec >>24083 received User defined signal 1
 returns 0x4057f WIFSTOPPED 5 -- ok
24081 calling ptrace 7 (CONT) 24083 0
24081 calling waitpid 24083 for << daemon stops at exit(47) >>24083 argc 3 0 47 exiting 24083 with 47
 returns 0x6057f WIFSTOPPED 5 -- ok
24081 calling ptrace 7 (CONT) 24083 0
24081 calling waitpid 24083 for << daemon does exit(47) >> returns 0x2f WIFSIGNALED 47 -- WIFEXITED 47 expected
Aborted (core dumped)

Expected results:

30389 installing signal handler and mask
30389 forking
30391 signalling 30389 with User defined signal 1
30391 waiting for signals
30389 calling waitpid 30390 for << child for daemon exits >> returns 0x0 WIFEXITED 0 -- ok
30389 waiting for signals
30389 received User defined signal 1
30389 calling ptrace 16 (ATTACH) 30391 0
30389 calling waitpid 30391 for << daemon attached >> returns 0x137f WIFSTOPPED 19 -- ok
30389 calling ptrace 21 (SETOPTIONS) 30391 80
30389 calling ptrace 7 (CONT) 30391 10
30389 calling waitpid 30391 for << daemon stops at exec >>30391 received User defined signal 1
 returns 0x4057f WIFSTOPPED 5 -- ok
30389 calling ptrace 7 (CONT) 30391 0
30389 calling waitpid 30391 for << daemon stops at exit(47) >>30391 argc 3 0 47 exiting 30391 with 47
 returns 0x6057f WIFSTOPPED 5 -- ok
30389 calling ptrace 7 (CONT) 30391 0
30389 calling waitpid 30391 for << daemon does exit(47) >> returns 0x2f00 WIFEXITED 47 -- ok
30389 calling waitpid 30391 for << no children >> fails (No child processes) -- ok

Additional info:

Will try to reduce the test further; see frysk's testsuite for the current version.

Comment 1 Andrew Cagney 2006-11-16 19:48:10 UTC
Created attachment 141405
test case

Comment 3 Andrew Cagney 2006-11-16 20:13:10 UTC
Neither the exec nor SETOPTIONS is needed; both have been removed from frysk's test case.


Comment 4 Andrew Cagney 2006-11-16 20:47:44 UTC
$ ssh towns uname -a
Linux towns.toronto.redhat.com 2.6.18-1.2849.fc6 #1 SMP Fri Nov 10 12:36:14 EST 2006 i686 i686 i386 GNU/Linux


Comment 5 Roland McGrath 2006-11-17 08:30:13 UTC
The test case is slightly racy and on a well-behaving kernel it will sometimes
barf when it sees a WIFSTOPPED/WTERMSIG=SIGUSR1 result, which is a valid
possibility when the parent's PTRACE_CONT with SIGUSR1 comes before the child
has unblocked SIGUSR1 in the sigsuspend syscall; this will happen on a fast
machine or a preemption kernel.  The ptrace behavior is to requeue the
PTRACE_CONT-specified signal when it's blocked, so that it comes back through
ptrace when it's unblocked--whereas when the signal is already unblocked because
the child has gotten far enough into its sigsuspend call, then PTRACE_CONT's
injection of the signal goes directly to running the handler as the test case
expects.  

The only mystery about the utrace bug is how we went so long without noticing
it.  The waitpid result for the exit of a PTRACE_ATTACH'd task (not your own
child) is just always wrong.  It's a simple fix.

Comment 6 Andrew Cagney 2006-11-17 14:34:15 UTC
(In reply to comment #5)
> The test case is slightly racy and on a well-behaving kernel it will sometimes
> barf when it sees a WIFSTOPPED/WTERMSIG=SIGUSR1 result, which is a valid
> possibility when the parent's PTRACE_CONT with SIGUSR1 comes before the child
> has unblocked SIGUSR1 in the sigsuspend syscall

I wondered about that; I'll slow down the test. Thanks.

Comment 7 Roland McGrath 2007-09-04 20:32:21 UTC
This bug was fixed quite a while ago.  Reopen if it reoccurs.

