Bug 464520 - [utrace] PTRACE_O_TRACEVFORK no longer gives parent notification
Summary: [utrace] PTRACE_O_TRACEVFORK no longer gives parent notification
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: kernel
Version: 10
Hardware: x86_64
OS: Linux
medium
medium
Target Milestone: ---
Assignee: Roland McGrath
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2008-09-29 14:30 UTC by Tom Horsley
Modified: 2009-12-18 06:27 UTC (History)
3 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 06:27:12 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
test-vfork.c source file (4.84 KB, text/plain)
2008-09-29 14:32 UTC, Tom Horsley
no flags Details
Modified test program (PTRACE_CONTs traced processes) (5.15 KB, text/plain)
2008-10-02 12:18 UTC, Denys Vlasenko
no flags Details
Patch to ptrace-tests to check for this bug. (5.21 KB, patch)
2008-10-02 14:07 UTC, Denys Vlasenko
no flags Details | Diff
Modified test-vfork program (6.36 KB, text/plain)
2008-10-27 13:41 UTC, Tom Horsley
no flags Details

Description Tom Horsley 2008-09-29 14:30:00 UTC
Description of problem:

With latest 2.6.26 kernel on fedora 8, using ptrace() to follow fork/vfork/clone
I no longer see the parent PID stop when it calls vfork(). On previous kernels
I would get the child process stopped with SIGSTOP and the parent process
stopped with SIGTRAP (in some random order). With new kernel, no amount
of waiting ever results in the parent SIGTRAP notification for a vfork().


Version-Release number of selected component (if applicable):
Linux tomh 2.6.26.3-14.fc8 #1 SMP Wed Sep 3 03:40:05 EDT 2008 i686 i686 i386 GNU/Linux


How reproducible:
every time

Steps to Reproduce:
1. gcc -o test-vfork test-vfork.c
2. ./test-vfork
3.
  
Actual results:
tomh> ./test-vfork
Child process is 5132
Status for PID 5133: stopped with SIGSTOP


Expected results:
tweety> ./test-vfork                
Child process is 17831
Status for PID 17831: stopped with SIGTRAP
Status for PID 17832: stopped with SIGSTOP


Additional info:
On all previous kernels (since the introduction of the PTRACE_O_TRACE*
flags), I've always seen both the parent and child status.

I'll attach the vfork-test.c source file.

Comment 1 Tom Horsley 2008-09-29 14:32:08 UTC
Created attachment 317986 [details]
test-vfork.c source file

Comment 2 Chuck Ebbert 2008-09-30 04:29:37 UTC
2.6.26:
The first child is waiting in fork() in state D+ according to ps.
The second one is in a utrace function in state T+.

On 2.6.25, both children are stopped in state T+ in a utrace function.

Comment 3 Roland McGrath 2008-09-30 04:51:33 UTC
Verified OK in vanilla 2.6.27-rc8 and broken in development utrace branch.

Fixed in utrace branch, building in rawhide kernel-2_6_27-0_372_rc8_fc10

Comment 4 Denys Vlasenko 2008-10-02 11:42:24 UTC
> Expected results:
> tweety> ./test-vfork                
> Child process is 17831
> Status for PID 17831: stopped with SIGTRAP
> Status for PID 17832: stopped with SIGSTOP

I do not understand how this is possible. vfork child runs _before_ parent, how come you see parent first? The correct sequence of events here:

* ptrace(PTRACE_SETOPTIONS, PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORK...)
* ptrace(PTRACE_CONT...)
* test process vforks
* waitpid(-1) - we see vfork'ed child in SIGSTOP

since we do NOT PTRACE_CONT it, it stays stopped, and thus parent never continues.

If this "worked" in the past, maybe it worked only because strace was cheating and making vfork work as fork. This is bad, since it was changing the behavior of straced program. I was bitten by this a few times.

You have to continue the child. With this fix to your test:

         if (kid == (pid_t)-1) {
            perror("waitpid");
            exit(2);
         }
+if (ptrace(PTRACE_CONT, kid, 0, 0) == -1) {
+ perror("ptrace2");
+}
         printf("Status for PID %d: %s\n", (int)kid, status_string(waitstat));


I see this:

# ./test-vfork
Child process is 23085
Status for PID 23086: stopped with SIGSTOP
Status for PID 23086: stopped with SIGTRAP
ptrace2: No such process
Status for PID 23086: exited normally
Status for PID 23085: stopped with SIGTRAP
Status for PID 23085: stopped with SIGCHLD
Status for PID 23085: stopped with SIGTRAP
ptrace2: No such process
Status for PID 23085: exited normally
waitpid: No child processes

Looks correct to me.

Comments?

Comment 5 Tom Horsley 2008-10-02 12:03:05 UTC
>I do not understand how this is possible. vfork child runs _before_ parent, how
>come you see parent first? The correct sequence of events here:

The ordering of the parent/child events has always been random (despite the
fact that it would be infinitely more convenient for debuggers to find out
about the parent first so as to know to expect the child event and avoid
trying to do any modifications to the parent till they copy all the
parent's state information about breakpoints and such to the child's state).

>If this "worked" in the past, maybe it worked only because strace was cheating
>and making vfork work as fork. This is bad, since it was changing the behavior
>of straced program. I was bitten by this a few times.

Linux has always made vfork act like fork when ptraced. I can't see how it
can avoid doing that. If it doesn't do that, then any modifications
a debugger makes in the debugged child process will also affect the
parent process which can't possibly be what anyone would want to
happen.

But even if it isn't doing that, the SIGTRAP in the parent is merely
telling me "Hey! I vfork()ed". If the parent wants to stay stopped
till the child runs, that's the parent's choice, but I don't see any
obstacle to providing the "Hey! I vfork()ed status to the debugger
before it stops to wait on the child".

(I could make an editorial comment here about how the whole fork/exec
model is the most incredibly stupid design decision in the history
of all computing, but that would be off topic :-).

Comment 6 Denys Vlasenko 2008-10-02 12:18:56 UTC
Created attachment 319218 [details]
Modified test program (PTRACE_CONTs traced processes)

> Linux has always made vfork act like fork when ptraced. I can't see how it
can avoid doing that. If it doesn't do that, then any modifications
a debugger makes in the debugged child process will also affect the
parent process which can't possibly be what anyone would want to
happen.

No. This is exactly what I want to happen - that traced/debugged program operates as close as not-traced one. It's bad when you try to debug something and it disappears when y
ou run under debugger.

And this is exactly what happens on recent kernel+strace. vfork seems to be not converted to fork by strace. I attached modified test program, try it. Even though child sleep(1), parent does not outrun it.

Anyway. On 2.6.27-rc7 + utrace patch:

Child process is 3672
Status for PID 3673: stopped with SIGSTOP
child
Status for PID 3673: stopped with SIGTRAP
Status for PID 3673: exited normally
ptrace2: No such process
Status for PID 3672: stopped with SIGTRAP
Status for PID 3672: stopped with SIGCHLD
parent
Status for PID 3672: stopped with SIGTRAP
Status for PID 3672: stopped with SIGTRAP
Status for PID 3672: exited normally
ptrace2: No such process
waitpid: No child processes

On 2.6.27-0.372.rc8:

Child process is 3264
Status for PID 3264: stopped with SIGTRAP
Status for PID 3265: stopped with SIGSTOP
child
Status for PID 3265: stopped with SIGTRAP
parent
Status for PID 3265: exited normally
ptrace2: No such process
Status for PID 3264: stopped with SIGTRAP
Status for PID 3264: stopped with SIGCHLD
Status for PID 3264: stopped with SIGTRAP
Status for PID 3264: exited normally
ptrace2: No such process
waitpid: No child processes

Ronald seems to be right: it is indeed fixed in 372 - parent gets SIGTRAP.
Ny concerns were not warranted - even though parent gets SIGTRAP, it does NOT continue ahead of vforked child, so everything works as intended.

Comment 7 Tom Horsley 2008-10-02 12:32:23 UTC
>And this is exactly what happens on recent kernel+strace. vfork seems to be not
>converted to fork by strace. I attached modified test program, try it. Even
>though child sleep(1), parent does not outrun it.

In that case, I guess it is vitally important that my debugger recognize
a vfork()ed child and refuse to modify it so as to protect the parent
from unexpected modification as a side effect.

Yet another test I need to add to my near infinite collection of tests
for subtle ptrace() differences so I can debug correctly on many different
kernels :-).

I have indeed seen problems with ptrace converting vfork to fork.
Unbelievably, there are still some versions of ksh out there which
actually rely on the vfork shared memory semantics to communicate
back child status to the parent, and they sometimes fail to report
errors when run under the debugger, so there are good and bad aspects
to this change (I just wonder why when I want a change its always
"not backward compatible", but when the kernel wants a change
its "progress" :-).

Comment 8 Denys Vlasenko 2008-10-02 12:58:00 UTC
s/Ronald/Roland/. Sorry Roland! :|

Tom:
> Yet another test I need to add to my near infinite collection of tests
for subtle ptrace() differences

I think *strace* does vfork -> fork substitution, not *ptrace() syscall*. Aha here is it:

int
change_syscall(tcp, new)
struct tcb *tcp;
int new;
{
#if defined(LINUX)
#if defined(I386)
        /* Attempt to make vfork into fork, which we can follow. */
        if (ptrace(PTRACE_POKEUSER, tcp->pid, (char*)(ORIG_EAX * 4), new) < 0)
                return -1;
...

and here:

                /* ia64 calls directly `clone (CLONE_VFORK | CLONE_VM)'
                   contrary to x86 SYS_vfork above.  Even on x86 we turn the
                   vfork semantics into plain fork - each application must not
                   depend on the vfork specifics according to POSIX.  We would
                   hang waiting for the parent resume otherwise.  We need to
                   clear also CLONE_VM but only in the CLONE_VFORK case as
                   otherwise we would break pthread_create.  */

Comment 9 Tom Horsley 2008-10-02 13:25:23 UTC
>I think *strace* does vfork -> fork substitution, not *ptrace() syscall*.

Strace may indeed do that, but ptrace has always changed the semantics
as well. The memory sharing doesn't happen when I follow a debugged
vfork to the child. I spent a few days trying to figure out why a debugged
ksh on opensuse never reported an error when you run a nonexistant
program before I finally discovered it was compiled to use vfork and
report status from the child by modifying the parent's memory, which
doesn't happen when the child is debugged.

At least it tells me how to write a test for vfork semantics :-). Then
I can see if it is really a problem for the debugger under the new kernel
or not.

Comment 10 Jan Kratochvil 2008-10-02 13:38:54 UTC
(In reply to comment #9)
> Strace may indeed do that, but ptrace has always changed the semantics
> as well. The memory sharing doesn't happen when I follow a debugged
> vfork to the child.

According to POSIX any application is broken if it depends on vfork specifics:
http://www.opengroup.org/onlinepubs/009695399/functions/vfork.html
On some implementations, vfork() is equivalent to fork().

Still I see the memory sharing with gdb on x86_64:

$ gdb ./vforkx
GNU gdb Fedora (6.8-24.fc10)
Copyright (C) 2008 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
(gdb) start
Breakpoint 1 at 0x4005f4: file vforkx.c, line 9.
Starting program: /tmp/vforkx 
main () at vforkx.c:9
9	  volatile int x = 1;
(gdb) set follow-fork-mode child 
(gdb) n
11	  switch (vfork ())
(gdb) 
[Switching to process 24610]
16	      x = 2;
(gdb) 
17	      _exit (0);
(gdb) c
Continuing.

Program exited normally.
(gdb) q
2
$ _

vforkx.c:
#include <unistd.h>
#include <sys/wait.h>
#include <assert.h>
#include <stdio.h>

int
main (void)
{
  volatile int x = 1;

  switch (vfork ())
    {
    case -1:
      assert (0);
    case 0:
      x = 2;
      _exit (0);
    default:
      wait (NULL);
    }
  printf ("%d\n", x);
  return 0;
}

Comment 11 Denys Vlasenko 2008-10-02 14:07:36 UTC
Created attachment 319237 [details]
Patch to ptrace-tests to check for this bug.

I verified that it works on kernel-2.6.27-0.372 and fails on earlier
kernels and on vanilla kernel + utrace patch (one week old one).

Comment 12 Chuck Ebbert 2008-10-04 05:57:18 UTC
(In reply to comment #3)
> Verified OK in vanilla 2.6.27-rc8 and broken in development utrace branch.
> 
> Fixed in utrace branch, building in rawhide kernel-2_6_27-0_372_rc8_fc10

I can has backport to F8/F9?

Comment 13 Tom Horsley 2008-10-27 13:41:52 UTC
Created attachment 321611 [details]
Modified test-vfork program

I notice in the latest fedora 8 kernel update (kernel-2.6.26.6-49.fc8) I am
getting the parent process vfork notification now, but I see a new problem
(which maybe should be a different bug?) I no longer get the child pid
in the event message when I get the "vfork done" event. Here's a sample
run on a fedora 7 box:

tweety> ./test-vfork
Child process is 32709
Status for PID 32709: stopped with SIGTRAP event=vfork msg=0x7fc6 (32710)
Status for PID 32710: stopped with SIGSTOP
Status for PID 32709: stopped with SIGTRAP event=vfork done msg=0x7fc6 (32710)
Status for PID 32710: stopped with SIGTRAP event=exec msg=0x0 (0)
Status for PID 32709: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 32709: exited normally
Mon Oct 27 09:30:13 EDT 2008
Status for PID 32710: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 32710: exited normally
waitpid: No child processes
tweety> uname -a
Linux tweety 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:35:10 EDT 2008 i686 i686 i386 GNU/Linux

Note that both the vfork event and vfork done event show the child pid
in the eventmsg results.

On fedora 8 box I see this:

tomh> ./test-vfork
Child process is 4709
Status for PID 4710: stopped with SIGSTOP
Status for PID 4709: stopped with SIGTRAP event=vfork msg=0x1266 (4710)
Status for PID 4710: stopped with SIGTRAP event=exec msg=0x0 (0)
Status for PID 4709: stopped with SIGTRAP event=vfork done msg=0x0 (0)
Mon Oct 27 09:38:54 EDT 2008
Status for PID 4710: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 4709: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 4710: exited normally
Status for PID 4709: exited normally
waitpid: No child processes
tomh> uname -a
Linux tomh 2.6.26.6-49.fc8 #1 SMP Fri Oct 17 15:59:36 EDT 2008 i686 i686 i386 GNU/Linux

Note that the vfork done eventmsg is zero.

I supposed I can do the bookkeeping myself (since the parent is obviously
only gonna be running one vfork at a time), but it was convenient to rely
on the kernel to tell me which child the event was associated with, and it
always seemed to be there in other kernels I tested.

Comment 14 Bug Zapper 2008-11-26 11:11:39 UTC
This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 15 Bug Zapper 2009-01-09 06:47:01 UTC
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

Comment 16 Tom Horsley 2009-03-26 18:34:48 UTC
Actually this is even more severely busted in Fedora 10. The parent notification
once again no longer shows up on a vfork. Here's a test run of the test prog
from comment #13 above on an x86_64 fedora 10 box running kernel
2.6.27.19-170.2.35.fc10.x86_64:

tomh> ./test-vfork
Child process is 2887
Status for PID 2888: stopped with SIGSTOP
Status for PID 2888: stopped with SIGTRAP event=exec msg=0x0 (0)
Thu Mar 26 14:30:28 EDT 2009
Status for PID 2888: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 2888: exited normally

It never completes, the 2887 process (which did the vfork) never gets
any additional status back from ptrace and never goes anywhere.

On a centos 5.1 system with kernel: 2.6.18-53.1.13.el5 (x86_64 as well)
the test run looks like:

[nvtest@centos51-x b]$ ./test-vfork
Child process is 3373
Status for PID 3373: stopped with SIGTRAP event=vfork msg=0xd2e (3374)
Status for PID 3374: stopped with SIGSTOP
Status for PID 3373: stopped with SIGTRAP event=vfork done msg=0xd2e (3374)
Status for PID 3373: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 3373: exited normally
Status for PID 3374: stopped with SIGTRAP event=exec msg=0x0 (0)
Thu Mar 26 14:27:16 EDT 2009
Status for PID 3374: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 3374: exited normally
waitpid: No child processes

Note that lots of status comes in for the vforking process 3373, everybody
runs and exits normally.

P.S. The only status it gave me to change CLOSED to was ASSIGNED, so that's
the status I changed this bugzilla to.

Comment 17 Denys Vlasenko 2009-03-26 18:56:15 UTC
I think Jan Kratochvil recently created a testcase for it, "o_tracevfork-parent".

He observed the failure on kernel 2.6.29-0.218.rc7.git2.fc11.x86_64).

I believe he started working on it from bug
https://bugzilla.redhat.com/show_bug.cgi?id=486564 ("utrace: gdb on gdb hangs")

The testcase can be downloaded from ptrace-tests here:

http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/o_tracevfork-parent.c?cvsroot=systemtap

Comment 18 Tom Horsley 2009-03-26 19:06:08 UTC
All I can say it that it would sure be nice to get the fix backported
into fedora 10 if it gets fixed in the 2.6.29 kernel :-).

Comment 19 Tom Horsley 2009-03-27 00:53:04 UTC
No doubt the testcase above already knows this, but experimenting with
the test program from this bug seems to indicate that the
PTRACE_O_TRACEVFORKDONE flag is the troublemaker. If I leave that
flag off in the setoptions call, the vforking parent does not
get hung.

Comment 20 Chuck Ebbert 2009-03-28 14:10:08 UTC
Can you try 2.6.29-10 from koji:
http://koji.fedoraproject.org/koji/buildinfo?buildID=95672

Comment 21 Tom Horsley 2009-03-28 15:26:42 UTC
Looks like the 2.6.29-10 kernel does indeed work correctly. The test program
from comment #13 hangs forever on previous kernel, but the results are
correct with no hang on the 2.6.29-10 kernel:

zooty> ./test-vfork
Child process is 4411
Status for PID 4412: stopped with SIGSTOP
Status for PID 4411: stopped with SIGTRAP event=vfork msg=0x113c (4412)
Status for PID 4412: stopped with SIGTRAP event=exec msg=0x0 (0)
Status for PID 4411: stopped with SIGTRAP event=vfork done msg=0x113c (4412)
Sat Mar 28 11:19:59 EDT 2009
Status for PID 4412: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 4411: stopped with SIGTRAP event=exit msg=0x0 (0)
Status for PID 4412: exited normally
Status for PID 4411: exited normally
waitpid: No child processes

With old kernel, I would never have gotten any status for pid 4411 in
this example. And I see the event message correctly reports the child
PID in all cases as well (which was one of the other bugs I saw above).

Comment 22 Bug Zapper 2009-11-18 08:26:35 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 23 Bug Zapper 2009-12-18 06:27:12 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.


Note You need to log in before you can comment on or make changes to this bug.