Description of problem: With latest 2.6.26 kernel on fedora 8, using ptrace() to follow fork/vfork/clone I no longer see the parent PID stop when it calls vfork(). On previous kernels I would get the child process stopped with SIGSTOP and the parent process stopped with SIGTRAP (in some random order). With new kernel, no amount of waiting ever results in the parent SIGTRAP notification for a vfork(). Version-Release number of selected component (if applicable): Linux tomh 2.6.26.3-14.fc8 #1 SMP Wed Sep 3 03:40:05 EDT 2008 i686 i686 i386 GNU/Linux How reproducible: every time Steps to Reproduce: 1. gcc -o test-vfork test-vfork.c 2. ./test-vfork 3. Actual results: tomh> ./test-vfork Child process is 5132 Status for PID 5133: stopped with SIGSTOP Expected results: tweety> ./test-vfork Child process is 17831 Status for PID 17831: stopped with SIGTRAP Status for PID 17832: stopped with SIGSTOP Additional info: On all previous kernels (since the introduction of the PTRACE_O_TRACE* flags), I've always seen both the parent and child status. I'll attach the vfork-test.c source file.
Created attachment 317986 [details] test-vfork.c source file
2.6.26: The first child is waiting in fork() in state D+ according to ps. The second one is in a utrace function in state T+. On 2.6.25, both children are stopped in state T+ in a utrace function.
Verified OK in vanilla 2.6.27-rc8 and broken in development utrace branch. Fixed in utrace branch, building in rawhide kernel-2_6_27-0_372_rc8_fc10
> Expected results: > tweety> ./test-vfork > Child process is 17831 > Status for PID 17831: stopped with SIGTRAP > Status for PID 17832: stopped with SIGSTOP I do not understand how this is possible. vfork child runs _before_ parent, how come you see parent first? The correct sequence of events here: * ptrace(PTRACE_SETOPTIONS, PTRACE_O_TRACEFORK|PTRACE_O_TRACEVFORK...) * ptrace(PTRACE_CONT...) * test process vforks * waitpid(-1) - we see vfork'ed child in SIGSTOP since we do NOT PTRACE_CONT it, it stays stopped, and thus parent never continues. If this "worked" in the past, maybe it worked only because strace was cheating and making vfork work as fork. This is bad, since it was changing the behavior of straced program. I was bitten by this a few times. You have to continue the child. With this fix to your test: if (kid == (pid_t)-1) { perror("waitpid"); exit(2); } +if (ptrace(PTRACE_CONT, kid, 0, 0) == -1) { + perror("ptrace2"); +} printf("Status for PID %d: %s\n", (int)kid, status_string(waitstat)); I see this: # ./test-vfork Child process is 23085 Status for PID 23086: stopped with SIGSTOP Status for PID 23086: stopped with SIGTRAP ptrace2: No such process Status for PID 23086: exited normally Status for PID 23085: stopped with SIGTRAP Status for PID 23085: stopped with SIGCHLD Status for PID 23085: stopped with SIGTRAP ptrace2: No such process Status for PID 23085: exited normally waitpid: No child processes Looks correct to me. Comments?
>I do not understand how this is possible. vfork child runs _before_ parent, how >come you see parent first? The correct sequence of events here: The ordering of the parent/child events has always been random (despite the fact that it would be infinitely more convenient for debuggers to find out about the parent first so as to know to expect the child event and avoid trying to do any modifications to the parent till they copy all the parent's state information about breakpoints and such to the child's state). >If this "worked" in the past, maybe it worked only because strace was cheating >and making vfork work as fork. This is bad, since it was changing the behavior >of straced program. I was bitten by this a few times. Linux has always made vfork act like fork when ptraced. I can't see how it can avoid doing that. If it doesn't do that, then any modifications a debugger makes in the debugged child process will also affect the parent process which can't possibly be what anyone would want to happen. But even if it isn't doing that, the SIGTRAP in the parent is merely telling me "Hey! I vfork()ed". If the parent wants to stay stopped till the child runs, that's the parent's choice, but I don't see any obstacle to providing the "Hey! I vfork()ed status to the debugger before it stops to wait on the child". (I could make an editorial comment here about how the whole fork/exec model is the most incredibly stupid design decision in the history of all computing, but that would be off topic :-).
Created attachment 319218 [details] Modified test program (PTRACE_CONTs traced processes) > Linux has always made vfork act like fork when ptraced. I can't see how it can avoid doing that. If it doesn't do that, then any modifications a debugger makes in the debugged child process will also affect the parent process which can't possibly be what anyone would want to happen. No. This is exactly what I want to happen - that traced/debugged program operates as close as not-traced one. It's bad when you try to debug something and it disappears when y ou run under debugger. And this is exactly what happens on recent kernel+strace. vfork seems to be not converted to fork by strace. I attached modified test program, try it. Even though child sleep(1), parent does not outrun it. Anyway. On 2.6.27-rc7 + utrace patch: Child process is 3672 Status for PID 3673: stopped with SIGSTOP child Status for PID 3673: stopped with SIGTRAP Status for PID 3673: exited normally ptrace2: No such process Status for PID 3672: stopped with SIGTRAP Status for PID 3672: stopped with SIGCHLD parent Status for PID 3672: stopped with SIGTRAP Status for PID 3672: stopped with SIGTRAP Status for PID 3672: exited normally ptrace2: No such process waitpid: No child processes On 2.6.27-0.372.rc8: Child process is 3264 Status for PID 3264: stopped with SIGTRAP Status for PID 3265: stopped with SIGSTOP child Status for PID 3265: stopped with SIGTRAP parent Status for PID 3265: exited normally ptrace2: No such process Status for PID 3264: stopped with SIGTRAP Status for PID 3264: stopped with SIGCHLD Status for PID 3264: stopped with SIGTRAP Status for PID 3264: exited normally ptrace2: No such process waitpid: No child processes Ronald seems to be right: it is indeed fixed in 372 - parent gets SIGTRAP. Ny concerns were not warranted - even though parent gets SIGTRAP, it does NOT continue ahead of vforked child, so everything works as intended.
>And this is exactly what happens on recent kernel+strace. vfork seems to be not >converted to fork by strace. I attached modified test program, try it. Even >though child sleep(1), parent does not outrun it. In that case, I guess it is vitally important that my debugger recognize a vfork()ed child and refuse to modify it so as to protect the parent from unexpected modification as a side effect. Yet another test I need to add to my near infinite collection of tests for subtle ptrace() differences so I can debug correctly on many different kernels :-). I have indeed seen problems with ptrace converting vfork to fork. Unbelievably, there are still some versions of ksh out there which actually rely on the vfork shared memory semantics to communicate back child status to the parent, and they sometimes fail to report errors when run under the debugger, so there are good and bad aspects to this change (I just wonder why when I want a change its always "not backward compatible", but when the kernel wants a change its "progress" :-).
s/Ronald/Roland/. Sorry Roland! :| Tom: > Yet another test I need to add to my near infinite collection of tests for subtle ptrace() differences I think *strace* does vfork -> fork substitution, not *ptrace() syscall*. Aha here is it: int change_syscall(tcp, new) struct tcb *tcp; int new; { #if defined(LINUX) #if defined(I386) /* Attempt to make vfork into fork, which we can follow. */ if (ptrace(PTRACE_POKEUSER, tcp->pid, (char*)(ORIG_EAX * 4), new) < 0) return -1; ... and here: /* ia64 calls directly `clone (CLONE_VFORK | CLONE_VM)' contrary to x86 SYS_vfork above. Even on x86 we turn the vfork semantics into plain fork - each application must not depend on the vfork specifics according to POSIX. We would hang waiting for the parent resume otherwise. We need to clear also CLONE_VM but only in the CLONE_VFORK case as otherwise we would break pthread_create. */
>I think *strace* does vfork -> fork substitution, not *ptrace() syscall*. Strace may indeed do that, but ptrace has always changed the semantics as well. The memory sharing doesn't happen when I follow a debugged vfork to the child. I spent a few days trying to figure out why a debugged ksh on opensuse never reported an error when you run a nonexistant program before I finally discovered it was compiled to use vfork and report status from the child by modifying the parent's memory, which doesn't happen when the child is debugged. At least it tells me how to write a test for vfork semantics :-). Then I can see if it is really a problem for the debugger under the new kernel or not.
(In reply to comment #9) > Strace may indeed do that, but ptrace has always changed the semantics > as well. The memory sharing doesn't happen when I follow a debugged > vfork to the child. According to POSIX any application is broken if it depends on vfork specifics: http://www.opengroup.org/onlinepubs/009695399/functions/vfork.html On some implementations, vfork() is equivalent to fork(). Still I see the memory sharing with gdb on x86_64: $ gdb ./vforkx GNU gdb Fedora (6.8-24.fc10) Copyright (C) 2008 Free Software Foundation, Inc. License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html> This is free software: you are free to change and redistribute it. There is NO WARRANTY, to the extent permitted by law. Type "show copying" and "show warranty" for details. This GDB was configured as "x86_64-redhat-linux-gnu"... (gdb) start Breakpoint 1 at 0x4005f4: file vforkx.c, line 9. Starting program: /tmp/vforkx main () at vforkx.c:9 9 volatile int x = 1; (gdb) set follow-fork-mode child (gdb) n 11 switch (vfork ()) (gdb) [Switching to process 24610] 16 x = 2; (gdb) 17 _exit (0); (gdb) c Continuing. Program exited normally. (gdb) q 2 $ _ vforkx.c: #include <unistd.h> #include <sys/wait.h> #include <assert.h> #include <stdio.h> int main (void) { volatile int x = 1; switch (vfork ()) { case -1: assert (0); case 0: x = 2; _exit (0); default: wait (NULL); } printf ("%d\n", x); return 0; }
Created attachment 319237 [details] Patch to ptrace-tests to check for this bug. I verified that it works on kernel-2.6.27-0.372 and fails on earlier kernels and on vanilla kernel + utrace patch (one week old one).
(In reply to comment #3) > Verified OK in vanilla 2.6.27-rc8 and broken in development utrace branch. > > Fixed in utrace branch, building in rawhide kernel-2_6_27-0_372_rc8_fc10 I can has backport to F8/F9?
Created attachment 321611 [details] Modified test-vfork program I notice in the latest fedora 8 kernel update (kernel-2.6.26.6-49.fc8) I am getting the parent process vfork notification now, but I see a new problem (which maybe should be a different bug?) I no longer get the child pid in the event message when I get the "vfork done" event. Here's a sample run on a fedora 7 box: tweety> ./test-vfork Child process is 32709 Status for PID 32709: stopped with SIGTRAP event=vfork msg=0x7fc6 (32710) Status for PID 32710: stopped with SIGSTOP Status for PID 32709: stopped with SIGTRAP event=vfork done msg=0x7fc6 (32710) Status for PID 32710: stopped with SIGTRAP event=exec msg=0x0 (0) Status for PID 32709: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 32709: exited normally Mon Oct 27 09:30:13 EDT 2008 Status for PID 32710: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 32710: exited normally waitpid: No child processes tweety> uname -a Linux tweety 2.6.23.17-88.fc7 #1 SMP Thu May 15 00:35:10 EDT 2008 i686 i686 i386 GNU/Linux Note that both the vfork event and vfork done event show the child pid in the eventmsg results. On fedora 8 box I see this: tomh> ./test-vfork Child process is 4709 Status for PID 4710: stopped with SIGSTOP Status for PID 4709: stopped with SIGTRAP event=vfork msg=0x1266 (4710) Status for PID 4710: stopped with SIGTRAP event=exec msg=0x0 (0) Status for PID 4709: stopped with SIGTRAP event=vfork done msg=0x0 (0) Mon Oct 27 09:38:54 EDT 2008 Status for PID 4710: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 4709: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 4710: exited normally Status for PID 4709: exited normally waitpid: No child processes tomh> uname -a Linux tomh 2.6.26.6-49.fc8 #1 SMP Fri Oct 17 15:59:36 EDT 2008 i686 i686 i386 GNU/Linux Note that the vfork done eventmsg is zero. I supposed I can do the bookkeeping myself (since the parent is obviously only gonna be running one vfork at a time), but it was convenient to rely on the kernel to tell me which child the event was associated with, and it always seemed to be there in other kernels I tested.
This message is a reminder that Fedora 8 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 8. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '8'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 8's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 8 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.
Actually this is even more severely busted in Fedora 10. The parent notification once again no longer shows up on a vfork. Here's a test run of the test prog from comment #13 above on an x86_64 fedora 10 box running kernel 2.6.27.19-170.2.35.fc10.x86_64: tomh> ./test-vfork Child process is 2887 Status for PID 2888: stopped with SIGSTOP Status for PID 2888: stopped with SIGTRAP event=exec msg=0x0 (0) Thu Mar 26 14:30:28 EDT 2009 Status for PID 2888: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 2888: exited normally It never completes, the 2887 process (which did the vfork) never gets any additional status back from ptrace and never goes anywhere. On a centos 5.1 system with kernel: 2.6.18-53.1.13.el5 (x86_64 as well) the test run looks like: [nvtest@centos51-x b]$ ./test-vfork Child process is 3373 Status for PID 3373: stopped with SIGTRAP event=vfork msg=0xd2e (3374) Status for PID 3374: stopped with SIGSTOP Status for PID 3373: stopped with SIGTRAP event=vfork done msg=0xd2e (3374) Status for PID 3373: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 3373: exited normally Status for PID 3374: stopped with SIGTRAP event=exec msg=0x0 (0) Thu Mar 26 14:27:16 EDT 2009 Status for PID 3374: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 3374: exited normally waitpid: No child processes Note that lots of status comes in for the vforking process 3373, everybody runs and exits normally. P.S. The only status it gave me to change CLOSED to was ASSIGNED, so that's the status I changed this bugzilla to.
I think Jan Kratochvil recently created a testcase for it, "o_tracevfork-parent". He observed the failure on kernel 2.6.29-0.218.rc7.git2.fc11.x86_64). I believe he started working on it from bug https://bugzilla.redhat.com/show_bug.cgi?id=486564 ("utrace: gdb on gdb hangs") The testcase can be downloaded from ptrace-tests here: http://sources.redhat.com/cgi-bin/cvsweb.cgi/~checkout~/tests/ptrace-tests/tests/o_tracevfork-parent.c?cvsroot=systemtap
All I can say it that it would sure be nice to get the fix backported into fedora 10 if it gets fixed in the 2.6.29 kernel :-).
No doubt the testcase above already knows this, but experimenting with the test program from this bug seems to indicate that the PTRACE_O_TRACEVFORKDONE flag is the troublemaker. If I leave that flag off in the setoptions call, the vforking parent does not get hung.
Can you try 2.6.29-10 from koji: http://koji.fedoraproject.org/koji/buildinfo?buildID=95672
Looks like the 2.6.29-10 kernel does indeed work correctly. The test program from comment #13 hangs forever on previous kernel, but the results are correct with no hang on the 2.6.29-10 kernel: zooty> ./test-vfork Child process is 4411 Status for PID 4412: stopped with SIGSTOP Status for PID 4411: stopped with SIGTRAP event=vfork msg=0x113c (4412) Status for PID 4412: stopped with SIGTRAP event=exec msg=0x0 (0) Status for PID 4411: stopped with SIGTRAP event=vfork done msg=0x113c (4412) Sat Mar 28 11:19:59 EDT 2009 Status for PID 4412: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 4411: stopped with SIGTRAP event=exit msg=0x0 (0) Status for PID 4412: exited normally Status for PID 4411: exited normally waitpid: No child processes With old kernel, I would never have gotten any status for pid 4411 in this example. And I see the event message correctly reports the child PID in all cases as well (which was one of the other bugs I saw above).
This message is a reminder that Fedora 10 is nearing its end of life. Approximately 30 (thirty) days from now Fedora will stop maintaining and issuing updates for Fedora 10. It is Fedora's policy to close all bug reports from releases that are no longer maintained. At that time this bug will be closed as WONTFIX if it remains open with a Fedora 'version' of '10'. Package Maintainer: If you wish for this bug to remain open because you plan to fix it in a currently maintained version, simply change the 'version' to a later Fedora version prior to Fedora 10's end of life. Bug Reporter: Thank you for reporting this issue and we are sorry that we may not be able to fix it before Fedora 10 is end of life. If you would still like to see this bug fixed and are able to reproduce it against a later version of Fedora please change the 'version' of this bug to the applicable version. If you are unable to change the version, please add a comment here and someone will do it for you. Although we aim to fix as many bugs as possible during every release's lifetime, sometimes those efforts are overtaken by events. Often a more recent Fedora release includes newer upstream software that fixes bugs or makes them obsolete. The process we are following is described here: http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is no longer maintained, which means that it will not receive any further security or bug fix updates. As a result we are closing this bug. If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version. Thank you for reporting this bug and we are sorry it could not be fixed.