When the parent process sends PTRACE_KILL to a child that is stopped by SIGTRAP (the SIGTRAP delivered on exec() after the child called PTRACE_TRACEME), the child is not killed; instead, it starts running freely.

This kernel bug is present on FC7 and RH 5, 5u1, and 5u2, on x86, x86-64, and Power processors. On the other hand, the problem is not present on, e.g., SUSE 10.1, SUSE 10.2, and RH 4u5. From this we infer that the working systems run kernels 2.6.16 or older, while the failing systems run kernels 2.6.18 or newer. The problem reproduces with, e.g., both gcc and PGI compilers; the reproducer here uses gcc 4.3.0.

The reproducer package consists of two programs: the 'user' program simplestat_g.out (built from simple.c) and the master 'debugger' program (test_TV.c). First, the master forks a child, and the child calls PTRACE_TRACEME. The child then calls execl() on ./simplestat_g.out. The master blocks in wait() and, as soon as it returns, sends PTRACE_KILL to the child. As a result, the child should exit without ever actually executing simplestat_g.out.

We suspect a race condition in the kernel, possibly between posting SIGKILL to the child process and letting the child run so that it gets killed. This kernel problem prevents the TotalView Debugger from debugging any program compiled with '-static' on these platforms. We consider this a critical kernel bug and hope it will be fixed as a very high priority. For more details, please see the reproducer code, particularly test_TV.c.

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Reproduce:

# User's program, with (or without) -static; here statically linked:
/home/compilers/gnu/gcc/4.3.0/x86_64-linux/bin/gcc -g -static -o simplestat_g.out simple.c -lm

# Mini debugger program, which executes simplestat_g.out and tries to PTRACE_KILL it:
/home/compilers/gnu/gcc/4.3.0/x86_64-linux/bin/gcc -o a.out test_TV.c
./a.out

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Sample output:

FAILING execution, RH 5u1, x86-64:

rhel51-x8664:/home/seppo/Bugs/Bug_11153 > ./a.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 28777
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 28777 :: return code 0

FINISHED
rhel51-x8664:/home/seppo/Bugs/Bug_11153 > counter 0
counter 1
counter 2
counter 3
counter 4
counter 5
counter 6
counter 7
counter 8
counter 9
rhel51-x8664:/home/seppo/Bugs/Bug_11153 >
rhel51-x8664:/home/seppo/Bugs/Bug_11153 > uname -a
Linux rhel51-x8664.totalviewtech.com 2.6.18-53.el5 #1 SMP Wed Oct 10 16:34:19 EDT 2007 x86_64 x86_64 x86_64 GNU/Linux
rhel51-x8664:/home/seppo/Bugs/Bug_11153 >

SUCCESSFUL execution, SUSE 10 SP1, x86-64:

gari:/home/seppo/Bugs/Bug_11153 > ./a.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 29369
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 29369 :: return code 0

FINISHED
gari:/home/seppo/Bugs/Bug_11153 > uname -a
Linux gari 2.6.16.13-4-smp #1 SMP Wed May 3 04:53:23 UTC 2006 x86_64 x86_64 x86_64 GNU/Linux
gari:/home/seppo/Bugs/Bug_11153 >

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
/*********************************
 * simple.c
 *********************************/
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <math.h>

#define jmax 10
#define imax 10

/*** some global vars ***/
double betamax[imax];

typedef struct {
    unsigned char d1:4;
    unsigned char d2:4;
} X3;

struct {
    int myint;
    char mychar;
} mystruct;

int main(int argc, char **argv)
{
    int i, j, p = 10;
    X3 x3;

    /************* command line args ***/
    {
        char command_line_string[80];
        if (argc > 1) {
            x3.d1 = 0xa;
            /* bounded copy: argv[1] may be longer than the buffer */
            snprintf(command_line_string, sizeof command_line_string, "%s", argv[1]);
            printf("arg_2 = %s\n", command_line_string);
        }
    }

    /**** some array operations ***/
    {
        int i, j, p = 13;
        double xi, xj, dx, scale = 100.0;
        mystruct.myint = 42;
        mystruct.mychar = 'a';
        for (j = 0; j < jmax; j++) {
            int jmod, p;
            jmod = (100*j) % jmax;
            p = jmod + 10;
            xj = (double)jmod / (double)jmax;
            for (i = 0; i < imax; i++) {
                int p = 42;
                xi = (double)i / (double)imax * 2;
                dx = xi - xj;
                betamax[i] = 2.0 / (1.0 + exp(scale*dx*dx));
            }
            printf("counter %d\n", j);
            sleep(1);
        }
    }
    exit(1);
}

XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
/* test_TV.c */
#include <stdio.h>
#include <errno.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
    pid_t child;
    long orig_eax;   /* return value of the last ptrace()/wait() call */
    int status;

    child = fork();
    if (child < 0) {
        perror("fork");
        return 1;
    }
    if (child == 0) {
        orig_eax = ptrace(PTRACE_TRACEME, 0, NULL, NULL);
        printf("CHILD: PTRACE_TRACEME at %d :: return code %ld \n", (int)child, orig_eax);
        fflush(stdout);   /* exec discards unflushed stdio buffers */
        execl("./simplestat_g.out", "simplestat", (char *)NULL);
        perror("execl");  /* reached only if exec fails */
        _exit(127);
    } else {
        orig_eax = wait(&status);
        printf("PARENT: WAIT status %d from PID %ld \n", status, orig_eax);
        if (WIFEXITED(status))
            printf("PARENT: status -> CHILD exited normally, exit status %d \n", WEXITSTATUS(status));
        if (WIFSIGNALED(status))
            printf("PARENT: status -> CHILD terminated by signal, signal %d \n", WTERMSIG(status));
        if (WIFSTOPPED(status))
            printf("PARENT: status -> CHILD stopped, by signal %d \n", WSTOPSIG(status));
        if (WIFCONTINUED(status))
            printf("PARENT: status -> CHILD continued \n");

        orig_eax = ptrace(PTRACE_KILL, child, NULL, NULL);
        printf("PARENT: Sent PTRACE_KILL to %d :: return code %ld \n", (int)child, orig_eax);
    }
    printf("\n FINISHED \n");
    return 0;
}
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
Seppo, I was talking to one of our kernel engineers, and they are working on fixing this bug. However, there is a workaround that should work reliably and solve your problem. Instead of doing:

orig_eax = ptrace(PTRACE_KILL, child, NULL, NULL);
printf("PARENT: Sent PTRACE_KILL to %d :: return code %ld \n", (int)child, orig_eax );

do (kill() needs #include <signal.h>):

orig_eax = kill( child, SIGKILL);
printf("PARENT: Sent SIGKILL to %d :: return code %ld \n", (int)child, orig_eax );
orig_eax = ptrace(PTRACE_KILL, child, NULL, NULL);
printf("PARENT: Sent PTRACE_KILL to %d :: return code %ld \n", (int)child, orig_eax );

and don't worry if ptrace(PTRACE_KILL) returns -1 with errno set to ESRCH; that is just fine. You can probably omit the PTRACE_KILL step entirely, since it really isn't needed, but it provides a paranoid assurance. Evidently, PTRACE_KILL existed for some ancient kernels that might not handle signals and ptrace well together, but the kernel engineer working on the problem believes that no modern UNIX should have problems with it.

-ben
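Ben's suggested sequence can be wrapped in a small helper. This is a minimal sketch, assuming Linux and that the caller is the child's parent (as in test_TV.c), so that waitpid() can reap it. kill_tracee and its return convention are hypothetical, not a TotalView or libc API. Unlike the snippet above, it also waits for the child so that no zombie is left behind.

```c
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Hypothetical helper: plain SIGKILL, then PTRACE_KILL as the optional
 * "paranoid" extra step, then reap the child.
 * Returns 0 on success, -1 on error (errno set).
 * If status_out is non-NULL it receives the final wait status. */
int kill_tracee(pid_t pid, int *status_out)
{
    int status;

    if (kill(pid, SIGKILL) == -1)
        return -1;                  /* e.g. ESRCH: no such process */

    /* Best-effort extra step; its result is deliberately ignored.
     * -1 with errno == ESRCH is the usual outcome here. */
    int saved_errno = errno;
    (void)ptrace(PTRACE_KILL, pid, NULL, NULL);
    errno = saved_errno;

    /* Reap the child, retrying on EINTR and skipping any stop report
     * that may still have been pending, until the child is gone. */
    for (;;) {
        if (waitpid(pid, &status, 0) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (WIFEXITED(status) || WIFSIGNALED(status))
            break;
    }
    if (status_out != NULL)
        *status_out = status;
    return 0;
}
```

This sketch ignores the follow-up PTRACE_KILL result entirely, not only when errno is ESRCH: the plain SIGKILL is what does the work, so the extra call is treated as best-effort. A caller that wants to log unexpected errors could inspect errno right after the ptrace() call instead.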
Hi Ben,

Thanks for the workaround (WA), and sorry about the delay in responding. We discussed your suggested WA. In summary, the kill mechanism in TV uses various kill methods for different systems and situations, and using SIGKILL instead of PTRACE_KILL would be quite a large engineering effort, involving changes and checks in several places in the code and a lot of careful (regression) testing on various platforms. So, unfortunately, implementing SIGKILL for this particular scenario would not be trivial in TV.

With that, and given that PTRACE_KILL is a supported feature of the ptrace interface, it would be very important to have this fixed in RH 5 as a high priority. On that note, I am happy to hear that this bug is being worked on by RH engineers.

Thanks,
Seppo
Need an update. This event sent from IssueTracker by woodard issue 192971
Still poking around. Unfortunately, I have no significant progress yet, and some of the consequences are still not understandable to me. What I have at the moment:
- it's almost impossible to reproduce on fast CPUs, e.g. a quad-core tylersburg;
- it's easy to trigger in a virtual environment;
- the behaviour depends on the environment, e.g. it's not reproducible in Midnight Commander but easily reproducible in bash. The fun part is that in KVM the behaviour is inverted;
- the flags that the task struct has at the moment of PTRACE_KILL are the same in both cases (when the child dies as it should, and when it continues to run);
- I have no clue why the signal (from PTRACE_KILL) does not reach its destination, and I can't formulate why, because I do not see any differences in the code flow.
I believe it's a race condition, but so far I have no idea where to look for it, since I'm too new to signals and am still learning about them. The question I do not have an answer to, and would be happy to know: _why_ could the behaviour differ between mc and bash (one of the points above)?
/me keeps learning about signals in general.
s/tylersburg/bloomfield/
Hi Anton,

Thank you for the interesting observations; I will test them and keep you posted. Let me also pitch in some general observations, in case they help:
-- We were testing this on a virtual machine, using tcsh, and the bug reproduces 100% of the time.
-- When one debugs an application with TotalView, this bug is present only if the application is statically linked. (Luckily, when debugging dynamically linked applications, the bug is not present.)
-- The bug reproduces on x86, x86-64, and Power architectures.

Thanks,
Seppo
Seppo, for your information, regarding the very first comment of this bugzilla about SuSE and the different kernel versions: the upstream kernel and SuSE use _ptrace_, but Fedora and RHEL use _utrace_. _ptrace_ does not have this issue. The utrace implementation will be in upstream in the future; AFAIK several hunks are already there. :)
I'm investigating the problem. Meanwhile, I can suggest a workaround: you can hijack the ptrace() call (an LD_PRELOAD wrapper around the library function) and send a regular SIGKILL signal to the ptracee when the request is PTRACE_KILL.
Created attachment 317591
Hijack the ptrace syscall

Compile and link:
$ gcc -fPIC -c ptrace_hijack.c
$ ld -shared -o ptrace_hijack.so ptrace_hijack.o -ldl

Run your program with LD_PRELOAD=/path/to/ptrace_hijack.so. Example:
$ LD_PRELOAD=./ptrace_hijack.so ./a.out
Since this is taking so long to get fixed, the engineers at TotalView would like to tweak their code to work around this problem. To be able to do this, they need either:
1) an assurance that there are no older versions of Linux that require PTRACE_KILL rather than plain kill(2) to avoid a race. I'm not sure this is really possible without exhaustively testing all kernels, which I do not think is practical; or
2) a programmatic way to see whether the underlying implementation of ptrace is utrace.

This event sent from IssueTracker by woodard issue 192971
Created attachment 319135
Patch to upstream kernel to just send SIGKILL in PTRACE_KILL

The patch is so trivial that it probably applies to a wide range of kernels. If it doesn't apply, see the next attachment for utrace-based kernels (typical Red Hat kernels are utrace-based).

Can someone who can reproduce this bug test the fix?
Created attachment 319136
Patch to utrace-based kernels to just send SIGKILL on PTRACE_KILL
That last patch, the one that is supposed to work on utrace-based kernels, doesn't apply to our target distro, 5.2. Can you rework it so that it applies? The function it patches doesn't even exist in 5.2.
Created attachment 319204
Patch against kernel-2.6.18-98.el5

Ben, please try this one.
I think I understand why it does not work. The following explanation applies to RHEL5 only.

When calling ptrace(PTRACE_KILL, ...), sys_ptrace() calls ptrace_common(), which calls ptrace_induce_signal(), which looks like this:

static int
ptrace_induce_signal(struct task_struct *target,
                     struct utrace_attached_engine *engine,
                     long signr)
{
        struct ptrace_state *state = engine->data;

        if (signr == 0)
                return 0;

        if (!valid_signal(signr))
                return -EIO;

        if (state->syscall) {
                /*
                 * This is the traditional ptrace behavior when given
                 * a signal to resume from a syscall tracing stop.
                 */
                send_sig(signr, target, 1);
        }
        else if (!state->have_eventmsg && state->u.siginfo) {
                siginfo_t *info = state->u.siginfo;

                /* Update the siginfo structure if the signal has
                   changed.  If the debugger wanted something
                   specific in the siginfo structure then it should
                   have updated *info via PTRACE_SETSIGINFO.  */
                if (signr != info->si_signo) {
                        info->si_signo = signr;
                        info->si_errno = 0;
                        info->si_code = SI_USER;
                        info->si_pid = current->pid;
                        info->si_uid = current->uid;
                }

                return utrace_inject_signal(target, engine,
                                            UTRACE_ACTION_RESUME,
                                            info, NULL);
        }

        return 0;
}

It does not send SIGKILL to the tracee because:
- state->syscall == 0 (the tracee is stopped because of an execve report, not a syscall report);
- state->u.siginfo == NULL.

The only place where that field is set to a non-NULL value is ptrace_report_signal(), which is called, through a chain of other functions, by do_signal(). The value placed in state->u.siginfo is the local variable "info" from do_signal(). That means it's only valid when the traced process is stopped by a *real* signal, not by the pseudo-SIGTRAP sent by ptrace_report_exec(). That probably also means that PTRACE_KILL will not kill a process that is not stopped by a signal or a syscall report (that is, a process that is running or stopped by another event).
Likewise, no other signal can be sent with the ptrace syscall to a process stopped by an execve event (for instance with ptrace(PTRACE_CONT, pid, NULL, signr)). I haven't tested it yet, but the ptrace signal handling seems very buggy.

I have two solutions in mind:
1) As the syscall field of ptrace_state seems to be used only for choosing how signals are sent in ptrace_induce_signal(), we could set that field for execve reports too (and certainly for other kinds of report). (Roland, do you remember why we made the distinction between syscall reports and other reports in the first place?)
2) If there is a signal to be sent when the traced process is not stopped by a signal report, allocate a siginfo_t structure and put it in the u.siginfo field.

Roland, what do you think about that?

Thanks,
Jérôme
> Hi Ben,
>
> did my patch in comment 24 work for you?
>
> https://bugzilla.redhat.com/show_bug.cgi?id=455060#c24
>
> --
> vda

I'll look at it again. From the subsequent comment 25, I got the impression that we had just made a big breakthrough and that your patch was likely to be superseded by something that got down to the problem and fixed it at the root.
The patch in comment 24 is basically a one-liner and falls into the "Obviously Correct (tm)" category. It also allows us to drop PTRACE_KILL handling in nineteen architectures (the 20th match is in kernel/ptrace.c):

# grep -r 'case *PTRACE_KILL *:' linux-2.6 | wc -l
20

because we can rely on SIGKILL always working, even on ptraced/stopped/single-stepped/etc. processes, right?

The fix suggested in comment 25 happens somewhere deep in [pu]trace internals, will be more invasive, and does not allow us to drop the arch-specific PTRACE_KILL handling.

In your opinion, which fix is likely to be less maintenance-intensive?
Created attachment 320242
possible fix vs 2.6.18-119.el5'ish

On the workaround issue: there is no past Linux kernel version on which PTRACE_KILL worked 100% reliably in all cases (there have always been certain races, unrelated to the current RHEL5 issue). In nearly all Linux kernel versions, and certainly all RHEL kernels of all versions, plain SIGKILL (sent by kill et al.) is always reliable. In every Linux kernel ever to exist, it should be entirely safe and reliable to send a plain SIGKILL (i.e. kill, etc.) followed by calling ptrace(PTRACE_KILL, pid); ignore an ESRCH error from that ptrace call, it is harmless (and usually indicates that the plain SIGKILL already did the job).

On the hack patch: I do not think it would be wise to change ptrace to send a plain SIGKILL in RHEL5. The exact peculiarities of PTRACE_KILL as it is might produce some quirk that another application depends on in a way we are not contemplating at the moment.

On the real problem: comment#25 might have the key. If the tracer is using PTRACE_O_TRACEEXEC, then in vanilla kernels there is a ptrace_notify() stop. This kind ignores the ->exit_code value, which means PTRACE_KILL, PTRACE_CONT with a nonzero signal, etc. all eat that signal. However, without PTRACE_O_TRACEEXEC, the vanilla kernel just sends a SIGTRAP normally. That means that when SIGTRAP is dequeued, it's in the real signals path, where you can properly inject a signal via PTRACE_KILL et al.

In the RHEL5 utrace code (kernel/ptrace.c:ptrace_report_exec), it always does a stop in the ptrace_notify() style (though that function no longer exists). This is the bug. By instead posting a SIGTRAP normally here (when PTRACE_O_TRACEEXEC is not set), it will match the vanilla kernel's behavior. The attached patch should fix this, but I have not tried it.
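The explanation above hinges on which kind of stop the tracer sees after exec. This is a minimal sketch, assuming Linux and glibc's <sys/ptrace.h>, of how a tracer could tell the two apart from the waitpid() status. Without PTRACE_O_TRACEEXEC the exec is reported as an ordinary SIGTRAP stop (status 1407 in the reproducer output), where PTRACE_KILL takes effect in the vanilla kernel. With the option set, it is a PTRACE_EVENT_EXEC stop whose event number sits in bits 16 to 23, and there PTRACE_KILL and signals passed to PTRACE_CONT are eaten. The enum and function names are hypothetical.

```c
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/wait.h>

/* Hypothetical classification of a SIGTRAP-related ptrace stop. */
enum trap_stop_kind {
    NOT_A_SIGTRAP_STOP,   /* exited, killed, or stopped by another signal */
    PLAIN_SIGTRAP_STOP,   /* ordinary SIGTRAP stop (syscall stops without
                             PTRACE_O_TRACESYSGOOD also look like this) */
    EXEC_EVENT_STOP,      /* PTRACE_EVENT_EXEC stop (PTRACE_O_TRACEEXEC set) */
    OTHER_EVENT_STOP      /* some other PTRACE_EVENT_* stop */
};

enum trap_stop_kind classify_trap_stop(int status)
{
    if (!WIFSTOPPED(status) || WSTOPSIG(status) != SIGTRAP)
        return NOT_A_SIGTRAP_STOP;

    /* For event stops Linux reports status >> 8 == (SIGTRAP | (event << 8)),
     * so the event number is in bits 16..23 of the status. */
    switch ((status >> 16) & 0xff) {
    case 0:
        return PLAIN_SIGTRAP_STOP;
    case PTRACE_EVENT_EXEC:
        return EXEC_EVENT_STOP;
    default:
        return OTHER_EVENT_STOP;
    }
}
```

With the reproducer's status, classify_trap_stop(1407) returns PLAIN_SIGTRAP_STOP; a tracer that had set PTRACE_O_TRACEEXEC would instead see 0x4057f (PTRACE_EVENT_EXEC is 4), which returns EXEC_EVENT_STOP. Sending a plain SIGKILL first, as recommended above, avoids depending on this distinction at all.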
That last patch allows killing a process stopped in execve (if PTRACE_O_TRACEEXEC is not set), and also injecting any signal (with PTRACE_CONT, for instance). However, it does not allow killing or passing a signal to a process stopped by something other than a real signal (execve when PTRACE_O_TRACEEXEC is set, fork, vfork...). There is already a fix for syscall reports (a special path in ptrace_induce_signal()) that allows PTRACE_KILL to work in that case, but not passing another signal to the tracee (it would be intercepted by ptrace). Maybe we should extend this fix to other kinds of report, to allow at least killing processes in all cases. I have already built and briefly tested such a patch. We may want to open a new bugzilla for that.

By the way, is there any reason why we don't always send a normal SIGTRAP (other than the fact that the eventmsg and siginfo fields of the ptrace_state structure share a union)? That would solve all our signal problems.
The goal here is to match the vanilla upstream kernel ptrace functionality exactly. That is what applications depend on. Notifications for PTRACE_O_* swallow the signal passed to PTRACE_CONT et al (PTRACE_KILL=PTRACE_CONT,SIGKILL); that is the behavior.
Posted on rhkl: http://post-office.corp.redhat.com/archives/rhkernel-list/2008-October/msg00507.html
Hi all,

Good news: I tested the patched kernel, and the problem has been fixed. Thank you very much for all your efforts in fixing this very high priority bug. It would be great to have this fix available to the public as soon as possible. More details below.

Thanks,
Seppo

XXXXXXXXXXXXXXXXXXXXXXXXXXX
Kernel files:
kernel-2.6.18-119.el5.bz455060.src.rpm
kernel-debuginfo-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-debuginfo-common-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-debug-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-devel-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-debug-debuginfo-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-doc-2.6.18-119.el5.bz455060.noarch.rpm
kernel-debug-devel-2.6.18-119.el5.bz455060.x86_64.rpm
kernel-headers-2.6.18-119.el5.bz455060.x86_64.rpm

# test patched kernel (and also using new gcc 43 compilers, not
# relevant for this bug)
ssh rhel52-x8664-kernelupdate
rhel52-x8664-kernelupdate:/home/seppo > rpm -q glibc
glibc-2.5-24
glibc-2.5-24
rhel52-x8664-kernelupdate:/home/seppo > cat /etc/*-rel*
cat: /etc/lsb-release.d: Is a directory
Red Hat Enterprise Linux Server release 5.2 (Tikanga)
rhel52-x8664-kernelupdate:/home/seppo > uname -a
Linux rhel52-x8664-kernelupdate.totalviewtech.com 2.6.18-119.el5.bz455060 #1 SMP Tue Oct 14 16:10:42 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux
rhel52-x8664-kernelupdate:/home/seppo >
rhel52-x8664-kernelupdate:/home/seppo > gcc43 -v
Using built-in specs.
Target: x86_64-redhat-linux6E
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap --enable-shared --enable-threads=posix --enable-checking=release --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions --enable-languages=c,c++,fortran --disable-libgcj --with-cpu=generic --build=x86_64-redhat-linux6E
Thread model: posix
gcc version 4.3.2 20081007 (Red Hat 4.3.2-7) (GCC)
rhel52-x8664-kernelupdate:/home/seppo >

# Compile your executable
gcc43 -g -static -o simplestat_g.out simple.c -lm
# Compile "Debugger"
gcc43 -o a_patch.out test_TV.c
./a_patch.out

rhel52-x8664-kernelupdate:/home/seppo/Bugs/Bug_11153 > ./a_patch.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 5598
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 5598 :: return code 0

FINISHED
rhel52-x8664-kernelupdate:/home/seppo/Bugs/Bug_11153 >

=> patched kernel 2.6.18-119.el5.bz455060 works OK
in kernel-2.6.18-121.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5
Reproduced and verified on x86_64 and ppc64 archs:

-- 2.6.18-92.el5 (x86_64)
[root@dell-pe1800-01 test2]# ./a.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 2913
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 2913 :: return code 0

FINISHED
counter 0
[root@dell-pe1800-01 test2]# counter 1
counter 2
counter 3
counter 4
counter 5
counter 6
counter 7
counter 8
counter 9
[root@dell-pe1800-01 test2]#

-- 2.6.18-92.el5 (ppc64)
[root@ibm-qs22-01 test2]# ./a.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 4706
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 4706 :: return code 0

FINISHED
[root@ibm-qs22-01 test2]# counter 0
counter 1
counter 2
counter 3
counter 4
counter 5
counter 6
counter 7
counter 8
counter 9
[root@ibm-qs22-01 test2]#

-- 2.6.18-121.el5 (x86_64)
[root@dell-pe1800-01 test2]# ./a.out
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 8404
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 8404 :: return code 0

FINISHED
[root@dell-pe1800-01 test2]#

-- 2.6.18-121.el5 (ppc64)
[root@ibm-qs22-01 test2]#
CHILD: PTRACE_TRACEME at 0 :: return code 0
PARENT: WAIT status 1407 from PID 9845
PARENT: status -> CHILD stopped, by signal 5
PARENT: Sent PTRACE_KILL to 9845 :: return code 0

FINISHED
[root@ibm-qs22-01 test2]#
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html