From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.2) Gecko/20040805 Netscape/7.2

Description of problem:
I have an application that forks a new process, which creates 2 child threads that run in an infinite loop. The application attaches via ptrace to the new process and its threads. The main thread of the new process waits forever in pthread_join. From the parent process of the fork, I send a kill (SIGSTOP) to the child process (that is, the parent thread of the child threads), and the parent process waits in a poll() call for a SIGCHLD to interrupt it. A SIGCHLD occurs, and a loop of waitpid() calls with WNOHANG collects SIGSTOP events for the main thread and sometimes for one of the child threads. Not all of the threads are stopped; this is verified by looking at /proc. It is not a timing issue, as the stop had still not been delivered after an hour.

Version-Release number of selected component (if applicable):
kernel-2.6.9-10.ELsmp

How reproducible:
Always

Steps to Reproduce:
1. Compile attached test case: gcc -g TestFailure.c -lpthread
2. Run test case: ./a.out
3. Wait for it to hang and then look at /proc/xxxx/task/yyyy/status for all 3 threads listed

Actual Results:
Only 1, maybe 2, stop events occur after the kill (pid, SIGSTOP). Not all three threads are stopped.

Expected Results:
The SIGSTOP should have been broadcast to the main thread and all child threads. Using tkill to individually stop the threads does work.

Additional info:
Created attachment 115173 [details] TestFailure.c test case
Test also fails on FC4: 2.6.11-1.1238_FC4smp
Created attachment 263751 [details] Comment 1 attachment fixed up.

Resolved as NOTABUG based on information from Roland: a process under ptrace(2) is no longer stopped completely by a single SIGSTOP; under ptrace(2), SIGSTOP applies only to a single task of the thread group.

The F8 kernel (kernel-2.6.23.1-42.fc8.x86_64) also behaves for me according to the Comment 0 field `Actual Results' and not according to its `Expected Results'. The F8 kernel is right in this respect: it behaves the same as the RHEL-4 kernels, and these behave the same as the kernel.org/upstream kernels. The kernel.org/upstream kernels define the ptrace(2) behavior according to Roland. :-) Therefore the F8 kernels are correct.

The ptrace(2) behavior may be scary, but there are ways to code the userland to reach any desired goal.

I fixed up the testcase to avoid any races in it and to behave according to the Comment 0 field `Expected Results'. The testcase modifications were made DIFF-friendly. My testcase uses tkill(2), as it is the only safe way to target a specific task of a thread group with a signal. Even the upstream GDB kill_lwp() implementation uses tkill(2).

log:
Attaching 21358
stop event 19 on <21358>
continuing first stop
lwpid is 21358
lwpid is 21359
Attaching 21359
stop event 19 on <21359>
continuing first stop
lwpid is 21360
Attaching 21360
stop event 19 on <21360>
continuing first stop
About to kill main thread
stop event 19 on <21358>
stop event 19 on <21359>
stop event 19 on <21360>
Manually check to see what threads are stopped.
[hang]

and now
$ grep Stat /proc/{21358,21359,21360}/status
/proc/21358/status:State: T (tracing stop)
/proc/21359/status:State: T (tracing stop)
/proc/21360/status:State: T (tracing stop)
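
For readers without access to the attachment, here is a minimal sketch of the per-task approach described in this comment. It is not the code from the attached testcase. It assumes Linux with /proc mounted; the helper names list_tids() and signal_each_thread() are hypothetical and chosen only for illustration. The raw syscall(SYS_tkill, ...) form is used because glibc does not provide a tkill() wrapper. tgkill(2) is a newer variant that additionally checks the thread group ID and may be preferable in new code.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <dirent.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the attached testcase): collect the
 * task IDs of process `pid` by reading /proc/<pid>/task.
 * Returns the number of IDs stored in `tids` (at most `max`),
 * or -1 if the directory cannot be opened.
 */
static int list_tids(pid_t pid, pid_t *tids, int max)
{
    char path[64];
    struct dirent *de;
    DIR *dir;
    int n = 0;

    assert(tids != NULL && max > 0);

    snprintf(path, sizeof path, "/proc/%d/task", (int)pid);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    while (n < max && (de = readdir(dir)) != NULL) {
        char *end;
        long tid = strtol(de->d_name, &end, 10);

        /* Skip "." and ".."; keep only purely numeric entries. */
        if (end != de->d_name && *end == '\0')
            tids[n++] = (pid_t)tid;
    }
    closedir(dir);
    return n;
}

/*
 * Hypothetical helper: send `sig` to every task in `tids` individually
 * with tkill(2), instead of one kill(2) aimed at the whole process.
 * Returns the number of tasks for which the system call failed.
 */
static int signal_each_thread(const pid_t *tids, int n, int sig)
{
    int i, failed = 0;

    for (i = 0; i < n; i++) {
        if (syscall(SYS_tkill, tids[i], sig) == -1)
            failed++;
    }
    return failed;
}
```

A tracer would call list_tids(child_pid, tids, max) and then signal_each_thread(tids, n, SIGSTOP), and afterwards collect one stop event per task with waitpid(). The task list can change between the /proc scan and the signal delivery, so a real tool has to handle tasks that appear or exit in between.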