Description of problem: Behavior change from the upstream. It affects the method I found out as the minimal/reliable way to attach unstopped+stopped processes on upstream kernels: http://www.jankratochvil.net/priv/ver5.txt * kill(SIGSTOP) the child * waitid() (waitpid()) on it to catch its SIGSTOP notification * PTRACE_ATTACH * tkill(SIGCONT) * waitpid() - should not hang Version-Release number of selected component (if applicable): kernel-2.6.21-1.3228.fc7.x86_64 How reproducible: Always. Steps to Reproduce: 1. gcc -o stopped-sigcont-hang-0 stopped-sigcont-hang-0.c -Wall -ggdb2 2. ./stopped-sigcont-hang-0 Actual results: Aborted (due to a timeout on hung waitpid()) Expected results: PASS Additional info: PASS is produced on upstream 2.6.22-rc4-git7.x86_64. kernel-2.6.23-0.15.rc0.git1.fc8.x86_64 unfortunately does not boot for me (on QEMU; F7 3228.fc7 boots there). It is a blocker for pushing the reliable stopped+unstopped PTRACE_ATTACH for F8 GDB. Not that the stopped-processes attachment would be so serious for F8. Maybe one should use PTRACE_CONT instead of tkill(), it fails (races) on RHEL-4 kernels but a goal of coding ptrace() client compatible with all the kernels around may be too ambitious.
Created attachment 159411 [details] Testcase.
Was this fixed in 2.6.22.1-41.fc7? (But we are shipping 2.6.23 for F8...)
Not fixed in: kernel-2.6.22.1-41.fc7.x86_64 kernel-2.6.23-0.129.rc3.git4.fc8.x86_64
Reproduced on development utrace vs vanilla upstream du jour, looking into it.
Created attachment 179961 [details] draft kernel patch to fix the bug Jan, you build your own test kernels, right? Can you test this kernel patch for regressions on all your past test cases and gdb and so forth? It fixes the test case for me, and I think it's correct. But it's a touchy area and I'd like to get some regression testing for the other problem cases impinging on this code before committing it.
Unfortunately it has a regression for: Bug 232837 Comment 10 (Frysk frysk4217/attachstop.c) kernel-2.6.22.4-65.fc7.x86_64: FAIL kernel-2.6.23-0.152.rc4.bz248532c5.fc7.x86_64: FAIL (patched by Comment 5) kernel-2.6.21-1.3228.fc7.x86_64: PASS (reverified now)
I see now it may not be a regression of this specific patch, I should patch 2.6.21-1.3228.fc7 by the patch from Comment 5; still your comment would be useful.
Created attachment 181161 [details] New testcase. The fix is either incomplete or it has a related regression. A new testcase attached; as the problem is not 100% reliable so there must be some looping, but the failure occurs in about 25% of the loop iterations. (The whole testcase program is 100% reliable and runs under 1 second.) The problem (process 12118 is singlethreaded, SIGSTOPped and it has not been waited on before): ptrace(PTRACE_ATTACH, 12118, 0, 0) = 0 tkill(12118, SIGCONT) = 0 wait4(12118, [{WIFSTOPPED(s) && WSTOPSIG(s) == SIGSTOP}], 0, NULL) = 12118 ptrace(PTRACE_PEEKUSER, 12118, 8*R15, [0]) = -1 ESRCH (No such process) It never occurs on upstream 2.6.22-rc4-git7.x86_64.
ESRCH happens there only in the case of some race between raise(SIGSTOP) delivery in the inferior process and PTRACE_ATTACH. With a delay between raise(SIGSTOP) and PTRACE_ATTACH no problem occurs. Still this problem is not hit on kernel.org kernels.
Created attachment 182321 [details] New testcase. utrace currently sometimes returns SIGSTOP even despite earlier tkill(SIGCONT). SIGCONT should clear all the pending SIGSTOP events before. Either the SIGSTOP case of the originally stopped process or the SIGSTOP event from PTRACE_ATTACH, both should get cleared and no longer reported. (Later ESRCH in Comment 8 was just a consequence. SIGCONT got delivered right before the moment PTRACE_PEEKUSER was called. PTRACE_PEEKUSER got processed in a moment before the child processes found out it is under ptrace and stopped again + reported its SIGCONT to its ptrace-parent. The child process was running for a short while after SIGCONT delivery.) Based on info from Roland.
A minor correction: tkill(SIGCONT) clears queued stop signals and the job-control-stopped state (TASK_STOPPED) as left from the child's original SIGSTOP in this test case. If the PTRACE_ATTACH's SIGSTOP were to be processed before tkill, the SIGCONT would not wake it up from its ptrace'd-signal-stopped state (TASK_TRACED) or clear that ptrace-stopped-not-waited-for status, so waitpid in that event would report SIGSTOP, whereas SIGCONT does affect the job-control-stopped-not-waited-for status so waitpid cannot possibly see SIGSTOP. The rest of comment #10 is correct. The utrace bug is that after PTRACE_ATTACH, waitpid will report a previous job-control-stop condition even though SIGCONT has cleared the TASK_STOPPED state.
Created attachment 182381 [details] test case that counts failures, exits success/fail after many iterations I tweaked the test case to keep running more iterations after a failure, and count the number of failures. This version is still a good regression test suite candidate, since it will exit success or failure at the end showing whether it hit any failing iterations. But on reproducing the bug, it gives more of an idea how often it is hitting. Jan reported it hits very quickly, but for me 19 bad in 10000 iterations: 0.19% My kernel configuration may differ a lot (usually it's more racy than production kernels, but the timing is falling such that it hits this issue less often than the Fedora config, perhaps). This is on 4p (2x2core) Xeon 1.86Ghz (x86_64).
Created attachment 185931 [details] Roland's fix for the Comment 12 testcase. From: Roland McGrath <roland> Date: Tue, 4 Sep 2007 01:57:34 -0700 Subject: [PATCH] Fix ptrace_do_wait race with SIGCONT wakeup This fixes ptrace_do_wait to distinguish ptrace stops from job control stops and never report as stopped a task that's neither in a ptrace stop nor in TASK_STOPPED state. Previously, a ptrace'd task in TASK_RUNNING could be reported as stopped when SIGCONT had just woken it (but it had not yet cleared its ->exit_code). Signed-off-by: Roland McGrath <roland> --- kernel/ptrace.c | 41 +++++++++++++++++++++++++++++++++++++++-- 1 files changed, 39 insertions(+), 2 deletions(-)
On kernel 2.6.23-0.152.rc4.bz248532c5c13bz232837c10.fc7, therefore patched by: Bug 248532 Comment 5 (above) Bug 248532 Comment 13 (above) Bug 232837 Comment 10 all the tests worked as expected except that I got one Comment 12 message: ./stopped-sigcont-hang-2roland: Stopped (signal) after 6105 iterations 1 bad in 10000 iterations: 0.01% There was (probably!) running a GDB testsuite in parallel but it should not accedentally signal an unrelated process. Going let it running.
Ack! That should never have been possible (unless you externally sent signals to the test process). So it is of concern that it happened even once. We need to try hard to reproduce that on the current code.
Created attachment 187231 [details] Testcase of Comment 12 just without timeouts/alarm. Got it reproducible. It is enough to run it as: $ nice -n20 ./stopped-sigcont-hang-3 While the machine is under load: $ echo 'int main (void) { for (;;); }' >load.c;gcc -o load load.c -Wall -ggdb2;./load & Reproducibility is pretty high for me (reduced test count): 63 bad in 200 iterations: 31.50% Timeouts had to be disabled there as they sometimes occur under the load.
Comment 16 is still well reproducible on: kernel-2.6.23-0.164.rc5.fc8.x86_64
It is reproducible both on qemu-kvm and on (software emulation) qemu-system-x86_64. Still not reproducible on non-virtualized hardware.
I tried the comment 16 scenario on qemu-kvm on my x86_64 (F7), running rawhide inside the VM. I then used that VM running my development utrace kernel. I reproduce about 10% failure rate with the stopped-sigcont-hang-3.c version of the test case. However, if I change the delay before PTRACE_ATTACH (not changing other DELAY() calls) from 0.1s to 0.5s it goes to 0.50% (1 hit); to 0.7s and it goes to 0. So, are you sure there is really still a regression here? Or are you just seeing the known false-positive from doing PTRACE_ATTACH before the original SIGSTOP got delivered? To be 100% robust against that false positive, the test should check /proc/pid/status for "T" and delay briefly and loop until it sees it. If you concur that the test still tests what we want with that delay before attach, and you do not reproduce a failure excluding those false positives, then I think we really do (already) have it licked.
Created attachment 191741 [details] Raceless testcase from Comment 16. The QEMU/KVM problem was really a false positive, sorry. No longer reproducible on kernel-2.6.23-0.164.rc5.fc8.x86_64 in qemu-kvm. Considering the Comment 0 bugreport as fixed, thanks.