Bug 301791
Summary: | ptrace broke | | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | David Woodhouse <dwmw2>
Component: | kernel | Assignee: | Roland McGrath <roland>
Status: | CLOSED RAWHIDE | QA Contact: | Fedora Extras Quality Assurance <extras-qa>
Severity: | low | Priority: | low
Version: | rawhide | CC: | cebbert, davej, jan.kratochvil
Hardware: | ppc64 | OS: | Linux
Fixed In Version: | 2.6.23-0.202.rc8.fc8 | Doc Type: | Bug Fix
Last Closed: | 2007-09-25 07:51:37 UTC | |
Description
David Woodhouse
2007-09-22 19:56:21 UTC
I couldn't reproduce this using the 2.6.23-0.185.rc6.git7.fc8.ppc64 kernel on an F7 install. Can you verify a few details while I try to get an F8 install going: the gdb rpm version and arch, and which binary you ran under gdb (i.e. 32- or 64-bit)?

Further investigation shows that it happens when my _gdb_ is 64-bit, not when it's 32-bit. That was tested with 6.6-28. It happens when starting _any_ 32-bit or 64-bit program that I've tried so far.

    pmac /home/dwmw2 $ gdb foo
    GNU gdb Red Hat Linux (6.6-27.fc8rh)
    Copyright (C) 2006 Free Software Foundation, Inc.
    GDB is free software, covered by the GNU General Public License, and you are
    welcome to change it and/or distribute copies of it under certain conditions.
    Type "show copying" to see the conditions.
    There is absolutely no warranty for GDB. Type "show warranty" for details.
    This GDB was configured as "ppc64-redhat-linux-gnu"...
    Using host libthread_db library "/lib64/libthread_db.so.1".
    (gdb) run
    Starting program: /home/dwmw2/foo
    Trace/breakpoint trap

I got rawhide installed, and 2.6.23-0.195.rc7.git3.fc8.ppc64 is the kernel I actually have now. With gdb-6.6-28.fc8.ppc64, I tried "gdb /bin/true" (32-bit) and "gdb /usr/bin/gdb" (64-bit), and just "run" worked fine in each. After updating my mirror so I could make the rawhide install work, I no longer have the older 0.185 kernel on hand to try with this install. But perhaps the problem went away. Can you try the latest rawhide kernel?

    file /usr/bin/gdb

I really meant it when I said gdb-6.6-28.fc8.ppc64, and yes, I double-checked. If this problem does persist for you, is the oops non-total enough that you can get sysrq-t and see the backtrace of the ptrace target task?

Hm, OK -- RPM behaviour changed recently: if you have both 32-bit and 64-bit packages installed simultaneously, your /usr/bin/gdb will now be 32-bit. I'm using a local build of the 0.195 kernel with some extra debugging added, and it persists here.
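For reference, file(1) tells a 32-bit binary from a 64-bit one by the EI_CLASS byte in the ELF identification header, so the multilib question can also be checked directly. The following is a minimal sketch, assuming glibc's <elf.h>; the helper names elf_class_of and elf_class_of_file are hypothetical, and only the identification bytes are inspected, not the rest of the header.

```c
#include <elf.h>      /* EI_NIDENT, EI_CLASS, ELFCLASS32, ELFCLASS64, ELFMAG, SELFMAG */
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: classify an ELF identification buffer.
 * Returns 32 or 64, or 0 if the bytes are not a recognised ELF header. */
int elf_class_of(const unsigned char *ident, size_t len)
{
    if (len < EI_NIDENT || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return 0;                 /* too short, or no "\177ELF" magic */
    switch (ident[EI_CLASS]) {
    case ELFCLASS32: return 32;
    case ELFCLASS64: return 64;
    default:         return 0;    /* ELFCLASSNONE or an invalid class */
    }
}

/* Hypothetical helper: read the first EI_NIDENT bytes of `path` and classify them.
 * Returns 32, 64, 0 (not ELF) or -1 (the file could not be opened). */
int elf_class_of_file(const char *path)
{
    unsigned char ident[EI_NIDENT];
    FILE *f = fopen(path, "rb");

    if (f == NULL)
        return -1;
    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);
    return elf_class_of(ident, n);
}
```

For example, elf_class_of_file("/usr/bin/gdb") returning 32 would confirm that the 32-bit gdb package is the one installed at that path.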
I can't get a backtrace of the target task (without extra hacking) because it exits immediately: GDB quits with the BUG(), and the inferior immediately dies with SIGTRAP. With the patch at http://david.woodhou.se/ptrace-hack.patch I see the following output when I use 64-bit GDB. The first four invocations, with 'regs partial', would have caused the BUG() before my patch.

    genregs_get 0-7, regs partial
    genregs_get 256-263, regs partial
    genregs_get 0-7, regs partial
    genregs_get 256-263, regs partial
    genregs_get 0-7, regs full
    genregs_get 256-263, regs full
    genregs_get 0-7, regs full
    genregs_get 256-263, regs full
    genregs_get 0-7, regs full
    genregs_get 256-263, regs full
    genregs_get 0-7, regs full
    genregs_get 256-263, regs full

I do see this on ppc32 kernels too -- but the CHECK_FULL_REGS() test doesn't cause a BUG() there; it just prints a message. In fact I think I've seen it on ppc32 for a while, but only ever when I've been busy chasing something else in gdb (go figure).

Yeah, that's how the install came (two gdb rpms, the wrong one won). But I'm highly suspicious of such things from many past horrors, so I checked and rediddled it correctly on autopilot before I even began. Can you verify that it happens for you with some exact binary that I have? E.g. /bin/true from coreutils-6.9-6.fc8.ppc:

    14bd16ccc6c6da555ea914a2edb42164 /bin/true

If our gdbs and kernels and all match, then perhaps it is a hardware variation that uses different kernel code or something? My machine is a dual-G5 Mac. With your debugging hack, it should be easy to add show_stack(current, NULL); show_stack(target, NULL); in the partial regs case. The target's backtrace will probably clue me in.

Yes, I have exactly the same /bin/true binary, and it happens there too. About to reboot with the show_stack() calls. My machine is also a dual-G5 Mac. It would be interesting to see what output _you_ get with my debugging code.
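The 'regs partial' / 'regs full' distinction comes from the low bit of regs->trap. As we read the 2.6.23-era powerpc headers, a set bit 0 marks a register frame in which only the volatile GPRs were saved on kernel entry, FULL_REGS() tests that bit, and CHECK_FULL_REGS() is a BUG_ON() on ppc64 but only a printk() on ppc32, which matches the ppc32/ppc64 difference reported above. The following user-space model is a sketch under that reading, not kernel code: struct fake_regs, check_full_regs() and the treat_as_ppc64 flag are hypothetical stand-ins, and the BUG() is modelled as a -1 return rather than an oops.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for struct pt_regs, reduced to the one field used here. */
struct fake_regs {
    unsigned long trap;   /* exception vector; bit 0 set means r14-r31 were not saved */
};

/* Mirrors the FULL_REGS() test: the frame is complete when bit 0 is clear. */
bool full_regs(const struct fake_regs *regs)
{
    return (regs->trap & 1) == 0;
}

/* Models CHECK_FULL_REGS(): ppc64 BUG()s on a partial frame, ppc32 only logs.
 * Returns -1 where the ppc64 kernel would oops, 0 otherwise. */
int check_full_regs(const struct fake_regs *regs, bool treat_as_ppc64)
{
    if (full_regs(regs))
        return 0;
    if (treat_as_ppc64)
        return -1;                                        /* stands in for BUG_ON() */
    fprintf(stderr, "check_full_regs: partial register set\n"); /* ppc32-style message */
    return 0;
}
```

The same partial frame therefore "passes" on the ppc32 model with only a log line, while the ppc64 model reports the condition that oopsed the kernel.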
Another thing I should have suggested for more possibly-informative spew is #define DEBUG 1 at the top of kernel/ptrace.c. In my runs, I think we can be pretty sure your code would never say "partial", because it would hit the BUG_ON in the kernel I'm testing.

Created attachment 204051
Debug output
The above debug output already has DEBUG defined in kernel/ptrace.c. It seems that these 'partial' reg sets all happen while the inferior is still in utrace_quiescent(), before it has even started up. But that's strange. When we enter sys_fork() or sys_clone(), we _do_ save the full registers -- and copy_thread() _checks_ that -- so you'd have got a BUG() if you tried to clone without the full registers present to copy into the child. Dunno why your system would be different to mine. Do you have auditd running? That could well force us to save a full regset all of the time, although given my previous paragraph I still don't quite see how that would help. You don't have a 'special' way of forking the inferior for utrace, do you? One that somehow manages to create the new thread without tripping over the same BUG() in copy_thread()? Confused...

So it's the exec report. D'oh, I looked at this code before as the top suspect but misread arch/powerpc/kernel/process.c:start_thread. I think this could happen in an upstream kernel when using PTRACE_O_TRACEEXEC (which gdb might not use, but from the traces it looks like it does). AFAICT, start_thread never clears regs->trap after it has clobbered all the registers. I can't see why it would be different on my machine than on yours.

Ah, the upstream kernel has the CHECK_FULL_REGS BUG_ON for PEEKUSR but not for GETREGS, which is what gdb probably uses now. A case using PTRACE_O_TRACEEXEC, with the child exec'ing so that it is stopped in ptrace_notify, and then PTRACE_PEEKUSR to fetch a register, should hit the oops in the upstream kernel. The upstream kernel should have CHECK_FULL_REGS for its GETREGS/SETREGS requests too; they were probably just omitted accidentally when those were added or revamped recently. Then it would crash with gdb too, and the right fix is for start_thread to clear regs->trap to 0. But I could be missing something about all this, since I still have no explanation for why my machine does not hit the bug.
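The PTRACE_O_TRACEEXEC sequence described above can be sketched in C roughly as follows. This is not attachment 204101, only a hypothetical reconstruction of the steps; run_exec_peek(), the target binary and the user-area offset 0 are illustrative choices. Note that glibc spells the request PTRACE_PEEKUSER (the kernel header name is PTRACE_PEEKUSR, same value). On an unfixed ppc64 kernel the peek at the exec stop is the step expected to trip CHECK_FULL_REGS; elsewhere it should just return a register word.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: trace a child with PTRACE_O_TRACEEXEC, let it exec
 * `path`, then PTRACE_PEEKUSER one word while it sits in the exec-event stop.
 * Returns 0 if the peek succeeded, -1 if tracing could not be set up or the
 * exec stop was never reached. */
int run_exec_peek(const char *path)
{
    int status;
    int result = -1;
    pid_t pid = fork();

    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* Child: become traced, stop so the parent can set options, then exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) == -1)
            _exit(2);                    /* tracing not permitted: don't hang stopped */
        raise(SIGSTOP);
        execl(path, path, (char *)NULL);
        _exit(127);                      /* exec failed */
    }

    /* First stop: the child's own SIGSTOP. */
    if (waitpid(pid, &status, 0) == -1)
        goto kill_child;
    if (!WIFSTOPPED(status))
        return -1;                       /* child already exited and was reaped */

    if (ptrace(PTRACE_SETOPTIONS, pid, NULL, (void *)(long)PTRACE_O_TRACEEXEC) == -1 ||
        ptrace(PTRACE_CONT, pid, NULL, NULL) == -1)
        goto kill_child;

    /* Second stop should be the exec event report. */
    if (waitpid(pid, &status, 0) == -1)
        goto kill_child;
    if (!WIFSTOPPED(status))
        return -1;                       /* exec failed and the child exited */
    if ((status >> 8) == (SIGTRAP | (PTRACE_EVENT_EXEC << 8))) {
        /* The request that hit CHECK_FULL_REGS on unfixed ppc64 kernels. */
        errno = 0;
        ptrace(PTRACE_PEEKUSER, pid, (void *)0, NULL);
        if (errno == 0)
            result = 0;
    }

kill_child:
    kill(pid, SIGKILL);                  /* a ptrace-stopped child still dies on SIGKILL */
    waitpid(pid, &status, 0);
    return result;
}
```

Calling run_exec_peek("/bin/true") on a kernel without the bug should return 0; -1 means ptrace was disallowed or the exec stop never happened.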
I did have auditd running (the default in the Fedora install). After chkconfig auditd off and a reboot, I do get the oops.

Created attachment 204101
standalone test case
This test case makes the upstream kernel crash too.
As expected, with 'regs->trap &= ~1;' added to start_thread, I see 'regs full' in all cases in my debugging output.

Good. I have tested patches for upstream doing that, and will write them up and submit them in the morning (nearing dawn here now), as well as put the fix into rawhide pending upstream inclusion. It turns out that the way gdb uses the features will never provoke the bug on the upstream kernel (but my comment 15 test case will). utrace does some things more uniformly, and so trips this underlying bug in more cases.

Thanks. Your patch is probably a candidate for 2.6.23.

gdb --args /bin/sh -c 'exec /bin/true' (and "run") is also a sufficient test case for the upstream bug.

2.6.23-0.202.rc8.fc8 fixes it.
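The fix amounts to start_thread marking the register frame as complete once it has overwritten the registers for the new image, so FULL_REGS() holds at the exec report. The following is a hypothetical user-space model of that idea, not the kernel's start_thread: struct fake_regs, fake_start_thread() and the apply_fix flag are invented for illustration, and the real function also sets up the MSR and other state.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the relevant parts of struct pt_regs. */
struct fake_regs {
    unsigned long gpr[32];
    unsigned long nip;    /* next instruction pointer (program entry) */
    unsigned long trap;   /* bit 0 set means the non-volatile GPRs were never saved */
};

/* Mirrors the FULL_REGS() test. */
bool full_regs(const struct fake_regs *regs)
{
    return (regs->trap & 1) == 0;
}

/* Models the exec-time register reset. Every GPR is overwritten, so the frame
 * really is complete afterwards; `apply_fix` selects whether the model also
 * clears the "partial" bit, which is what the fix adds. */
void fake_start_thread(struct fake_regs *regs, unsigned long entry,
                       unsigned long sp, bool apply_fix)
{
    memset(regs->gpr, 0, sizeof regs->gpr);
    regs->gpr[1] = sp;        /* r1 is the stack pointer on powerpc */
    regs->nip = entry;
    if (apply_fix)
        regs->trap &= ~1UL;   /* mark the frame as full for ptrace */
}
```

Clearing only bit 0 with &= ~1 keeps the vector number in the remaining bits of trap; clearing trap to 0, as suggested earlier in the thread, would satisfy FULL_REGS() equally well.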