Upon typing 'run' into gdb to start a program...

------------[ cut here ]------------
kernel BUG at arch/powerpc/kernel/ptrace.c:71!
Oops: Exception in kernel mode, sig: 5 [#2]
SMP NR_CPUS=128 NUMA PowerMac
Modules linked in: bnep bridge nls_utf8 loop hidp vfat fat nfs lockd nfs_acl autofs4 rfcomm l2cap sunrpc tun ipv6 dm_mirror dm_mod snd_usb_audio snd_aoa_codec_tas snd_usb_lib snd_rawmidi snd_hwdep snd_powermac snd_aoa_fabric_layout snd_aoa snd_seq_dummy snd_aoa_i2sbus snd_aoa_soundbus snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device pmac_zilog snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_timer snd soundcore usb_storage ide_cd cdrom firewire_ohci firewire_core crc_itu_t hci_usb bluetooth sungem sungem_phy sg shpchp sata_svw ata_generic libata sd_mod scsi_mod ext3 jbd mbcache uhci_hcd ohci_hcd ehci_hcd
NIP: c00000000000aefc LR: c0000000000d2890 CTR: c00000000000aeac
REGS: c000000112fbb890 TRAP: 0700 Tainted: G D (2.6.23-0.185.rc6.git7.fc8)
MSR: 9000000000029032 <EE,ME,IR,DR> CR: 22044488 XER: 20000000
TASK = c00000011bf612c0[742] 'gdb' THREAD: c000000112fb8000 CPU: 0
GPR00: 0000000000000001 c000000112fbbb10 c000000000738ea0 fffffffffffffffb
GPR04: c000000000669950 0000000000000000 0000000000000008 0000000000000000
GPR08: 00000fffffcd5450 c00000002d1fbea0 0000000000000000 c00000000000b1ec
GPR12: 0000000028044448 c000000000659180 00000000105f72c8 00000000105f5b30
GPR16: 00000fffffcd5b38 c00000006c760000 c000000000669920 0000000000000000
GPR20: c000000112fb8000 c000000066902960 0000000000000000 c00000000044a480
GPR24: 0000000000000000 0000000000000000 c000000066902960 00000fffffcd5450
GPR28: 0000000000000000 0000000000000000 0000000000000000 0000000000000008
NIP [c00000000000aefc] .genregs_get+0x50/0x340
LR [c0000000000d2890] .ptrace_layout_access+0x2a8/0x348
Call Trace:
[c000000112fbbb10] [c0000000006e79a8] netlbl_cipsov4_genl_c_listall+0xebc0/0x1da20 (unreliable)
[c000000112fbbbc0] [c0000000000d2890] .ptrace_layout_access+0x2a8/0x348
[c000000112fbbcb0] [c000000000009db8] .arch_ptrace+0x11c/0x2f4
[c000000112fbbd70] [c0000000000d31bc] .sys_ptrace+0x88/0x2f0
[c000000112fbbe30] [c000000000008548] syscall_exit+0x0/0x40
Instruction dump:
f821ff51 7c7a1b78 7cbe2b78 7cdf3378 7cfc3b78 7d1b4378 e9230380 3860fffb
2fa90000 419e02d4 e8090140 780007e0 <0b000000> 2fa60000 e89a0380 38000000
I couldn't reproduce this using the 2.6.23-0.185.rc6.git7.fc8.ppc64 kernel on an F7 install. Can you verify a few details while I try to get an F8 install going?

gdb rpm version & arch
what binary did you run under gdb, i.e. 32 or 64 bit?
Further investigation shows that it happens when my _gdb_ is 64-bit, not when it's 32-bit. That was tested with 6.6-28. It happens when starting _any_ 32-bit or 64-bit program that I've tried so far.

pmac /home/dwmw2 $ gdb foo
GNU gdb Red Hat Linux (6.6-27.fc8rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "ppc64-redhat-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) run
Starting program: /home/dwmw2/foo
Trace/breakpoint trap
I got rawhide installed, and 2.6.23-0.195.rc7.git3.fc8.ppc64 is the kernel I actually have now. With gdb-6.6-28.fc8.ppc64, I tried "gdb /bin/true" (32-bit) and "gdb /usr/bin/gdb" (64-bit) and just "run" in each worked fine. After updating my mirror so I could make the rawhide install work, I no longer have the older 0.185 kernel on hand to try with this install. But perhaps the problem went away. Can you try the latest rawhide kernel?
file /usr/bin/gdb
I really meant it when I said gdb-6.6-28.fc8.ppc64, and yes, I double-checked.
If this problem does persist for you, is the oops non-total enough that you can get sysrq-t and see the backtrace of the ptrace target task?
Hm, OK -- RPM behaviour changed recently and if you have both 32-bit and 64-bit packages installed simultaneously now, your /usr/bin/gdb will be 32-bit.

I'm using a local build of the 0.195 kernel with some extra debugging added, and it persists here. I can't get a backtrace of the target task (without extra hacking) because it exits immediately. GDB quits with the BUG(), and the inferior immediately dies with SIGTRAP.

With the patch at http://david.woodhou.se/ptrace-hack.patch I see the following output when I use 64-bit GDB... the first four invocations with 'regs partial' would have caused the BUG() before my patch.

genregs_get 0-7, regs partial
genregs_get 256-263, regs partial
genregs_get 0-7, regs partial
genregs_get 256-263, regs partial
genregs_get 0-7, regs full
genregs_get 256-263, regs full
genregs_get 0-7, regs full
genregs_get 256-263, regs full
genregs_get 0-7, regs full
genregs_get 256-263, regs full
genregs_get 0-7, regs full
genregs_get 256-263, regs full

I do see this on ppc32 kernels too -- but the CHECK_FULL_REGS() test doesn't cause a BUG() there -- it just prints a message. In fact I think I've seen it on ppc32 for a while, but only ever when I've been busy chasing something else in gdb (go figure).
Yeah, that's how the install came (two gdb rpms, the wrong one won). But I'm highly suspicious of such things from many past horrors, so I checked and rediddled it correctly on autopilot before I even began.

Can you verify that it happens for you with some exact binary that I have? E.g.:

14bd16ccc6c6da555ea914a2edb42164 /bin/true from coreutils-6.9-6.fc8.ppc

If our gdbs and kernels and all match, then perhaps it is a hardware variation that uses different kernel code or something? My machine is a dual-G5 Mac.

With your debugging hack, it should be easy to stick in show_stack(current,NULL);show_stack(target,NULL); in the partial regs case. The target's backtrace will probably clue me in.
Yes, I have exactly the same /bin/true binary, and it happens there too. About to reboot with the show_stack() calls. My machine is also a dual-G5 Mac. Would be interesting to see what output _you_ get with my debugging code.
Another thing I should have suggested for more possibly-informative spew is #define DEBUG 1 at the top of kernel/ptrace.c. In my runs I think we can be pretty sure your code would never say "partial", because it would hit the BUG_ON in the kernel I'm testing.
Created attachment 204051: Debug output
The above debug output already has DEBUG defined in kernel/ptrace.c.

It seems that these 'partial' reg sets all happen when the inferior is still in utrace_quiescent(), before it's even started up. But that's strange. When we enter sys_fork() or sys_clone(), we _do_ save the full registers -- and copy_thread() _checks_ that -- you'd have got a BUG() if you tried to clone without having the full registers present to copy into the child.

Dunno why your system would be different to mine. Do you have auditd running? That could well force us to save a full regset all of the time, although given my previous paragraph I still don't quite see how that would help.

You don't have a 'special' way of forking the inferior for utrace, do you? And somehow manage to create the new thread without tripping over the same BUG() happening in copy_thread()? Confused...
So it's the exec report. D'oh, I looked at this code before as the top suspect but misread arch/powerpc/kernel/process.c:start_thread. I think this could happen in an upstream kernel when using PTRACE_O_TRACEEXEC (which gdb might not use, but from the traces it looks like it does). AFAICT, start_thread never clears regs->trap after it's clobbered all the registers.

I can't see why it would be different on my machine than on yours.

Ah, the upstream kernel has the CHECK_FULL_REGS BUG_ON for PEEKUSR but not GETREGS, which is what gdb probably uses now. A case using PTRACE_O_TRACEEXEC, with the child exec'ing so that it's stopped in ptrace_notify, followed by a PTRACE_PEEKUSR to fetch a register, should hit the oops in the upstream kernel.

The upstream kernel should have CHECK_FULL_REGS for its GETREGS/SETREGS requests too; they were probably just omitted accidentally when those were added/revamped recently. Then it would crash with gdb too, and the right fix is for start_thread to clear regs->trap to 0. But I could be missing something about all this since I still have no explanation for why my machine does not hit the bug.
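For anyone following along, the PTRACE_O_TRACEEXEC plus PEEKUSR sequence described above can be sketched roughly as below. This is only an illustration of the sequence, not the test case attached to this bug. The names trace_exec_and_peek() and abandon() are made up for the sketch, and reading offset 0 of the user area (GPR0 on powerpc) is an arbitrary choice. On an affected ppc64 kernel the PEEKUSR call is the step expected to trip the BUG(), so do not run it on a machine you care about.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: kill and reap a tracee that is still stopped. */
static int abandon(pid_t pid)
{
    kill(pid, SIGKILL);
    waitpid(pid, NULL, 0);
    return -1;
}

/*
 * Hypothetical helper: trace a child across exec of `path` and read one
 * word of its user area at the exec stop.
 * Returns 0 if the read succeeded, -1 on any failure.
 */
int trace_exec_and_peek(const char *path)
{
    int status;
    long word;
    pid_t pid = fork();

    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: become a tracee, stop so the parent can set options, exec. */
        if (ptrace(PTRACE_TRACEME, 0, NULL, NULL) < 0)
            _exit(127);
        raise(SIGSTOP);
        execl(path, path, (char *)NULL);
        _exit(127);
    }

    /* Parent: wait for the initial SIGSTOP (anything else means it died). */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;

    /* Ask for a stop to be reported when the child execs. */
    if (ptrace(PTRACE_SETOPTIONS, pid, NULL,
               (void *)(long)PTRACE_O_TRACEEXEC) < 0)
        return abandon(pid);
    if (ptrace(PTRACE_CONT, pid, NULL, NULL) < 0)
        return abandon(pid);

    /* The next stop should be the exec report. */
    if (waitpid(pid, &status, 0) < 0 || !WIFSTOPPED(status))
        return -1;
    if ((status >> 8) != (SIGTRAP | (PTRACE_EVENT_EXEC << 8)))
        return abandon(pid);

    /* Fetch a register while the child sits in the exec stop. */
    errno = 0;
    word = ptrace(PTRACE_PEEKUSER, pid, (void *)0, NULL);
    if (word == -1 && errno != 0)
        return abandon(pid);

    printf("user area word 0 at exec stop: %#lx\n", (unsigned long)word);
    abandon(pid);
    return 0;
}
```

Calling trace_exec_and_peek("/bin/true") from a small main() is enough to exercise it; on a fixed kernel it should simply print the word and return 0.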
I did have auditd running (default in the Fedora install). After chkconfig auditd off; reboot, I do get the oops.
Created attachment 204101: standalone test case

This test case makes the upstream kernel crash too.
As expected, with 'regs->trap &= ~1;' added to start_thread, I see 'regs full' in all cases in my debugging output.
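To spell out the convention being relied on here: the low bit of regs->trap marks a register set where only part of the registers were saved at kernel entry, and CHECK_FULL_REGS() objects when that bit is set. The userspace toy below models just that bit. It is not kernel code: the toy_* names are invented for the sketch, assert() stands in for BUG_ON(), and the 0xc01 entry value for a system call is quoted from memory of the powerpc entry code, so treat it as an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Toy stand-in for struct pt_regs; only the field that matters here. */
struct toy_pt_regs {
    unsigned long trap;
};

/* Low bit clear means the full register set was saved. */
bool toy_full_regs(const struct toy_pt_regs *regs)
{
    return (regs->trap & 1UL) == 0;
}

/* Models CHECK_FULL_REGS(): assert() plays the role of BUG_ON(). */
void toy_check_full_regs(const struct toy_pt_regs *regs)
{
    assert(toy_full_regs(regs));
}

/*
 * Models the one-line change discussed above: after exec has rewritten
 * every register, the set is complete, so drop the "partial" bit.
 */
void toy_start_thread(struct toy_pt_regs *regs)
{
    regs->trap &= ~1UL;
}

/* Prints the state in the same style as the debugging output above. */
void toy_report(const struct toy_pt_regs *regs)
{
    printf("trap %#lx, regs %s\n", regs->trap,
           toy_full_regs(regs) ? "full" : "partial");
}
```

Starting from trap = 0xc01, toy_report() says "partial" and toy_check_full_regs() would abort; after toy_start_thread() the value is 0xc00 and the check passes.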
Good. I have tested patches for upstream doing that, and will write them up and submit them in the morning (nearing dawn here now), as well as put the fix into rawhide pending upstream inclusion. It turns out that the way gdb uses the features will never provoke the bug on the upstream kernel (but my comment 15 test case will). utrace does some things more uniformly and so trips this underlying bug in more cases.
Thanks. Your patch is probably a candidate for 2.6.23.
gdb --args /bin/sh -c 'exec /bin/true' (and "run") is also a sufficient test case for the upstream bug.
2.6.23-0.202.rc8.fc8 fixes it