Description of problem:
Running the gdb testsuite crashes the machine. This happens only with a kernel patched by the attached patch, though, so feel free to ignore this bug report. I believe the patch is innocent and just enables the full functionality of ptrace/utrace Linux kernel debugging for gdb. The attached patch should be obsoleted by (or contained in) kernel-2.6.17-1.2633, as announced in Bug 205179.

Version-Release number of selected component (if applicable):
kernel-2.6.17-1.2630_jkratoch0.ia64
gdb-6.5-7.src

How reproducible:
It crashed 3 times - each time I tried it.

Steps to Reproduce:
1. Install gdb-6.5-7.src.rpm
2. (possibly not needed) patch gdb.spec <http://cvs.jankratochvil.net/viewcvs/nethome/src/gdb.spec-debug.patch?rev=HEAD
3. rpmbuild -bc gdb.spec
4. cd gdb-6.5/gdb/testsuite
5. make check

Actual results:
Kernel crash.

Expected results:
No kernel crash, just userland testsuite PASSes/FAILs.

Additional info:
It was crashing while executing "gdb.base/checkpoint.exp". Removing the "gdb.base/checkpoint.exp" testcase made the testsuite non-crashing. Running just "gdb.base/checkpoint.exp" on its own did not crash it, though.
Created attachment 136560
oops log - Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
Created attachment 136561
Enable full debugging, should be obsoleted by Bug 205179
The crash during the gdb testsuite:
Running ../.././gdb/testsuite/gdb.base/checkpoint.exp ...
occurred for me again today, but on x86_64. So it is not architecture specific. I do not have the crash dump, though.
BTW this time it was stock RawHide: kernel-2.6.18-1.2689.fc6.x86_64
And so far it appears to me the crash is on gdb/testsuite/gdb.base/chng-syms.exp - recompiling of the executable binary while it is still running (and being debugged by gdb).
Looks like a use-after-free. Note that current rawhide has slab debugging disabled, so it won't show up in this manner any more (it may even hide the crash).
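For readers triaging similar oopses: the faulting address 6b6b6b6b6b6b6b6b is the slab free-poison byte 0x6b repeated, which is why this looks like a use-after-free. A minimal sketch of that check follows. The helper name and the dictionary are hypothetical; the byte values are the ones I believe are defined in the kernel's include/linux/poison.h, so please verify them against your tree.

```python
# Hypothetical triage helper, not part of any kernel or gdb tooling.
# Poison bytes assumed from the Linux kernel's include/linux/poison.h.
SLAB_POISON_BYTES = {
    0x6B: "POISON_FREE",   # object was already freed (use-after-free)
    0x5A: "POISON_INUSE",  # object allocated but not yet initialised
    0xA5: "POISON_END",    # marker for the last byte of a poisoned object
}


def classify_fault_address(address, width_bytes=8):
    """Return the poison name if every byte of `address` is the same
    slab poison byte, otherwise None."""
    raw = address.to_bytes(width_bytes, "little")
    first = raw[0]
    if all(b == first for b in raw):
        return SLAB_POISON_BYTES.get(first)
    return None


if __name__ == "__main__":
    # Address taken from the oops log attached to this bug.
    print(classify_fault_address(0x6B6B6B6B6B6B6B6B))
```

This only matches an exact poison pattern. A dereference of a field inside a freed object would fault at the pattern plus a small offset, which this sketch does not try to detect.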
Running this now with the latest nightly (20061012) + SLAB debug on.

Just to confirm: This is arch independent. I am running on a 4p x86_64.

1. Build & install kernel + SLAB debug.
2. Build & install gdb
3. Run gdb/testsuite/gdb.base/chng-syms.exp or gdb/testsuite/gdb.base/checkpoint.exp

P.
In my case several of the tests in gdb/testsuite seem to be causing slab corruption... P.
FYI, I am seeing more than just slab corruption.

On my 4 way ia64 box running 2.6.18-1.2726.el5 the test just fails:

Running ../../../gdb/testsuite/gdb.base/checkpoint.exp ...
ERROR: internal buffer is full.

(then it continues with the suite)

However, more interesting is on kona (24 cpu ia64) I get a panic, this is also 2.6.18-1.2726.el5:

Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
ksoftirqd/7[24]: Oops 11012296146944 [1]
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 vfat fat dm_multipath button parport_pc lp parport sr_mod cdrom sg tg3 dm_snapshot dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd

Pid: 24, CPU 7, comm: ksoftirqd/7
psr : 0000121008526010 ifs : 8000000000000389 ip : [<a0000001000a8591>] Not tainted
ip is at __rcu_process_callbacks+0x351/0x5a0
unat: 0000000000000000 pfs : 0000000000000389 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr : 0000000000006581
ldrs: 0000000000000000 ccv : 0000000000000003 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000a85b0 b6 : a0000001000f03a0 b7 : a00000010000bf00
f6 : 1003e6b6b6b6b6b6b6b6b f7 : 1003e0000000000000050
f8 : 1003e000000000000004c f9 : 10005a000000000000000
f10 : 1000ab951c71c67939fd3 f11 : 1003e0000000000000b95
r1 : a000000100c15040 r2 : 000000000000001b r3 : 000000000000000a
r8 : 0000000000000006 r9 : e000070400076b08 r10 : 6b6b6b6b6b6b6b6b
r11 : 0000000000000020 r12 : e00000010e50fd50 r13 : e00000010e508000
r14 : e00007078c969b08 r15 : e000070400076af8 r16 : e0000787fae95174
r17 : 0000000000000006 r18 : 000000000000001b r19 : e00000010f054440
r20 : 00000000000040c0 r21 : 0000000000000010 r22 : e0000787fae95270
r23 : e0000787fae95250 r24 : 000000000000001b r25 : 0000000000004000
r26 : 0000000000004000 r27 : e00006fcfff98000 r28 : e00006fcfff98001
r29 : 000000001c1e3259 r30 : e000070403f94000 r31 : 0000000000001c1e

Call Trace:
 [<a000000100014140>] show_stack+0x40/0xa0 sp=e00000010e50f8e0 bsp=e00000010e5093d8
 [<a000000100014a40>] show_regs+0x840/0x880 sp=e00000010e50fab0 bsp=e00000010e509380
 [<a000000100037e20>] die+0x1c0/0x2a0 sp=e00000010e50fab0 bsp=e00000010e509338
 [<a00000010062aa20>] ia64_do_page_fault+0x8a0/0x9e0 sp=e00000010e50fad0 bsp=e00000010e5092e8
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e00000010e50fb80 bsp=e00000010e5092e8
 [<a0000001000a8590>] __rcu_process_callbacks+0x350/0x5a0 sp=e00000010e50fd50 bsp=e00000010e5092a0
 [<a0000001000a8820>] rcu_process_callbacks+0x40/0xa0 sp=e00000010e50fd50 bsp=e00000010e509280
 [<a000000100087c30>] tasklet_action+0x1d0/0x340 sp=e00000010e50fd50 bsp=e00000010e509258
 [<a000000100086e10>] __do_softirq+0xf0/0x240 sp=e00000010e50fd50 bsp=e00000010e5091d8
 [<a000000100086fd0>] do_softirq+0x70/0xc0 sp=e00000010e50fd50 bsp=e00000010e509178
 [<a000000100087710>] ksoftirqd+0x110/0x280 sp=e00000010e50fd50 bsp=e00000010e509150
 [<a0000001000aeae0>] kthread+0x220/0x2a0 sp=e00000010e50fd50 bsp=e00000010e509108
 [<a0000001000126b0>] kernel_thread_helper+0x30/0x60 sp=e00000010e50fe30 bsp=e00000010e5090e0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e00000010e50fe30 bsp=e00000010e5090e0
<3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():0

Call Trace:
 [<a000000100014140>] show_stack+0x40/0xa0 sp=e00000010e50f8c0 bsp=e00000010e509498
 [<a0000001000141d0>] dump_stack+0x30/0x60 sp=e00000010e50fa90 bsp=e00000010e509480
 [<a000000100066270>] __might_sleep+0x1b0/0x1e0 sp=e00000010e50fa90 bsp=e00000010e509458
 [<a0000001000b6060>] down_read+0x20/0x60 sp=e00000010e50fa90 bsp=e00000010e509438
 [<a00000010009fc40>] blocking_notifier_call_chain+0x20/0x80 sp=e00000010e50fa90 bsp=e00000010e509400
 [<a00000010007c550>] profile_task_exit+0x30/0x60 sp=e00000010e50fa90 bsp=e00000010e5093d8
 [<a0000001000808c0>] do_exit+0x40/0x1460 sp=e00000010e50fa90 bsp=e00000010e509380
 [<a000000100037ee0>] die+0x280/0x2a0 sp=e00000010e50fab0 bsp=e00000010e509338
 [<a00000010062aa20>] ia64_do_page_fault+0x8a0/0x9e0 sp=e00000010e50fad0 bsp=e00000010e5092e8
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280 sp=e00000010e50fb80 bsp=e00000010e5092e8
 [<a0000001000a8590>] __rcu_process_callbacks+0x350/0x5a0 sp=e00000010e50fd50 bsp=e00000010e5092a0
 [<a0000001000a8820>] rcu_process_callbacks+0x40/0xa0 sp=e00000010e50fd50 bsp=e00000010e509280
 [<a000000100087c30>] tasklet_action+0x1d0/0x340 sp=e00000010e50fd50 bsp=e00000010e509258
 [<a000000100086e10>] __do_softirq+0xf0/0x240 sp=e00000010e50fd50 bsp=e00000010e5091d8
 [<a000000100086fd0>] do_softirq+0x70/0xc0 sp=e00000010e50fd50 bsp=e00000010e509178
 [<a000000100087710>] ksoftirqd+0x110/0x280 sp=e00000010e50fd50 bsp=e00000010e509150
 [<a0000001000aeae0>] kthread+0x220/0x2a0 sp=e00000010e50fd50 bsp=e00000010e509108
 [<a0000001000126b0>] kernel_thread_helper+0x30/0x60 sp=e00000010e50fe30 bsp=e00000010e5090e0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e00000010e50fe30 bsp=e00000010e5090e0
Kernel panic - not syncing: Aiee, killing interrupt handler!
I cannot do any kernel debugging on ia64 myself, so it is most helpful to concentrate on reproducing the problem on x86_64, i386, or ppc64.
*** This bug has been marked as a duplicate of 210249 ***
Jan, can you verify that the problem persists on FC6? I have a fix for a problem that I think could have this failure mode. I would like to verify a specific kernel where you see the problem reliably, and then have you try a kernel with no differences but my fix. If the problem is masked by having CONFIG_DEBUG_SLAB disabled, we can try a rebuild of a current FC6 kernel with just those config options changed (kernel-2.6.18-1.2798.fc6 or kernel-2.6.18-1.2814.fc6).
Roland,

(In reply to comment #14)
> Jan, can you verify that the problem persists on FC6?
> I have a fix for a problem that I think could have this failure mode.
> I would like to verify a specific kernel where you see the problem reliably, and
> then have you try a kernel with no differences but my fix.

Could you attach the fix please? I'm also looking at 210706 which is very similar to this BZ ...

P.

> If the problem is masked by having CONFIG_DEBUG_SLAB disabled, we can try a
> rebuild of a current FC6 kernel with just those config options changed
> (kernel-2.6.18-1.2798.fc6 or kernel-2.6.18-1.2814.fc6).
> Jan, can you verify that the problem persists on FC6?

Roland, I just verified that I can see this issue on the FC6 kernel on kona1, a 24-way ia64 box.

> I have a fix for a problem that I think could have this failure mode.
> I would like to verify a specific kernel where you see the problem reliably, and
> then have you try a kernel with no differences but my fix.

I can reproduce it by installing the FC6 kernel, kernel-2.6.18-1.2798, installing the gdb sources, and running the following script from /usr/src/redhat/BUILD/gdb-6.5/gdb/testsuite:

while true; do sleep 1; runtest gdb.base/checkpoint.exp gdb.base/chng-syms.exp; done

(Warning: It can take up to 3 hours for the problem to occur...)

P.
kernel-2.6.18-1.2798.rm2.fc6.x86_64 booted for me, but it crashed on the 7th run of the gdb testsuite. Since Prarit noted the missing tty locking changes, I did not try to get a kdump. Unfortunately, a serial console is not available there.
Roland,

With your patch I hit a new issue:

kona1.lab.boston.redhat.com login: [ 342.380072] kernel BUG at kernel/utrace.c:947!
[ 342.380302] checkpoint[4528]: bugcheck! 0 [1]
[ 342.380496] Modules linked in: hidp(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) vfat(U) fat(U) dm_multipath(U) button(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) sg(U) tg3(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
[ 342.382158]
[ 342.382160] Pid: 4528, CPU 8, comm: checkpoint
[ 342.382367] psr : 00001010085a6010 ifs : 800000000000040b ip : [<a0000001000ee980>] Not tainted
[ 342.382644] ip is at utrace_quiescent+0x2c0/0x6e0
[ 342.382785] unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
[ 342.383002] rnat: 0000000000000000 bsps: 0000000000000000 pr : 000000000059955b
[ 342.383221] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
[ 342.383452] csd : 0000000000000000 ssd : 0000000000000000
[ 342.383689] b0 : a0000001000ee980 b6 : a000000100011300 b7 : a00000010000bf20
[ 342.383994] f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf
[ 342.384256] f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000
[ 342.384502] f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
[ 342.384755] r1 : a000000100c03c90 r2 : a000000100a1b4c0 r3 : e000070781a19084
[ 342.385403] r8 : 0000000000000035 r9 : a000000100a197b8 r10 : a000000100a1b4f0
[ 342.385729] r11 : a000000100a1b4f0 r12 : e000070781a1fd50 r13 : e000070781a18000
[ 342.386008] r14 : a000000100a1b4c0 r15 : 0000000000000000 r16 : ffffffffdead4ead
[ 342.386635] r17 : 00000000dead4ead r18 : a000000100949afc r19 : a000000100a197b0
[ 342.387400] r20 : 0000000000000000 r21 : a000000100a04310 r22 : 0000000000000004
[ 342.388126] r23 : a000000100855180 r24 : a000000100a04310 r25 : a000000100a1b4c8
[ 342.388814] r26 : a000000100a1b4c8 r27 : e0000707f92d9070 r28 : e0000707f92d8008
[ 342.389430] r29 : e0000787f30a0060 r30 : e0000707f92d802c r31 : e0000787f30a002c
[ 342.390168]
[ 342.390169] Call Trace:
[ 342.390614] [<a000000100013ea0>] show_stack+0x40/0xa0
[ 342.390617] sp=e000070781a1f8e0 bsp=e000070781a194c0
[ 342.391863] [<a0000001000147a0>] show_regs+0x840/0x880
[ 342.391866] sp=e000070781a1fab0 bsp=e000070781a19468
[ 342.415578] [<a000000100037b80>] die+0x1c0/0x2a0
[ 342.415580] sp=e000070781a1fab0 bsp=e000070781a19420
[ 342.416702] [<a000000100037cb0>] die_if_kernel+0x50/0x80
[ 342.416705] sp=e000070781a1fad0 bsp=e000070781a193e8
[ 342.417162] [<a00000010061dc50>] ia64_bad_break+0x270/0x4a0
[ 342.417164] sp=e000070781a1fad0 bsp=e000070781a193c0
[ 342.417579] [<a00000010000c720>] __ia64_leave_kernel+0x0/0x280
[ 342.417581] sp=e000070781a1fb80 bsp=e000070781a193c0
[ 342.417993] [<a0000001000ee980>] utrace_quiescent+0x2c0/0x6e0
[ 342.417995] sp=e000070781a1fd50 bsp=e000070781a19368
[ 342.418399] [<a0000001000f1140>] utrace_get_signal+0xa60/0xac0
[ 342.418402] sp=e000070781a1fd50 bsp=e000070781a19308
[ 342.439207] [<a00000010009c600>] get_signal_to_deliver+0x1e0/0x740
[ 342.439210] sp=e000070781a1fd80 bsp=e000070781a192b8
[ 342.439636] [<a000000100034650>] ia64_do_signal+0x90/0xde0
[ 342.439638] sp=e000070781a1fd80 bsp=e000070781a191d0
[ 342.440487] [<a000000100013d20>] do_notify_resume_user+0x100/0x160
[ 342.440489] sp=e000070781a1fe20 bsp=e000070781a191a0
[ 342.440905] [<a00000010000cc00>] notify_resume_user+0x40/0x60
[ 342.440908] sp=e000070781a1fe20 bsp=e000070781a19150
[ 342.441319] [<a00000010000cb30>] skip_rbs_switch+0xe0/0x110
[ 342.441321] sp=e000070781a1fe30 bsp=e000070781a19150
[ 342.441740] [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
[ 342.441743] sp=e000070781a20000 bsp=e000070781a19150
[ 342.442172]
Created attachment 139632
kernel-2.6.18-1.2798.rm3.fc6.ia64 oops log

kernel-2.6.18-1.2798.rm3.fc6.x86_64:
rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run
(no serial console there, kdump not loadable)

kernel-2.6.18-1.2798.rm3.fc6.ia64:
rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run
It looks like the fdtable crash in Bug 210706 Comment 16:
[Full dump attached.]
 [<a000000100144900>] free_block+0x100/0x300
     sp=e00007307bbf7960 bsp=e00007307bbf17b8
 [<a000000100145040>] __drain_alien_cache+0xc0/0x100
     sp=e00007307bbf7960 bsp=e00007307bbf1778
 [<a0000001001443d0>] kmem_cache_free+0x390/0x580
     sp=e00007307bbf7960 bsp=e00007307bbf1738
 [<a000000100196250>] free_fdtable_rcu+0xb0/0x2a0
     sp=e00007307bbf7970 bsp=e00007307bbf1708
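When judging whether two oopses are "the same crash", it can help to compare only the symbol names in their call traces and ignore addresses and stack pointers. A rough sketch of that is below. The function name and regular expression are my own and assume the `[<address>] symbol+0xoff/0xsize` layout seen in the logs in this bug; other architectures or kernel versions may print traces differently.

```python
import re

# Matches one call-trace entry such as
#   [<a000000100144900>] free_block+0x100/0x300
# and captures only the symbol name. The layout is assumed from the
# logs in this bug report; it is not a general oops parser.
TRACE_ENTRY = re.compile(
    r"\[<[0-9a-fA-F]+>\]\s+([A-Za-z_][\w.]*)\+0x[0-9a-fA-F]+/0x[0-9a-fA-F]+"
)


def trace_symbols(oops_text):
    """Return the symbol names from an oops call trace, in order."""
    return TRACE_ENTRY.findall(oops_text)


if __name__ == "__main__":
    excerpt = (
        "[<a000000100144900>] free_block+0x100/0x300 "
        "sp=e00007307bbf7960 bsp=e00007307bbf17b8 "
        "[<a000000100145040>] __drain_alien_cache+0xc0/0x100 "
        "sp=e00007307bbf7960 bsp=e00007307bbf1778"
    )
    print(trace_symbols(excerpt))
```

Comparing the resulting lists from two logs gives a quick first impression only; matching symbol names do not prove the same root cause.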
>
> kernel-2.6.18-1.2798.rm3.fc6.ia64:
> rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run
> It looks as the fdtable crash in Bug 210706 Comment 16:

Still seeing a fdtable panic with Roland's latest patch ... I'll look into this further in the AM.

> [Full dump attached.]
> [<a000000100144900>] free_block+0x100/0x300
> sp=e00007307bbf7960 bsp=e00007307bbf17b8
> [<a000000100145040>] __drain_alien_cache+0xc0/0x100
> sp=e00007307bbf7960 bsp=e00007307bbf1778
> [<a0000001001443d0>] kmem_cache_free+0x390/0x580
> sp=e00007307bbf7960 bsp=e00007307bbf1738
> [<a000000100196250>] free_fdtable_rcu+0xb0/0x2a0
> sp=e00007307bbf7970 bsp=e00007307bbf1708
>
Hey Prarit, Any update on the fdtable?
Is there any progress on the issue?
We saw the following oops in the rhel5 utrace code:

BUG: unable to handle kernel paging request at virtual address 7ca1c291
EIP is at utrace_get_signal+0x46/0x477
 get_signal_to_deliver+0xdf/0x3b1
 do_notify_resume+0xa9/0x6a5
 audit_syscall_exit+0x285/0x2a1
 work_notifysig+0x13/0x19
 copy_to_user_policy+0x73/0x7f

The failing IP corresponds to code in utrace_get_signal():

int utrace_get_signal(struct task_struct *tsk, struct pt_regs *regs,
                      siginfo_t *info, struct k_sigaction *return_ka)
{
        struct utrace *utrace = tsk->utrace;
        ...
        if (utrace->u.live.signal != NULL) {
                signal.signr = utrace->u.live.signal->signr;
                copy_siginfo(info, utrace->u.live.signal->info);
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
                                   Bogus pointer was supplied here.

How should the ->utrace assignment be handled correctly?
I'm sorry to bother you guys again, but are these utrace LOCALLY *EXPLOITABLE* issues going to be fixed?
Created attachment 159845
utrace_get_signal() oops reproducer
Comment 31, Comment 32 and Comment 33 have been moved into a new Bug 312951 as it is a different problem.
Comment on attachment 159845
utrace_get_signal() oops reproducer

Moved into Bug 312951.
Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers
This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora, please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.