Bug 207002
| Summary: | crash - mmput()/unmap_vmas() - gdb testsuite | ||
|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Jan Kratochvil <jan.kratochvil> | 
| Component: | kernel | Assignee: | Roland McGrath <roland> | 
| Status: | CLOSED WONTFIX | QA Contact: | Brian Brock <bbrock> | 
| Severity: | high | Docs Contact: | |
| Priority: | medium | ||
| Version: | 6 | CC: | davej, dchapman, dev, dwmw2, eteo, konradr, lwoodman, roland, triage, wtogami | 
| Target Milestone: | --- | Keywords: | Reopened, Security | 
| Target Release: | --- | ||
| Hardware: | All | ||
| OS: | Linux | ||
| Whiteboard: | bzcl34nup | ||
| Fixed In Version: | Doc Type: | Bug Fix | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2008-05-06 16:21:57 UTC | Type: | --- | 
| Bug Blocks: | 209118, 213554, 216451, 218108, 454237 | ||
| Attachments: | |||
| 
        
Description  Jan Kratochvil  2006-09-18 18:00:00 UTC
        
Created attachment 136560 [details]
oops log - Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b

Created attachment 136561 [details]
Enable full debugging, should be obsoleted by Bug 205179

Crash again during gdb testsuite:
Running ../.././gdb/testsuite/gdb.base/checkpoint.exp ...

It occurred for me again today, but on x86_64, so it is not architecture specific. I do not have the crash dump, though. BTW this time it was stock RawHide: kernel-2.6.18-1.2689.fc6.x86_64

So far it appears to me that the crash is in gdb/testsuite/gdb.base/chng-syms.exp - recompiling the executable binary while it is still running (and being debugged by gdb).

Looks like a use-after-free. Note that current rawhide has slab debugging disabled, so it won't show up in this manner any more (it may even hide the crash).

Running this now with the latest nightly (20061012) + SLAB debug on.

Just to confirm: This is arch independent. I am running on a 4p x86_64.

1. Build & install kernel + SLAB debug.
2. Build & install gdb
3. Run gdb/testsuite/gdb.base/chng-syms.exp or gdb/testsuite/gdb.base/checkpoint.exp

P.

In my case several of the tests in gdb/testsuite seem to be causing slab corruption...

P.

FYI, I am seeing more than just slab corruption.
On my 4 way ia64 box running 2.6.18-1.2726.el5 the test just fails:
Running ../../../gdb/testsuite/gdb.base/checkpoint.exp ...
ERROR: internal buffer is full.
(then it continues with the suite)
However, more interesting is on kona (24 cpu ia64) I get a panic, this is also
2.6.18-1.2726.el5:
Unable to handle kernel paging request at virtual address 6b6b6b6b6b6b6b6b
ksoftirqd/7[24]: Oops 11012296146944 [1]
Modules linked in: autofs4 hidp rfcomm l2cap bluetooth sunrpc ipv6 vfat fat
dm_multipath button parport_pc lp parport sr_mod cdrom sg tg3 dm_snapshot
dm_zero dm_mirror dm_mod mptspi mptscsih mptbase scsi_transport_spi sd_mod
scsi_mod ext3 jbd ehci_hcd ohci_hcd uhci_hcd
Pid: 24, CPU 7, comm:          ksoftirqd/7
psr : 0000121008526010 ifs : 8000000000000389 ip  : [<a0000001000a8591>]    Not tainted
ip is at __rcu_process_callbacks+0x351/0x5a0
unat: 0000000000000000 pfs : 0000000000000389 rsc : 0000000000000003
rnat: 0000000000000000 bsps: 0000000000000000 pr  : 0000000000006581
ldrs: 0000000000000000 ccv : 0000000000000003 fpsr: 0009804c8a70433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000a85b0 b6  : a0000001000f03a0 b7  : a00000010000bf00
f6  : 1003e6b6b6b6b6b6b6b6b f7  : 1003e0000000000000050
f8  : 1003e000000000000004c f9  : 10005a000000000000000
f10 : 1000ab951c71c67939fd3 f11 : 1003e0000000000000b95
r1  : a000000100c15040 r2  : 000000000000001b r3  : 000000000000000a
r8  : 0000000000000006 r9  : e000070400076b08 r10 : 6b6b6b6b6b6b6b6b
r11 : 0000000000000020 r12 : e00000010e50fd50 r13 : e00000010e508000
r14 : e00007078c969b08 r15 : e000070400076af8 r16 : e0000787fae95174
r17 : 0000000000000006 r18 : 000000000000001b r19 : e00000010f054440
r20 : 00000000000040c0 r21 : 0000000000000010 r22 : e0000787fae95270
r23 : e0000787fae95250 r24 : 000000000000001b r25 : 0000000000004000
r26 : 0000000000004000 r27 : e00006fcfff98000 r28 : e00006fcfff98001
r29 : 000000001c1e3259 r30 : e000070403f94000 r31 : 0000000000001c1e
Call Trace:
 [<a000000100014140>] show_stack+0x40/0xa0
                                sp=e00000010e50f8e0 bsp=e00000010e5093d8
 [<a000000100014a40>] show_regs+0x840/0x880
                                sp=e00000010e50fab0 bsp=e00000010e509380
 [<a000000100037e20>] die+0x1c0/0x2a0
                                sp=e00000010e50fab0 bsp=e00000010e509338
 [<a00000010062aa20>] ia64_do_page_fault+0x8a0/0x9e0
                                sp=e00000010e50fad0 bsp=e00000010e5092e8
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000010e50fb80 bsp=e00000010e5092e8
 [<a0000001000a8590>] __rcu_process_callbacks+0x350/0x5a0
                                sp=e00000010e50fd50 bsp=e00000010e5092a0
 [<a0000001000a8820>] rcu_process_callbacks+0x40/0xa0
                                sp=e00000010e50fd50 bsp=e00000010e509280
 [<a000000100087c30>] tasklet_action+0x1d0/0x340
                                sp=e00000010e50fd50 bsp=e00000010e509258
 [<a000000100086e10>] __do_softirq+0xf0/0x240
                                sp=e00000010e50fd50 bsp=e00000010e5091d8
 [<a000000100086fd0>] do_softirq+0x70/0xc0
                                sp=e00000010e50fd50 bsp=e00000010e509178
 [<a000000100087710>] ksoftirqd+0x110/0x280
                                sp=e00000010e50fd50 bsp=e00000010e509150
 [<a0000001000aeae0>] kthread+0x220/0x2a0
                                sp=e00000010e50fd50 bsp=e00000010e509108
 [<a0000001000126b0>] kernel_thread_helper+0x30/0x60
                                sp=e00000010e50fe30 bsp=e00000010e5090e0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e00000010e50fe30 bsp=e00000010e5090e0
 <3>BUG: sleeping function called from invalid context at kernel/rwsem.c:20
in_atomic():1, irqs_disabled():0
Call Trace:
 [<a000000100014140>] show_stack+0x40/0xa0
                                sp=e00000010e50f8c0 bsp=e00000010e509498
 [<a0000001000141d0>] dump_stack+0x30/0x60
                                sp=e00000010e50fa90 bsp=e00000010e509480
 [<a000000100066270>] __might_sleep+0x1b0/0x1e0
                                sp=e00000010e50fa90 bsp=e00000010e509458
 [<a0000001000b6060>] down_read+0x20/0x60
                                sp=e00000010e50fa90 bsp=e00000010e509438
 [<a00000010009fc40>] blocking_notifier_call_chain+0x20/0x80
                                sp=e00000010e50fa90 bsp=e00000010e509400
 [<a00000010007c550>] profile_task_exit+0x30/0x60
                                sp=e00000010e50fa90 bsp=e00000010e5093d8
 [<a0000001000808c0>] do_exit+0x40/0x1460
                                sp=e00000010e50fa90 bsp=e00000010e509380
 [<a000000100037ee0>] die+0x280/0x2a0
                                sp=e00000010e50fab0 bsp=e00000010e509338
 [<a00000010062aa20>] ia64_do_page_fault+0x8a0/0x9e0
                                sp=e00000010e50fad0 bsp=e00000010e5092e8
 [<a00000010000c700>] __ia64_leave_kernel+0x0/0x280
                                sp=e00000010e50fb80 bsp=e00000010e5092e8
 [<a0000001000a8590>] __rcu_process_callbacks+0x350/0x5a0
                                sp=e00000010e50fd50 bsp=e00000010e5092a0
 [<a0000001000a8820>] rcu_process_callbacks+0x40/0xa0
                                sp=e00000010e50fd50 bsp=e00000010e509280
 [<a000000100087c30>] tasklet_action+0x1d0/0x340
                                sp=e00000010e50fd50 bsp=e00000010e509258
 [<a000000100086e10>] __do_softirq+0xf0/0x240
                                sp=e00000010e50fd50 bsp=e00000010e5091d8
 [<a000000100086fd0>] do_softirq+0x70/0xc0
                                sp=e00000010e50fd50 bsp=e00000010e509178
 [<a000000100087710>] ksoftirqd+0x110/0x280
                                sp=e00000010e50fd50 bsp=e00000010e509150
 [<a0000001000aeae0>] kthread+0x220/0x2a0
                                sp=e00000010e50fd50 bsp=e00000010e509108
 [<a0000001000126b0>] kernel_thread_helper+0x30/0x60
                                sp=e00000010e50fe30 bsp=e00000010e5090e0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e00000010e50fe30 bsp=e00000010e5090e0
Kernel panic - not syncing: Aiee, killing interrupt handler!
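
A note for anyone triaging similar logs: the faulting address 6b6b6b6b6b6b6b6b in the oops above is the byte 0x6b repeated. As far as I know, 0x6b is the value the slab allocator writes over freed objects when slab debugging is enabled (POISON_FREE), which fits the use-after-free reading earlier in this report. The following is a minimal Python sketch, not part of the original report, that pulls the faulting address out of an oops line and checks it against that pattern. The function names are hypothetical, and the poison value is stated as an assumption.

```python
import re

# Assumption: 0x6b is the slab "freed object" poison byte (POISON_FREE).
POISON_FREE = 0x6B

# Matches both oops spellings seen in this report, with or without "BUG:".
_FAULT_RE = re.compile(
    r"unable to handle kernel paging request at virtual address ([0-9a-fA-F]+)",
    re.IGNORECASE,
)


def fault_address(line):
    """Return the faulting address from an oops line as bytes, or None."""
    match = _FAULT_RE.search(line)
    if match is None:
        return None
    digits = match.group(1)
    if len(digits) % 2:
        # bytes.fromhex needs an even number of hex digits.
        digits = "0" + digits
    return bytes.fromhex(digits)


def looks_like_slab_poison(addr):
    """True if every byte of the address is the free-poison byte."""
    return len(addr) > 0 and all(byte == POISON_FREE for byte in addr)
```

A match only suggests that a pointer was loaded from freed slab memory; it does not say which object was freed or who freed it.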
I cannot do any kernel debugging on ia64 myself, so it is most helpful to concentrate on reproducing the problem on x86_64, i386, or ppc64.

*** This bug has been marked as a duplicate of 210249 ***

Jan, can you verify that the problem persists on FC6?
I have a fix for a problem that I think could have this failure mode.
I would like to verify a specific kernel where you see the problem reliably, and then have you try a kernel with no differences but my fix.
If the problem is masked by having CONFIG_DEBUG_SLAB disabled, we can try a rebuild of a current FC6 kernel with just those config options changed (kernel-2.6.18-1.2798.fc6 or kernel-2.6.18-1.2814.fc6).

Roland,

(In reply to comment #14)
> Jan, can you verify that the problem persists on FC6?
> I have a fix for a problem that I think could have this failure mode.
> I would like to verify a specific kernel where you see the problem reliably, and
> then have you try a kernel with no differences but my fix.

Could you attach the fix please? I'm also looking at 210706 which is very similar to this BZ ...

P.

> If the problem is masked by having CONFIG_DEBUG_SLAB disabled, we can try a
> rebuild of a current FC6 kernel with just those config options changed
> (kernel-2.6.18-1.2798.fc6 or kernel-2.6.18-1.2814.fc6).

> Jan, can you verify that the problem persists on FC6?

Roland, I just verified that I can see this issue on the FC6 kernel on kona1, a 24-way ia64 box.

> I have a fix for a problem that I think could have this failure mode.
> I would like to verify a specific kernel where you see the problem reliably, and
> then have you try a kernel with no differences but my fix.

I can see it by installing the FC6 kernel, kernel-2.6.18-1.2798, installing the gdb sources, and running the following script from /usr/src/redhat/BUILD/gdb-6.5/gdb/testsuite:

while true; do sleep 1; runtest gdb.base/checkpoint.exp gdb.base/chng-syms.exp; done

(Warning: It can take up to 3 hours for the problem to occur...)

P.
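
The loop quoted above runs forever and relies on the kernel crashing to end it. For unattended runs it may help to bound it. The following is a rough Python sketch under my own assumptions, not something taken from this report: it reruns a command with a pause between runs and stops at the first non-zero exit status or once a time budget is used up. The name `repeat_until_failure` and its parameters are hypothetical, and whether the `runtest` exit status is meaningful for this failure is itself an assumption, since a panic takes the whole machine down.

```python
import subprocess
import time


def repeat_until_failure(cmd, max_seconds, pause=1.0,
                         run=subprocess.run,
                         clock=time.monotonic,
                         sleep=time.sleep):
    """Re-run cmd until it exits non-zero or max_seconds elapse.

    Returns (number_of_runs, last_return_code). The run, clock and sleep
    parameters are injectable so the loop can be exercised without
    spawning real processes.
    """
    deadline = clock() + max_seconds
    runs = 0
    last_code = None
    while clock() < deadline:
        sleep(pause)  # mirrors the "sleep 1" in the quoted shell loop
        last_code = run(cmd).returncode
        runs += 1
        if last_code != 0:
            break
    return runs, last_code


# Hypothetical usage, from the gdb testsuite directory, with the
# three-hour budget mentioned in the comment above:
#   repeat_until_failure(
#       ["runtest", "gdb.base/checkpoint.exp", "gdb.base/chng-syms.exp"],
#       max_seconds=3 * 60 * 60)
```

Logging the run count to a serial console or a remote host before each iteration would make it possible to see how many runs preceded a panic.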
kernel-2.6.18-1.2798.rm2.fc6.x86_64 booted for me but it crashed for the gdb testsuite on the 7th run. As Prarit noted the missing tty locking changes, I did not try to get a kdump. Serial console unfortunately not available there.

Roland,

With your patch I hit a new issue:

kona1.lab.boston.redhat.com login: [ 342.380072] kernel BUG at kernel/utrace.c:947!
[ 342.380302] checkpoint[4528]: bugcheck! 0 [1]
[ 342.380496] Modules linked in: hidp(U) nfs(U) lockd(U) fscache(U) nfs_acl(U) rfcomm(U) l2cap(U) bluetooth(U) sunrpc(U) ipv6(U) vfat(U) fat(U) dm_multipath(U) button(U) parport_pc(U) lp(U) parport(U) sr_mod(U) cdrom(U) sg(U) tg3(U) dm_snapshot(U) dm_zero(U) dm_mirror(U) dm_mod(U) mptspi(U) mptscsih(U) mptbase(U) scsi_transport_spi(U) sd_mod(U) scsi_mod(U) ext3(U) jbd(U) ehci_hcd(U) ohci_hcd(U) uhci_hcd(U)
[ 342.382158]
[ 342.382160] Pid: 4528, CPU 8, comm: checkpoint
[ 342.382367] psr : 00001010085a6010 ifs : 800000000000040b ip : [<a0000001000ee980>] Not tainted
[ 342.382644] ip is at utrace_quiescent+0x2c0/0x6e0
[ 342.382785] unat: 0000000000000000 pfs : 000000000000040b rsc : 0000000000000003
[ 342.383002] rnat: 0000000000000000 bsps: 0000000000000000 pr : 000000000059955b
[ 342.383221] ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70433f
[ 342.383452] csd : 0000000000000000 ssd : 0000000000000000
[ 342.383689] b0 : a0000001000ee980 b6 : a000000100011300 b7 : a00000010000bf20
[ 342.383994] f6 : 1003e00000000000000a0 f7 : 1003e20c49ba5e353f7cf
[ 342.384256] f8 : 1003e00000000000004e2 f9 : 1003e000000000fa00000
[ 342.384502] f10 : 1003e000000003b9aca00 f11 : 1003e431bde82d7b634db
[ 342.384755] r1 : a000000100c03c90 r2 : a000000100a1b4c0 r3 : e000070781a19084
[ 342.385403] r8 : 0000000000000035 r9 : a000000100a197b8 r10 : a000000100a1b4f0
[ 342.385729] r11 : a000000100a1b4f0 r12 : e000070781a1fd50 r13 : e000070781a18000
[ 342.386008] r14 : a000000100a1b4c0 r15 : 0000000000000000 r16 : ffffffffdead4ead
[ 342.386635] r17 : 00000000dead4ead r18 : a000000100949afc r19 : a000000100a197b0
[ 342.387400] r20 : 0000000000000000 r21 : a000000100a04310 r22 : 0000000000000004
[ 342.388126] r23 : a000000100855180 r24 : a000000100a04310 r25 : a000000100a1b4c8
[ 342.388814] r26 : a000000100a1b4c8 r27 : e0000707f92d9070 r28 : e0000707f92d8008
[ 342.389430] r29 : e0000787f30a0060 r30 : e0000707f92d802c r31 : e0000787f30a002c
[ 342.390168]
[ 342.390169] Call Trace:
[ 342.390614] [<a000000100013ea0>] show_stack+0x40/0xa0
[ 342.390617] sp=e000070781a1f8e0 bsp=e000070781a194c0
[ 342.391863] [<a0000001000147a0>] show_regs+0x840/0x880
[ 342.391866] sp=e000070781a1fab0 bsp=e000070781a19468
[ 342.415578] [<a000000100037b80>] die+0x1c0/0x2a0
[ 342.415580] sp=e000070781a1fab0 bsp=e000070781a19420
[ 342.416702] [<a000000100037cb0>] die_if_kernel+0x50/0x80
[ 342.416705] sp=e000070781a1fad0 bsp=e000070781a193e8
[ 342.417162] [<a00000010061dc50>] ia64_bad_break+0x270/0x4a0
[ 342.417164] sp=e000070781a1fad0 bsp=e000070781a193c0
[ 342.417579] [<a00000010000c720>] __ia64_leave_kernel+0x0/0x280
[ 342.417581] sp=e000070781a1fb80 bsp=e000070781a193c0
[ 342.417993] [<a0000001000ee980>] utrace_quiescent+0x2c0/0x6e0
[ 342.417995] sp=e000070781a1fd50 bsp=e000070781a19368
[ 342.418399] [<a0000001000f1140>] utrace_get_signal+0xa60/0xac0
[ 342.418402] sp=e000070781a1fd50 bsp=e000070781a19308
[ 342.439207] [<a00000010009c600>] get_signal_to_deliver+0x1e0/0x740
[ 342.439210] sp=e000070781a1fd80 bsp=e000070781a192b8
[ 342.439636] [<a000000100034650>] ia64_do_signal+0x90/0xde0
[ 342.439638] sp=e000070781a1fd80 bsp=e000070781a191d0
[ 342.440487] [<a000000100013d20>] do_notify_resume_user+0x100/0x160
[ 342.440489] sp=e000070781a1fe20 bsp=e000070781a191a0
[ 342.440905] [<a00000010000cc00>] notify_resume_user+0x40/0x60
[ 342.440908] sp=e000070781a1fe20 bsp=e000070781a19150
[ 342.441319] [<a00000010000cb30>] skip_rbs_switch+0xe0/0x110
[ 342.441321] sp=e000070781a1fe30 bsp=e000070781a19150
[ 342.441740] [<a000000000010620>] __start_ivt_text+0xffffffff00010620/0x400
[ 342.441743] sp=e000070781a20000 bsp=e000070781a19150
[ 342.442172]

Created attachment 139632 [details]
kernel-2.6.18-1.2798.rm3.fc6.ia64 oops log

kernel-2.6.18-1.2798.rm3.fc6.x86_64:
rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run (no serial console there, kdump not loadable)

kernel-2.6.18-1.2798.rm3.fc6.ia64:
rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run
It looks like the fdtable crash in Bug 210706 Comment 16:
[Full dump attached.]
 [<a000000100144900>] free_block+0x100/0x300
                                sp=e00007307bbf7960 bsp=e00007307bbf17b8
 [<a000000100145040>] __drain_alien_cache+0xc0/0x100
                                sp=e00007307bbf7960 bsp=e00007307bbf1778
 [<a0000001001443d0>] kmem_cache_free+0x390/0x580
                                sp=e00007307bbf7960 bsp=e00007307bbf1738
 [<a000000100196250>] free_fdtable_rcu+0xb0/0x2a0
                                sp=e00007307bbf7970 bsp=e00007307bbf1708

>
> kernel-2.6.18-1.2798.rm3.fc6.ia64:
> rpmbuild --rebuild gdb-6.5-11.el5.src.rpm crash on the 2nd run
> It looks as the fdtable crash in Bug 210706 Comment 16:

Still seeing a fdtable panic with Roland's latest patch ... I'll look into this further in the AM.

> [Full dump attached.]
> [<a000000100144900>] free_block+0x100/0x300
> sp=e00007307bbf7960 bsp=e00007307bbf17b8
> [<a000000100145040>] __drain_alien_cache+0xc0/0x100
> sp=e00007307bbf7960 bsp=e00007307bbf1778
> [<a0000001001443d0>] kmem_cache_free+0x390/0x580
> sp=e00007307bbf7960 bsp=e00007307bbf1738
> [<a000000100196250>] free_fdtable_rcu+0xb0/0x2a0
> sp=e00007307bbf7970 bsp=e00007307bbf1708
>

Hey Prarit,

Any update on the fdtable?

Is there any progress on the issue?

We saw the following oops on rhel5 utrace code:
BUG: unable to handle kernel paging request at virtual address 7ca1c291
EIP is at utrace_get_signal+0x46/0x477
          get_signal_to_deliver+0xdf/0x3b1
          do_notify_resume+0xa9/0x6a5
          audit_syscall_exit+0x285/0x2a1
          work_notifysig+0x13/0x19
          copy_to_user_policy+0x73/0x7f
The failing IP corresponds to code in utrace_get_signal():
int
utrace_get_signal(struct task_struct *tsk, struct pt_regs *regs,
                  siginfo_t *info, struct k_sigaction *return_ka)
{
        struct utrace *utrace = tsk->utrace;
                ...
        if (utrace->u.live.signal != NULL) {
                signal.signr = utrace->u.live.signal->signr;
                copy_siginfo(info, utrace->u.live.signal->info);
                                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^
A bogus pointer was supplied here.
How should the ->utrace assignment be handled correctly?
I'm sorry to bother you guys again, but are these utrace LOCALLY *EXPLOITABLE* issues going to be fixed?

Created attachment 159845 [details]
utrace_get_signal() oops reproducer

Comment 31, Comment 32 and Comment 33 have been moved into a new Bug 312951 as it is a different problem.

Comment on attachment 159845 [details]
utrace_get_signal() oops reproducer

Moved into Bug 312951.

Fedora apologizes that these issues have not been resolved yet. We're sorry it's taken so long for your bug to be properly triaged and acted on. We appreciate the time you took to report this issue and want to make sure no important bugs slip through the cracks.

If you're currently running a version of Fedora Core between 1 and 6, please note that Fedora no longer maintains these releases. We strongly encourage you to upgrade to a current Fedora release. In order to refocus our efforts as a project we are flagging all of the open bugs for releases which are no longer maintained and closing them.
http://fedoraproject.org/wiki/LifeCycle/EOL

If this bug is still open against Fedora Core 1 through 6, thirty days from now, it will be closed 'WONTFIX'. If you can reproduce this bug in the latest Fedora version, please change to the respective version. If you are unable to do this, please add a comment to this bug requesting the change.

Thanks for your help, and we apologize again that we haven't handled these issues to this point.

The process we are following is outlined here:
http://fedoraproject.org/wiki/BugZappers/F9CleanUp

We will be following the process here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping to ensure this doesn't happen again.

And if you'd like to join the bug triage team to help make things better, check out http://fedoraproject.org/wiki/BugZappers

This bug is open for a Fedora version that is no longer maintained and will not be fixed by Fedora. Therefore we are closing this bug.

If you can reproduce this bug against a currently maintained version of Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed. |