Created attachment 313480 (Script to cause crash)

Description of problem:
When systemtap attempts to remove kprobes on exit from a script, the kernel panics.

Version-Release number of selected component (if applicable):
systemtap-0.6.2-1.el5
kernel-2.6.18-92.1.10.el5

How reproducible:
Always

Steps to Reproduce:
1. Install kernel-debuginfo, kernel-devel, and systemtap on an ia64 machine.
2. Run the attached functioncallcount.stp script with:
   stap -v functioncallcount.stp "*@mm/*.c"
3. Hit Ctrl-C to exit the script and see the traceback on the console.

Actual results:
The machine dies with the following traceback when systemtap attempts to remove the kprobes.

squidward.rdu.redhat.com login: Unable to handle kernel NULL pointer dereference (address 00000000000001e0)
systemtap/0[2675]: Oops 11012296146944 [1]
Modules linked in: stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath button parport_pc lp parport joydev fm801_gp sg gameport e1000 snd_fm801 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_tea575x_tuner videodev v4l1_compat v4l2_common snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd ide_cd soundcore cdrom dm_snapshot dm_zero dm_mirror dm_mod sym53c8xx scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 2675, CPU 0, comm: systemtap/0
psr : 00001210085a6010 ifs : 8000000000000286 ip : [<a000000100054481>] Tainted: G
ip is at module_free+0x21/0xc0
unat: 0000000000000000 pfs : 0000000000000287 rsc : 0000000000000003
rnat: 0000000000001541 bsps: a000000100638ac0 pr : 0000000000006681
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a000000100639020 b6 : a000000100014a80 b7 : a00000010000b840
f6 : 0fffefffffffff0000000 f7 : 0fff0b6f6900000000000
f8 : 10014b6f6900000000000 f9 : 0ffff8000000000000000
f10 : 10014b6f68ffff4909700 f11 : 1003e00000000002dbda4
r1 : a000000100be0370 r2 : e000000038e12c17 r3 : e000000038e12bff
r8 : 0000000000000001 r9 : 00000000000003ff r10 : 0000000000003ff0
r11 : e000000038e12000 r12 : e000000035f27d00 r13 : e000000035f20000
r14 : 0000000000000170 r15 : e000000038e12808 r16 : a0000001009f8878
r17 : 0000000000200200 r18 : 0000000000100100 r19 : e00000405eb85808
r20 : e000000038e12000 r21 : e00000405fdcb7c8 r22 : 0000000000000069
r23 : 0000000000000068 r24 : 0000000000000080 r25 : 0000000000024000
r26 : 0000000000000000 r27 : 0000000000004000 r28 : 0000000000004000
r29 : 0000000000005908 r30 : e0000040440b5b01 r31 : 00000000000065c0

Call Trace:
 [<a000000100013b00>] show_stack+0x40/0xa0 sp=e000000035f27890 bsp=e000000035f213a8
 [<a000000100014400>] show_regs+0x840/0x880 sp=e000000035f27a60 bsp=e000000035f21350
 [<a000000100037be0>] die+0x1c0/0x2c0 sp=e000000035f27a60 bsp=e000000035f21308
 [<a000000100637310>] ia64_do_page_fault+0x910/0xa40 sp=e000000035f27a80 bsp=e000000035f212b8
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280 sp=e000000035f27b30 bsp=e000000035f212b8
 [<a000000100054480>] module_free+0x20/0xc0 sp=e000000035f27d00 bsp=e000000035f21288
 [<a000000100639020>] free_insn_slot+0x160/0x220 sp=e000000035f27d00 bsp=e000000035f21260
 [<a000000100635710>] arch_remove_kprobe+0x50/0x80 sp=e000000035f27d00 bsp=e000000035f21238
 [<a000000100638ad0>] unregister_kprobe+0x390/0x580 sp=e000000035f27d00 bsp=e000000035f21208
 [<a000000201957300>] systemtap_module_exit+0x260/0x560 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993] sp=e000000035f27d00 bsp=e000000035f211d0
 [<a000000201957620>] probe_exit+0x20/0x40 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993] sp=e000000035f27d00 bsp=e000000035f211b8
 [<a000000201957800>] _stp_cleanup_and_exit+0x1c0/0x200 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993] sp=e000000035f27d00 bsp=e000000035f21198
 [<a000000201959ae0>] _stp_work_queue+0x1c0/0x1e0 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993] sp=e000000035f27d00 bsp=e000000035f21180
 [<a0000001000a3000>] run_workqueue+0x1c0/0x280 sp=e000000035f27d00 bsp=e000000035f21140
 [<a0000001000a4ee0>] worker_thread+0x1a0/0x240 sp=e000000035f27d00 bsp=e000000035f21110
 [<a0000001000acf10>] kthread+0x230/0x2c0 sp=e000000035f27d50 bsp=e000000035f210c8
 [<a0000001000121d0>] kernel_thread_helper+0x30/0x60 sp=e000000035f27e30 bsp=e000000035f210a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40 sp=e000000035f27e30 bsp=e000000035f210a0
<0>Kernel panic - not syncing: Fatal exception

Expected results:
The script exits cleanly on ia64, as it does on x86_64 and i386 machines.

Additional info:
Oops, this is a known bug that was fixed upstream recently.
The problem is that kprobes calls module_free() with mod == NULL, but module_free() in arch/ia64/kernel/module.c dereferences mod to check one of its members, which causes a kernel page fault.

Here is the fix patch:
http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40

Thank you,
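The failure mode described above can be illustrated with a hypothetical user-space C sketch. The struct layouts, module_free_sketch() and remove_unwind_table() are simplified stand-ins invented for illustration, and free() replaces the kernel's vfree(); this is not the text of the upstream patch, only the kind of NULL check that keeps module_free() from dereferencing mod when kprobes passes mod == NULL.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins for the kernel structures; only the
   members this sketch inspects are modeled. */
struct mod_arch_specific {
    void *init_unw_table;            /* unwind table registered for the init section */
};

struct module {
    void *module_init;               /* start of the module's init section */
    struct mod_arch_specific arch;
};

/* Counts simulated unwind-table removals so the behavior can be observed. */
static int unwind_tables_removed = 0;

/* Hypothetical stand-in for the kernel's unwind-table removal. */
static void remove_unwind_table(void *table)
{
    assert(table != NULL);
    (void)table;
    unwind_tables_removed++;
}

/* Sketch of a NULL-safe module_free(). kprobes frees its instruction-slot
   pages with mod == NULL, so mod must be tested before any mod->... access;
   without the "mod &&" test, the NULL case would dereference a NULL pointer. */
void module_free_sketch(struct module *mod, void *module_region)
{
    if (mod && mod->arch.init_unw_table &&
        module_region == mod->module_init) {
        remove_unwind_table(mod->arch.init_unw_table);
        mod->arch.init_unw_table = NULL;
    }
    free(module_region);             /* stands in for the kernel's vfree() */
}
```

Calling module_free_sketch(NULL, page) frees the page without touching mod, which corresponds to the unregister_kprobe() -> arch_remove_kprobe() -> free_insn_slot() -> module_free() path in the traceback.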
(In reply to comment #1)
> Oops, it's a know bug and fixed on upstream recently...
> The problem is kprobes uses module_free() but it sets mod==NULL.
> in module_free()@arch/ia64/kernel/module.c, it tries to check
> mod->member and it causes kernel pagefault.
>
> Here is the fix patch.
> http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40
>
> Thank you,

Thanks for the information. Is this going into the RHEL5 kernel?
(In reply to comment #2)
> > Here is the fix patch.
> > http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40
> >
> > Thank you,
>
> Thanks for the information. Is this going into the RHEL5 kernel?

I don't think so. I think we should push it to RHEL5 as a bugfix.
Created attachment 313625 (kernel patch to resolve problem)

This patch, mentioned by Masami, resolves the problem.
Has the backport patch been posted to rhkernel? We need to fix the crash ASAP.
(In reply to comment #5)
> Is the back port patch posted on rhkernel ? We need to fix the crash ASAP...

Sorry for the delay. The patch applies to the RHEL 5.2 kernel. I'm building and testing it now and will post it afterwards.

Thank you,
Re comment #6:
This patch fixes a kernel crash, which is always my top priority. I'm eager to get the patch into RHEL 5 as early as possible, so please feel free to assign the bug to me if you need me to review, test, and post it.

Thanks,
Luming
(In reply to comment #7)
> To comment#6,
> This patch fixes a kernel crash which is always the first priority thing to me.
> I'm eager to make the patch in RHEL 5 as early as possible. So please feel free
> to assign the bug to me if you need me to review, test and post.

I posted it yesterday. Please review it.

Thanks,
*** Bug 459012 has been marked as a duplicate of this bug. ***
Included in kernel-2.6.18-107.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5
Tried this on the ia64 machine where the original problem was encountered. kernel-2.6.18-107.el5 resolves the problem.
Matsuya-san,

Thank you for providing the packages. I ran the stap command again, but got a new error:

[root@localhost akig]# stap -v para-callgraph.stp sys_read '*@fs/*.c'
Pass 1: parsed user script and 38 library script(s) in 270usr/10sys/281real ms.
semantic error: libdwfl failure (missing kernel 2.6.18-92.1.13.el5 ia64 debuginfo): No such file or directory while resolving probe point kernel.function("sys_read").call
semantic error: no match while resolving probe point kernel.function("sys_read").return
semantic error: no match while resolving probe point kernel.function("*@fs/*.c").call
semantic error: no match while resolving probe point kernel.function("*@fs/*.c").return
Pass 2: analyzed script: 0 probe(s), 36 function(s), 1 embed(s), 3 global(s) in 11usr/0sys/10real ms.
Pass 2: analysis failed. Try again with more '-v' (verbose) options.

The debuginfo is installed in the following directory:

/usr/lib/debug/lib/modules/2.6.18-92.1.13.el5debug

Is this correct?

Thanks,
Akiyama

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by streeter
issue 198733
Oshiro-san,

I renamed the directory containing the debuginfo as follows:

2.6.18-92.1.13.el5debug => 2.6.18-92.1.13.el5

Then I ran the same command again, and the kernel panic occurred.

[root@localhost akig]# stap -v para-callgraph.stp sys_read '*@fs/*.c'
Pass 1: parsed user script and 38 library script(s) in 275usr/6sys/336real ms.
Pass 2: analyzed script: 3178 probe(s), 9 function(s), 1 embed(s), 3 global(s) in 980usr/150sys/8229real ms.
Pass 3: translated to C into "/tmp/stapCSv5YO/stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428.c" in 154usr/3sys/167real ms.
Pass 4: compiled C into "stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428.ko" in 6373usr/292sys/7199real ms.
Pass 5: starting run.
sendmail[4238]: IA-64 Illegal operation fault 0 [1]
Modules linked in: stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428(U) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc sr_mod cdrom vfat fat dm_mirror dm_multipath dm_mod button parport_pc lp parport sg usb_storage e100 mii tg3 shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 4238, CPU 0, comm: sendmail
psr : 00001010085a6010 ifs : 8000000000000205 ip : [<a0000001001668b0>] Tainted: G
ip is at __fput+0x370/0x420
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: e000000129edfde9 bsps: 0000000000000002 pr : 0000000000005959
ldrs: 0000000000000000 ccv : 000000000000000f fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0 : a0000001000593a0 b6 : a000000100241b80 b7 : a0000001001e6ac0
f6 : 1003e0000000000000000 f7 : 1003e0000000000000000
f8 : 0fffd8000000000000000 f9 : 1001ec800000000000000
f10 : 1003e0000000000000000 f11 : 1001cfffffffffcbd54f0
r1 : a000000100be0370 r2 : 0000000000004000 r3 : 0000000000002710
r8 : 0000000000000000 r9 : a0000001009f6c68 r10 : 0000000000000049
r11 : e000000028006ae0 r12 : e000000129edfe30 r13 : e000000129ed8000
r14 : e00000010955e648 r15 : 000000000000000f r16 : a0000001009eba28
r17 : e00000010f47f498 r18 : e00000010f47f490 r19 : e000000028010000
r20 : ffffffffffff0028 r21 : e00000012c7b2d90 r22 : e000000107c45704
r23 : e000000107c457e8 r24 : 000000000000001b r25 : 000000000000001a
r26 : e000000129ed9054 r27 : 0000000000000080 r28 : 0000000000024000
r29 : 0000000000004000 r30 : 0000000000004000 r31 : e000000103b90000

Call Trace:
 [<a000000100013b00>] show_stack+0x40/0xa0 sp=e000000129edf930 bsp=e000000129ed9308
 [<a000000100014400>] show_regs+0x840/0x880 sp=e000000129edfb00 bsp=e000000129ed92b0
 [<a000000100037be0>] die+0x1c0/0x2c0 sp=e000000129edfb00 bsp=e000000129ed9268
 [<a000000100037d30>] die_if_kernel+0x50/0x80 sp=e000000129edfb20 bsp=e000000129ed9238
 [<a000000100037dc0>] ia64_illegal_op_fault+0x60/0x180 sp=e000000129edfb20 bsp=e000000129ed91e8
 [<a000000100003f20>] dispatch_illegal_op_fault+0x300/0x800 sp=e000000129edfc60 bsp=e000000129ed91e8
 [<a0000001001668b0>] __fput+0x370/0x420 sp=e000000129edfe30 bsp=e000000129ed91c0
<0>Kernel panic - not syncing: Fatal exception

Thanks,
Akiyama

This event sent from IssueTracker by streeter
issue 198733
Hi,

(In reply to comment #19)
> The debuginfo is installed in the following directory:
>
> /usr/lib/debug/lib/modules/2.6.18-92.1.13.el5debug
>
> Is this correct?

No. I think you might have installed the kernel-debug-debuginfo package (note the extra "debug").
Please don't change the directory name. Instead, could you install the kernel-debuginfo package and test it again?

Thank you,
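The mismatch discussed above comes down to the release string: the stap error names the running kernel's release (2.6.18-92.1.13.el5), while the installed directory carries an extra "debug" suffix. The following is a hypothetical user-space C sketch of that check, assuming debuginfo for release R lives under /usr/lib/debug/lib/modules/R; the helper names debuginfo_dir(), is_debug_variant_dir() and running_kernel_has_debuginfo() are invented for illustration and are not part of systemtap.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/utsname.h>

/* Assumed layout: debuginfo for release R lives under
   /usr/lib/debug/lib/modules/R. Returns 0 on success, -1 if buf is too small. */
int debuginfo_dir(char *buf, size_t len, const char *release)
{
    int n = snprintf(buf, len, "/usr/lib/debug/lib/modules/%s", release);
    return (n >= 0 && (size_t)n < len) ? 0 : -1;
}

/* Returns 1 if dirname is release followed by "debug", e.g.
   "2.6.18-92.1.13.el5debug" for release "2.6.18-92.1.13.el5", i.e. the
   directory of the debug kernel variant rather than the running kernel. */
int is_debug_variant_dir(const char *dirname, const char *release)
{
    assert(dirname != NULL && release != NULL);
    size_t n = strlen(release);
    return strncmp(dirname, release, n) == 0 && strcmp(dirname + n, "debug") == 0;
}

/* Returns 1 if a debuginfo directory exists for the running kernel's
   release (as reported by uname(2)), 0 otherwise. */
int running_kernel_has_debuginfo(void)
{
    struct utsname u;
    struct stat st;
    char path[512];

    if (uname(&u) != 0 || debuginfo_dir(path, sizeof path, u.release) != 0)
        return 0;
    return stat(path, &st) == 0 && S_ISDIR(st.st_mode);
}
```

Under these assumptions, is_debug_variant_dir() returns 1 for the directory reported in comment #19, which matches the diagnosis that the kernel-debug-debuginfo package, not kernel-debuginfo, was installed.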
An advisory has been issued which should help resolve the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html