Bug 457961 - kprobes remove causing kernel panic on ia64 with 2.6.18-92.1.10.el5 kernel
Summary: kprobes remove causing kernel panic on ia64 with 2.6.18-92.1.10.el5 kernel
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: ia64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Masami Hiramatsu
QA Contact: Martin Jenner
URL:
Whiteboard:
Duplicates: 459012
Depends On:
Blocks: 329781
 
Reported: 2008-08-05 19:26 UTC by William Cohen
Modified: 2018-10-20 03:09 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-01-20 19:39:27 UTC
Target Upstream Version:
Embargoed:


Attachments
Script to cause crash (472 bytes, text/plain)
2008-08-05 19:26 UTC, William Cohen
kernel patch to resolve problem (1.65 KB, patch)
2008-08-06 18:40 UTC, William Cohen


Links
System: Red Hat Product Errata
ID: RHSA-2009:0225
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.3 kernel security and bug fix update
Last Updated: 2009-01-20 16:06:24 UTC

Description William Cohen 2008-08-05 19:26:16 UTC
Created attachment 313480 [details]
Script to cause crash

Description of problem:

When systemtap attempts to remove its kprobes on exit from a script,
the kernel panics.

Version-Release number of selected component (if applicable):

systemtap-0.6.2-1.el5
kernel-2.6.18-92.1.10.el5

How reproducible:

Always


Steps to Reproduce:
1. Install kernel-debuginfo, kernel-devel, and systemtap on an ia64 machine.
2. Run the attached functioncallcount.stp script with:
   stap -v functioncallcount.stp "*@mm/*.c"
3. Hit Ctrl-C to exit the script and watch for the traceback on the console.
  
Actual results:

Machine dies with the following traceback when systemtap attempts to remove the kprobes.

squidward.rdu.redhat.com login: Unable to handle kernel NULL pointer dereference (address 00000000000001e0)
systemtap/0[2675]: Oops 11012296146944 [1]
Modules linked in: stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993(U) autofs4 hidp rfcomm l2cap bluetooth sunrpc ip_conntrack_netbios_ns ipt_REJECT xt_state ip_conntrack nfnetlink iptable_filter ip_tables ip6t_REJECT xt_tcpudp ip6table_filter ip6_tables x_tables ipv6 xfrm_nalgo crypto_api vfat fat dm_multipath button parport_pc lp parport joydev fm801_gp sg gameport e1000 snd_fm801 snd_ac97_codec ac97_bus snd_seq_dummy snd_seq_oss snd_seq_midi_event snd_seq snd_pcm_oss snd_mixer_oss snd_pcm snd_page_alloc snd_tea575x_tuner videodev v4l1_compat v4l2_common snd_opl3_lib snd_timer snd_hwdep snd_mpu401_uart snd_rawmidi snd_seq_device snd ide_cd soundcore cdrom dm_snapshot dm_zero dm_mirror dm_mod sym53c8xx scsi_transport_spi sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 2675, CPU 0, comm:          systemtap/0
psr : 00001210085a6010 ifs : 8000000000000286 ip  : [<a000000100054481>]    Tainted: G     
ip is at module_free+0x21/0xc0
unat: 0000000000000000 pfs : 0000000000000287 rsc : 0000000000000003
rnat: 0000000000001541 bsps: a000000100638ac0 pr  : 0000000000006681
ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a74433f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a000000100639020 b6  : a000000100014a80 b7  : a00000010000b840
f6  : 0fffefffffffff0000000 f7  : 0fff0b6f6900000000000
f8  : 10014b6f6900000000000 f9  : 0ffff8000000000000000
f10 : 10014b6f68ffff4909700 f11 : 1003e00000000002dbda4
r1  : a000000100be0370 r2  : e000000038e12c17 r3  : e000000038e12bff
r8  : 0000000000000001 r9  : 00000000000003ff r10 : 0000000000003ff0
r11 : e000000038e12000 r12 : e000000035f27d00 r13 : e000000035f20000
r14 : 0000000000000170 r15 : e000000038e12808 r16 : a0000001009f8878
r17 : 0000000000200200 r18 : 0000000000100100 r19 : e00000405eb85808
r20 : e000000038e12000 r21 : e00000405fdcb7c8 r22 : 0000000000000069
r23 : 0000000000000068 r24 : 0000000000000080 r25 : 0000000000024000
r26 : 0000000000000000 r27 : 0000000000004000 r28 : 0000000000004000
r29 : 0000000000005908 r30 : e0000040440b5b01 r31 : 00000000000065c0

Call Trace:
 [<a000000100013b00>] show_stack+0x40/0xa0
                                sp=e000000035f27890 bsp=e000000035f213a8
 [<a000000100014400>] show_regs+0x840/0x880
                                sp=e000000035f27a60 bsp=e000000035f21350
 [<a000000100037be0>] die+0x1c0/0x2c0
                                sp=e000000035f27a60 bsp=e000000035f21308
 [<a000000100637310>] ia64_do_page_fault+0x910/0xa40
                                sp=e000000035f27a80 bsp=e000000035f212b8
 [<a00000010000c040>] __ia64_leave_kernel+0x0/0x280
                                sp=e000000035f27b30 bsp=e000000035f212b8
 [<a000000100054480>] module_free+0x20/0xc0
                                sp=e000000035f27d00 bsp=e000000035f21288
 [<a000000100639020>] free_insn_slot+0x160/0x220
                                sp=e000000035f27d00 bsp=e000000035f21260
 [<a000000100635710>] arch_remove_kprobe+0x50/0x80
                                sp=e000000035f27d00 bsp=e000000035f21238
 [<a000000100638ad0>] unregister_kprobe+0x390/0x580
                                sp=e000000035f27d00 bsp=e000000035f21208
 [<a000000201957300>] systemtap_module_exit+0x260/0x560 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993]
                                sp=e000000035f27d00 bsp=e000000035f211d0
 [<a000000201957620>] probe_exit+0x20/0x40 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993]
                                sp=e000000035f27d00 bsp=e000000035f211b8
 [<a000000201957800>] _stp_cleanup_and_exit+0x1c0/0x200 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993]
                                sp=e000000035f27d00 bsp=e000000035f21198
 [<a000000201959ae0>] _stp_work_queue+0x1c0/0x1e0 [stap_e8e3a8fbb55260e0e76107b0b8d14e31_433993]
                                sp=e000000035f27d00 bsp=e000000035f21180
 [<a0000001000a3000>] run_workqueue+0x1c0/0x280
                                sp=e000000035f27d00 bsp=e000000035f21140
 [<a0000001000a4ee0>] worker_thread+0x1a0/0x240
                                sp=e000000035f27d00 bsp=e000000035f21110
 [<a0000001000acf10>] kthread+0x230/0x2c0
                                sp=e000000035f27d50 bsp=e000000035f210c8
 [<a0000001000121d0>] kernel_thread_helper+0x30/0x60
                                sp=e000000035f27e30 bsp=e000000035f210a0
 [<a0000001000090c0>] start_kernel_thread+0x20/0x40
                                sp=e000000035f27e30 bsp=e000000035f210a0
 <0>Kernel panic - not syncing: Fatal exception



Expected results:

The script exits cleanly on ia64, as it does on x86_64 and i386 machines.


Additional info:

Comment 1 Masami Hiramatsu 2008-08-05 23:03:31 UTC
Oops, it's a known bug that was fixed upstream recently.
The problem is that kprobes calls module_free() with mod == NULL.
module_free() in arch/ia64/kernel/module.c then dereferences mod
(it checks mod->member), which causes the kernel page fault.

Here is the fix patch.
http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40
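
The failure mode described in this comment can be sketched in user-space C. This is a minimal model under stated assumptions, not the kernel source and not the linked patch: `struct module_model`, its fields, and `module_free_guarded()` are hypothetical stand-ins for whatever member the ia64 module_free() inspects. The point is only that a caller with no owning module passes mod == NULL, so any test of a mod field has to be preceded by a NULL check.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct module; only the fields we inspect. */
struct module_model {
    void *init_unw_table;   /* assumed per-module state to tear down */
    void *module_init;      /* start of the module's init region */
};

/* Counts how often per-module cleanup ran; for observation only. */
static int cleanup_calls;

/*
 * Guarded free: per-module cleanup is attempted only when mod is non-NULL.
 * Without the "mod != NULL" test, reading mod->init_unw_table with
 * mod == NULL would be a NULL pointer dereference, which is the crash
 * reported in this bug.
 * Returns 1 if per-module cleanup ran, 0 otherwise.
 */
static int module_free_guarded(struct module_model *mod, void *region)
{
    int cleaned = 0;

    if (mod != NULL && mod->init_unw_table != NULL &&
        region == mod->module_init) {
        mod->init_unw_table = NULL;   /* model of the per-module teardown */
        cleanup_calls++;
        cleaned = 1;
    }
    free(region);                     /* the region is always released */
    return cleaned;
}
```

Calling `module_free_guarded(NULL, region)` models the kprobes path: the region is released and the per-module branch is skipped.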

Thank you,

Comment 2 William Cohen 2008-08-06 11:39:50 UTC
(In reply to comment #1)
> Oops, it's a know bug and fixed on upstream recently...
> The problem is kprobes uses module_free() but it sets mod==NULL.
> in module_free()@arch/ia64/kernel/module.c, it tries to check
> mod->member and it causes kernel pagefault.
> 
> Here is the fix patch.
> http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40
> 
> Thank you,

Thanks for the information. Is this going into the RHEL5 kernel?

Comment 3 Masami Hiramatsu 2008-08-06 12:12:46 UTC
(In reply to comment #2)
> > Here is the fix patch.
> > http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=740a8de0796dd12890b3c8ddcfabfcb528b78d40
> > 
> > Thank you,
> 
> Thanks for the information. Is this going into the RHEL5 kernel?

Not as far as I know, but I think we should push it to RHEL5 as a bug fix.

Comment 4 William Cohen 2008-08-06 18:40:20 UTC
Created attachment 313625 [details]
kernel patch to resolve problem

The patch mentioned by Masami resolves the problem.

Comment 5 Luming Yu 2008-08-11 09:07:37 UTC
Has the backported patch been posted to rhkernel? We need to fix the crash ASAP.

Comment 6 Masami Hiramatsu 2008-08-11 20:07:48 UTC
(In reply to comment #5)
> Is the back port patch posted on rhkernel ? We need to fix the crash ASAP...

Sorry for the delay. The patch applies to the RHEL 5.2 kernel. I'm building and testing it now,
and I'll post it after that. Thank you,

Comment 7 Luming Yu 2008-08-12 15:09:23 UTC
Re comment #6:
This patch fixes a kernel crash, which is always my first priority.
I'm eager to get the patch into RHEL 5 as early as possible, so please feel free to assign the bug to me if you need me to review, test, and post it.

Thanks 
Luming

Comment 8 Masami Hiramatsu 2008-08-13 15:03:00 UTC
(In reply to comment #7)
> To comment#6,
> This patch fixes a kernel crash which is always the first priority thing to me.
> I'm eager to make the patch in RHEL 5 as early as possible. So please feel free
> to assign the bug to me if you need me to review, test and post.

I posted it yesterday. Please review it.

Thanks,

Comment 10 Prarit Bhargava 2008-08-14 12:36:20 UTC
*** Bug 459012 has been marked as a duplicate of this bug. ***

Comment 15 Don Zickus 2008-09-03 03:41:15 UTC
The fix is in kernel-2.6.18-107.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 16 William Cohen 2008-09-03 14:45:56 UTC
Tried this on the ia64 machine where the original problem was encountered. kernel-2.6.18-107.el5 resolves the problem.

Comment 19 Issue Tracker 2008-09-18 14:56:26 UTC
Matsuya-san,

Thank you for providing the packages.
I ran the stap command again, but I got a new error.

[root@localhost akig]# stap -v para-callgraph.stp sys_read '*@fs/*.c'
Pass 1: parsed user script and 38 library script(s) in 270usr/10sys/281real ms.
semantic error: libdwfl failure (missing kernel 2.6.18-92.1.13.el5 ia64 debuginfo): No such file or directory while resolving probe point kernel.function("sys_read").call
semantic error: no match while resolving probe point kernel.function("sys_read").return
semantic error: no match while resolving probe point kernel.function("*@fs/*.c").call
semantic error: no match while resolving probe point kernel.function("*@fs/*.c").return
Pass 2: analyzed script: 0 probe(s), 36 function(s), 1 embed(s), 3 global(s) in 11usr/0sys/10real ms.
Pass 2: analysis failed.  Try again with more '-v' (verbose) options.

The debuginfo is installed in the following directory: 

 /usr/lib/debug/lib/modules/2.6.18-92.1.13.el5debug

Is this correct?

Thanks,
Akiyama

Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

This event sent from IssueTracker by streeter 
 issue 198733

Comment 20 Issue Tracker 2008-09-18 14:56:29 UTC
Oshiro-san,

I renamed the directory that contained the debuginfo as follows:

  2.6.18-92.1.13.el5debug => 2.6.18-92.1.13.el5

Then I ran the same command again, and the kernel panicked.

[root@localhost akig]# stap -v para-callgraph.stp sys_read '*@fs/*.c'
Pass 1: parsed user script and 38 library script(s) in 275usr/6sys/336real ms.
Pass 2: analyzed script: 3178 probe(s), 9 function(s), 1 embed(s), 3 global(s) in 980usr/150sys/8229real ms.
Pass 3: translated to C into "/tmp/stapCSv5YO/stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428.c" in 154usr/3sys/167real ms.
Pass 4: compiled C into "stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428.ko" in 6373usr/292sys/7199real ms.
Pass 5: starting run.
sendmail[4238]: IA-64 Illegal operation fault 0 [1]
Modules linked in: stap_52cf138fc4a0964dc2da6fe8a0e813b8_428428(U) ipv6 xfrm_nalgo crypto_api autofs4 hidp rfcomm l2cap bluetooth sunrpc sr_mod cdrom vfat fat dm_mirror dm_multipath dm_mod button parport_pc lp parport sg usb_storage e100 mii tg3 shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd

Pid: 4238, CPU 0, comm:             sendmail
psr : 00001010085a6010 ifs : 8000000000000205 ip  : [<a0000001001668b0>]    Tainted: G
ip is at __fput+0x370/0x420
unat: 0000000000000000 pfs : 0000000000000205 rsc : 0000000000000003
rnat: e000000129edfde9 bsps: 0000000000000002 pr  : 0000000000005959
ldrs: 0000000000000000 ccv : 000000000000000f fpsr: 0009804c0270033f
csd : 0000000000000000 ssd : 0000000000000000
b0  : a0000001000593a0 b6  : a000000100241b80 b7  : a0000001001e6ac0
f6  : 1003e0000000000000000 f7  : 1003e0000000000000000
f8  : 0fffd8000000000000000 f9  : 1001ec800000000000000
f10 : 1003e0000000000000000 f11 : 1001cfffffffffcbd54f0
r1  : a000000100be0370 r2  : 0000000000004000 r3  : 0000000000002710
r8  : 0000000000000000 r9  : a0000001009f6c68 r10 : 0000000000000049
r11 : e000000028006ae0 r12 : e000000129edfe30 r13 : e000000129ed8000
r14 : e00000010955e648 r15 : 000000000000000f r16 : a0000001009eba28
r17 : e00000010f47f498 r18 : e00000010f47f490 r19 : e000000028010000
r20 : ffffffffffff0028 r21 : e00000012c7b2d90 r22 : e000000107c45704
r23 : e000000107c457e8 r24 : 000000000000001b r25 : 000000000000001a
r26 : e000000129ed9054 r27 : 0000000000000080 r28 : 0000000000024000
r29 : 0000000000004000 r30 : 0000000000004000 r31 : e000000103b90000

Call Trace:
 [<a000000100013b00>] show_stack+0x40/0xa0
                                sp=e000000129edf930 bsp=e000000129ed9308
 [<a000000100014400>] show_regs+0x840/0x880
                                sp=e000000129edfb00 bsp=e000000129ed92b0
 [<a000000100037be0>] die+0x1c0/0x2c0
                                sp=e000000129edfb00 bsp=e000000129ed9268
 [<a000000100037d30>] die_if_kernel+0x50/0x80
                                sp=e000000129edfb20 bsp=e000000129ed9238
 [<a000000100037dc0>] ia64_illegal_op_fault+0x60/0x180
                                sp=e000000129edfb20 bsp=e000000129ed91e8
 [<a000000100003f20>] dispatch_illegal_op_fault+0x300/0x800
                                sp=e000000129edfc60 bsp=e000000129ed91e8
 [<a0000001001668b0>] __fput+0x370/0x420
                                sp=e000000129edfe30 bsp=e000000129ed91c0
 <0>Kernel panic - not syncing: Fatal exception


Thanks,
Akiyama


This event sent from IssueTracker by streeter 
 issue 198733

Comment 22 Masami Hiramatsu 2008-09-18 16:18:34 UTC
Hi,

(In reply to comment #19)
> The debuginfo is installed in the following directory: 
> 
>  /usr/lib/debug/lib/modules/2.6.18-92.1.13.el5debug
> 
> Is this correct?

No, I think you might have installed the kernel-debug-debuginfo package
(note the extra "-debug" in the name). Please don't rename the directory.
Instead, could you install the kernel-debuginfo package and test again?
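
To illustrate why renaming the directory is not a fix, here is a small sketch of how a debuginfo directory can be derived from a kernel release string. The helper names `debuginfo_dir()` and `same_debuginfo_dir()` are hypothetical, and the layout is an assumption based on the path quoted in comment 19: debuginfo for a kernel is taken to live under /usr/lib/debug/lib/modules/<release>. Under that assumption, the release strings "2.6.18-92.1.13.el5" and "2.6.18-92.1.13.el5debug" name two different directories, each holding debuginfo for a different kernel build.

```c
#include <stdio.h>
#include <string.h>

/* Assumed debuginfo root, taken from the path quoted in this report. */
#define DEBUGINFO_ROOT "/usr/lib/debug/lib/modules/"

/*
 * Hypothetical helper: build the debuginfo directory for a kernel release
 * string (the value `uname -r` prints).
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
static int debuginfo_dir(const char *release, char *buf, size_t len)
{
    int n = snprintf(buf, len, "%s%s", DEBUGINFO_ROOT, release);

    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Returns 1 when two release strings map to the same debuginfo directory. */
static int same_debuginfo_dir(const char *a, const char *b)
{
    char pa[256], pb[256];

    if (debuginfo_dir(a, pa, sizeof pa) != 0 ||
        debuginfo_dir(b, pb, sizeof pb) != 0)
        return 0;
    return strcmp(pa, pb) == 0;
}
```

Renaming the "el5debug" directory to "el5" only makes the path match; the symbols inside still describe the debug kernel build, not the running kernel.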

Thank you,

Comment 27 errata-xmlrpc 2009-01-20 19:39:27 UTC
An advisory has been issued which should resolve the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html

