Bug 865184

Summary: ftrace_bug() while running systemtap testuite
Product: [Fedora] Fedora Reporter: Frank Ch. Eigler <fche>
Component: kernelAssignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 19CC: gansalmon, itamar, jforbes, jistone, jonathan, kernel-maint, madhu.chinakonda, masami.hiramatsu.pt, srostedt
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-04-05 19:17:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
screenshot of later stage of warnings, now turned into hangs
none
kvm serial console log of another related failure
none
Set ftrace module notifier to max none

Description Frank Ch. Eigler 2012-10-10 23:51:10 UTC
While running the git systemtap testsuite, a module-traffic-intensive section exhibits this warning, every time it's run:

[  222.445303] bid_nocheck: systemtap: 2.1/0.155, base: ffffffffa01c4000, memory: 83data/44text/1ctx/2058net/17alloc kb, probes: 1
[  231.741389] bid_shared: systemtap: 2.1/0.155, base: ffffffffa01e5000, memory: 83data/44text/1ctx/2058net/17alloc kb, probes: 1
[  232.527221] bid_shared: systemtap: 2.1/0.155, base: ffffffffa01c4000, memory: 83data/44text/1ctx/2058net/17alloc kb, probes: 1
[  233.200955] bid_shared: systemtap: 2.1/0.155, base: ffffffffa01e5000, memory: 83data/44text/1ctx/2058net/17alloc kb, probes: 1
[  242.594950] stap_28f22fae6e506c0e9b9846596b55790f__2242: systemtap: 2.1/0.155, base: ffffffffa0206000, memory: 134data/36text/1ctx/2058net/34alloc kb, probes: 131
[  242.755670] ------------[ cut here ]------------
[  242.766848] WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
[  242.771651] Hardware name: Bochs
[  242.778799] Modules linked in: fat(+) stap_28f22fae6e506c0e9b9846596b55790f__2242(F) nfsv4 auth_rpcgss nfs dns_resolver fscache lockd sunrpc ppdev microcode parport_pc virtio_net i2c_piix4 parport drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
[  242.789787] Pid: 2251, comm: modprobe Tainted: GF          O 3.7.0-0.rc0.git3.1.fc19.x86_64 #1
[  242.792135] Call Trace:
[  242.794071]  [<ffffffff8106850f>] warn_slowpath_common+0x7f/0xc0
[  242.800320]  [<ffffffff81165036>] ? __probe_kernel_read+0x46/0x70
[  242.802662]  [<ffffffffa016f000>] ? 0xffffffffa016efff
[  242.804768]  [<ffffffffa016f000>] ? 0xffffffffa016efff
[  242.807988]  [<ffffffff8106856a>] warn_slowpath_null+0x1a/0x20
[  242.809460]  [<ffffffff81127c29>] ftrace_bug+0x239/0x280
[  242.810953]  [<ffffffff811280c4>] ftrace_process_locs+0x374/0x550
[  242.812381]  [<ffffffff81129d67>] ftrace_module_notify+0x47/0x50
[  242.813726]  [<ffffffff816f1bd6>] notifier_call_chain+0x66/0x150
[  242.815095]  [<ffffffff8109b9d2>] __blocking_notifier_call_chain+0x72/0xd0
[  242.816414]  [<ffffffff8109ba46>] blocking_notifier_call_chain+0x16/0x20
[  242.817683]  [<ffffffff810e50c0>] sys_init_module+0xe0/0x2340
[  242.818928]  [<ffffffff8136c0c0>] ? ddebug_proc_open+0xd0/0xd0
[  242.820175]  [<ffffffff81099e13>] ? up_write+0x23/0x40
[  242.821460]  [<ffffffff816f6969>] system_call_fastpath+0x16/0x1b
[  242.822652] ---[ end trace 222cb7399495347e ]---
[  242.823809] ftrace failed to modify [<ffffffffa016f000>] init_once+0x0/0x20 [fat]
[  242.831132]  actual: cc:3b:75:58:e1

The system stays generally up.

Comment 1 Josh Stone 2012-10-11 01:16:52 UTC
(In reply to comment #0)
> [  242.823809] ftrace failed to modify [<ffffffffa016f000>]
> init_once+0x0/0x20 [fat]
> [  242.831132]  actual: cc:3b:75:58:e1

That "cc" indicates an existing int3 breakpoint.  It could be that stap managed to get in there with a kprobe before ftrace could, so ftrace is whining. ;)  It should be prepared for this possibility though, for example perf probe could get in there too.

Comment 2 Steven Rostedt 2012-10-11 02:15:04 UTC
ftrace is very critical about code changes, and will disable itself if it does not expect something. Remember, this was part of the reason ftrace bricked e1000e cards, so I do not take this lightly.

Now Masami Hiramatsu and I worked together to get kprobes and ftrace working in a way that kprobes now uses ftrace's infrastructure when the probe is added to a ftrace nop. In other words, kprobes should never be adding a breakpoint to a ftrace nop (and neither does perf).

That said, the way ftrace updates its nops now is via a breakpoint. So that 'cc' you see could possibly be a bug in the ftrace accounting, and for some reason a breakpoint wasn't removed or was added incorrectly.

Comment 3 Frank Ch. Eigler 2012-10-11 20:48:39 UTC
Created attachment 625691 [details]
screenshot of later stage of warnings, now turned into hangs

It looks as though the kprobe-ftrace interfaces are acting up.  Steve, do you need help getting the stap testsuite running to let you reproduce this?

Comment 4 Frank Ch. Eigler 2012-10-12 22:30:41 UTC
Created attachment 626330 [details]
kvm serial console log of another related failure

Comment 5 Frank Ch. Eigler 2012-12-13 03:50:40 UTC
In rawhide kernel 3.7.0-0.rc8.git0.1.fc19.x86_64, I'm still seeing kprobes problems: 

Dec 12 18:12:15 localhost kernel: WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280()
Dec 12 18:12:15 localhost kernel: Hardware name: Bochs
Dec 12 18:12:15 localhost kernel: Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared]
Dec 12 18:12:15 localhost kernel: Pid: 8068, comm: modprobe Tainted: GF            3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Dec 12 18:12:15 localhost kernel: Call Trace:
Dec 12 18:12:15 localhost kernel:  [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
Dec 12 18:12:15 localhost kernel:  [<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70
Dec 12 18:12:15 localhost kernel:  [<ffffffffa0180000>] ? 0xffffffffa017ffff
Dec 12 18:12:15 localhost kernel:  [<ffffffffa0180000>] ? 0xffffffffa017ffff
Dec 12 18:12:15 localhost kernel:  [<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20
Dec 12 18:12:15 localhost kernel:  [<ffffffff810fd189>] ftrace_bug+0x239/0x280
Dec 12 18:12:15 localhost kernel:  [<ffffffff810fd626>] ftrace_process_locs+0x376/0x520
Dec 12 18:12:15 localhost kernel:  [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
Dec 12 18:12:15 localhost kernel:  [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
Dec 12 18:12:15 localhost kernel:  [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
Dec 12 18:12:15 localhost kernel:  [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
Dec 12 18:12:15 localhost kernel:  [<ffffffff810c2a23>] sys_init_module+0x73/0x220
Dec 12 18:12:15 localhost kernel:  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b
Dec 12 18:12:15 localhost kernel: ---[ end trace 9ef46351e53bbf80 ]---
Dec 12 18:12:15 localhost kernel: ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
Dec 12 18:12:15 localhost kernel:  actual: cc:bb:d2:4b:e1
Dec 12 18:12:30 localhost kernel: hrtimer: interrupt took 2706365 ns

and later many of these:

Dec 12 22:47:04 localhost kernel: WARNING: at kernel/kprobes.c:965 disarm_kprobe+0xb7/0x100()
Dec 12 22:47:04 localhost kernel: Hardware name: Bochs
Dec 12 22:47:04 localhost kernel: Failed to disarm kprobe-ftrace at ffffffff81196100 (-19)
Dec 12 22:47:04 localhost kernel: Modules linked in: stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674(F) binfmt_misc nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6653]
Dec 12 22:47:04 localhost kernel: Pid: 6674, comm: stapio Tainted: GF       W  O 3.7.0-0.rc8.git0.1.fc19.x86_64 #1
Dec 12 22:47:04 localhost kernel: Call Trace:
Dec 12 22:47:04 localhost kernel:  [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0
Dec 12 22:47:04 localhost kernel:  [<ffffffff8105e806>] warn_slowpath_fmt+0x46/0x50
Dec 12 22:47:04 localhost kernel:  [<ffffffff81196100>] ? sys_readv+0xc0/0xc0
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163a857>] disarm_kprobe+0xb7/0x100
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163a9f4>] __disable_kprobe+0x64/0x80
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163aa78>] __unregister_kprobe_top+0x18/0x1d0
Dec 12 22:47:04 localhost kernel:  [<ffffffff816328fd>] ? mutex_lock+0x1d/0x50
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163ac7d>] unregister_kretprobes.part.22+0x4d/0x1b0
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163adf2>] unregister_kretprobes+0x12/0x20
Dec 12 22:47:04 localhost kernel:  [<ffffffffa0341db3>] _stp_cleanup_and_exit+0x1f3/0x460 [stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674]
Dec 12 22:47:04 localhost kernel:  [<ffffffffa03423d2>] _stp_ctl_write_cmd+0x262/0xa50 [stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674]
Dec 12 22:47:04 localhost kernel:  [<ffffffff8128888c>] ? security_file_permission+0x2c/0xb0
Dec 12 22:47:04 localhost kernel:  [<ffffffff811955fc>] vfs_write+0xac/0x180
Dec 12 22:47:04 localhost kernel:  [<ffffffff81195942>] sys_write+0x52/0xa0
Dec 12 22:47:04 localhost kernel:  [<ffffffff810628b0>] ? task_stopped_code+0x60/0x60
Dec 12 22:47:04 localhost kernel:  [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b

Comment 6 Steven Rostedt 2012-12-13 04:12:12 UTC
Dec 12 18:12:15 localhost kernel:  [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50
Dec 12 18:12:15 localhost kernel:  [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70
Dec 12 18:12:15 localhost kernel:  [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80
Dec 12 18:12:15 localhost kernel:  [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20
Dec 12 18:12:15 localhost kernel:  [<ffffffff810c2a23>] sys_init_module+0x73/0x220

Seems to tell us there's a problem with ftrace module accounting. I got a similar bug report on LKML last week. I'll take a look at this tomorrow.

Comment 7 Steven Rostedt 2012-12-14 02:26:59 UTC
Ah, this is a different bug than the one reported on LKML. The LKML one is an accounting error, this one is a modification error. Here's the important part:

Dec 12 18:12:15 localhost kernel: ---[ end trace 9ef46351e53bbf80 ]---
Dec 12 18:12:15 localhost kernel: ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat]
Dec 12 18:12:15 localhost kernel:  actual: cc:bb:d2:4b:e1


For some reason, it failed to modify the init_once() function of the fat module. The code that was there had a breakpoint?? cc bb d2 4b e1

Looking closer at the call trace, this was loading the module. On module load, we convert the breakpoints into nops. As this is done before the code actually runs, it doesn't use breakpoints to modify it, it just does a direct modification. Now how did this get a breakpoint on that address before it was even registered with ftrace?

Does systemtap add a kprobe on module load via a notifier? If so, we need to serialize this such that the ftrace notifier runs before other notifiers.

Comment 8 Frank Ch. Eigler 2012-12-14 02:30:47 UTC
Steven, thanks for looking into it.  Yes, a few test cases put kprobes on module-init functions, thus have to be emplaced during the module-load notifier.  (perf probe can do the same thing).

Comment 9 Steven Rostedt 2012-12-14 02:40:37 UTC
Created attachment 663310 [details]
Set ftrace module notifier to max

Currently the ftrace module notifier is set to zero, which means that it will go behind any module notifier before it.

If another module notifier uses kprobes to set a trace at one of the locations that ftrace will use, kprobes will not know that location belongs to ftrace, and will insert a breakpoint. When the ftrace notifier is called, it will try to modify that location, and as it wont see what it expects to see. It will spit out a warning and disable itself. This will also disable kprobes from using any other location belonging to ftrace (function start).

The fix is simply to make the ftrace notifier the max priority such that it runs ahead of any other notifier that might use kprobes.

Comment 10 Frank Ch. Eigler 2012-12-14 03:54:10 UTC
FWIW, the systemtap module_notifier runs at priority 1.

Comment 11 Frank Ch. Eigler 2013-02-13 03:31:35 UTC
Steven, has your patch been merged?  Even if not, could we get it backported to rawhide?

Comment 12 Frank Ch. Eigler 2013-02-13 03:34:48 UTC
Josh reports it's upstream as commit c1bf08ac2, merged in v3.8-rc5; will test -rc7 now.

Comment 13 Fedora End Of Life 2013-04-03 15:41:14 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could affect also pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19