Bug 865184
Summary: | ftrace_bug() while running systemtap testuite | ||||||||||
---|---|---|---|---|---|---|---|---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Frank Ch. Eigler <fche> | ||||||||
Component: | kernel | Assignee: | Kernel Maintainer List <kernel-maint> | ||||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||||||
Severity: | high | Docs Contact: | |||||||||
Priority: | unspecified | ||||||||||
Version: | 19 | CC: | gansalmon, itamar, jforbes, jistone, jonathan, kernel-maint, madhu.chinakonda, masami.hiramatsu.pt, srostedt | ||||||||
Target Milestone: | --- | ||||||||||
Target Release: | --- | ||||||||||
Hardware: | Unspecified | ||||||||||
OS: | Unspecified | ||||||||||
Whiteboard: | |||||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||||
Doc Text: | Story Points: | --- | |||||||||
Clone Of: | Environment: | ||||||||||
Last Closed: | 2013-04-05 19:17:39 UTC | Type: | Bug | ||||||||
Regression: | --- | Mount Type: | --- | ||||||||
Documentation: | --- | CRM: | |||||||||
Verified Versions: | Category: | --- | |||||||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||||
Embargoed: | |||||||||||
Attachments: |
|
Description
Frank Ch. Eigler
2012-10-10 23:51:10 UTC
(In reply to comment #0) > [ 242.823809] ftrace failed to modify [<ffffffffa016f000>] > init_once+0x0/0x20 [fat] > [ 242.831132] actual: cc:3b:75:58:e1 That "cc" indicates an existing int3 breakpoint. It could be that stap managed to get in there with a kprobe before ftrace could, so ftrace is whining. ;) It should be prepared for this possibility though, for example perf probe could get in there too. ftrace is very critical about code changes, and will disable itself if it does not expect something. Remember, this was part of the reason ftrace bricked e1000e cards, so I do not take this lightly. Now Masami Hiramatsu and I worked together to get kprobes and ftrace working in a way that kprobes now uses ftrace's infrastructure when the probe is added to a ftrace nop. In other words, kprobes should never be adding a breakpoint to a ftrace nop (and neither does perf). That said, the way ftrace updates its nops now is via a breakpoint. So that 'cc' you see could possibly be a bug in the ftrace accounting, and for some reason a breakpoint wasn't removed or was added incorrectly. Created attachment 625691 [details]
screenshot of later stage of warnings, now turned into hangs
It looks as though the kprobe-ftrace interfaces are acting up. Steve, do you need help getting the stap testsuite running to let you reproduce this?
Created attachment 626330 [details]
kvm serial console log of another related failure
In rawhide kernel 3.7.0-0.rc8.git0.1.fc19.x86_64, I'm still seeing kprobes problems: Dec 12 18:12:15 localhost kernel: WARNING: at kernel/trace/ftrace.c:1618 ftrace_bug+0x239/0x280() Dec 12 18:12:15 localhost kernel: Hardware name: Bochs Dec 12 18:12:15 localhost kernel: Modules linked in: fat(+) stap_56d28a51b3fe546293ca0700b10bcb29__8059(F) nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: bid_shared] Dec 12 18:12:15 localhost kernel: Pid: 8068, comm: modprobe Tainted: GF 3.7.0-0.rc8.git0.1.fc19.x86_64 #1 Dec 12 18:12:15 localhost kernel: Call Trace: Dec 12 18:12:15 localhost kernel: [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0 Dec 12 18:12:15 localhost kernel: [<ffffffff81134106>] ? __probe_kernel_read+0x46/0x70 Dec 12 18:12:15 localhost kernel: [<ffffffffa0180000>] ? 0xffffffffa017ffff Dec 12 18:12:15 localhost kernel: [<ffffffffa0180000>] ? 0xffffffffa017ffff Dec 12 18:12:15 localhost kernel: [<ffffffff8105e76a>] warn_slowpath_null+0x1a/0x20 Dec 12 18:12:15 localhost kernel: [<ffffffff810fd189>] ftrace_bug+0x239/0x280 Dec 12 18:12:15 localhost kernel: [<ffffffff810fd626>] ftrace_process_locs+0x376/0x520 Dec 12 18:12:15 localhost kernel: [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50 Dec 12 18:12:15 localhost kernel: [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70 Dec 12 18:12:15 localhost kernel: [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80 Dec 12 18:12:15 localhost kernel: [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20 Dec 12 18:12:15 localhost kernel: [<ffffffff810c2a23>] sys_init_module+0x73/0x220 Dec 12 18:12:15 localhost kernel: [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b Dec 12 18:12:15 localhost kernel: ---[ end trace 9ef46351e53bbf80 ]--- Dec 12 18:12:15 localhost kernel: ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat] Dec 12 18:12:15 localhost kernel: actual: cc:bb:d2:4b:e1 Dec 12 18:12:30 localhost kernel: hrtimer: interrupt took 2706365 ns and later many of these: Dec 12 22:47:04 localhost kernel: WARNING: at kernel/kprobes.c:965 disarm_kprobe+0xb7/0x100() Dec 12 22:47:04 localhost kernel: Hardware name: Bochs Dec 12 22:47:04 localhost kernel: Failed to disarm kprobe-ftrace at ffffffff81196100 (-19) Dec 12 22:47:04 localhost kernel: Modules linked in: stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674(F) binfmt_misc nfsv4 auth_rpcgss nfs dns_resolver fscache xt_nat iptable_nat nf_conntrack_ipv4 nf_defrag_ipv4 nf_nat_ipv4 nf_nat nf_conntrack lockd sunrpc ppdev parport_pc parport microcode virtio_net i2c_piix4 drm_kms_helper ttm drm i2c_core [last unloaded: stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6653] Dec 12 22:47:04 localhost kernel: Pid: 6674, comm: stapio Tainted: GF W O 3.7.0-0.rc8.git0.1.fc19.x86_64 #1 Dec 12 22:47:04 localhost kernel: Call Trace: Dec 12 22:47:04 localhost kernel: [<ffffffff8105e70f>] warn_slowpath_common+0x7f/0xc0 Dec 12 22:47:04 localhost kernel: [<ffffffff8105e806>] warn_slowpath_fmt+0x46/0x50 Dec 12 22:47:04 localhost kernel: [<ffffffff81196100>] ? sys_readv+0xc0/0xc0 Dec 12 22:47:04 localhost kernel: [<ffffffff8163a857>] disarm_kprobe+0xb7/0x100 Dec 12 22:47:04 localhost kernel: [<ffffffff8163a9f4>] __disable_kprobe+0x64/0x80 Dec 12 22:47:04 localhost kernel: [<ffffffff8163aa78>] __unregister_kprobe_top+0x18/0x1d0 Dec 12 22:47:04 localhost kernel: [<ffffffff816328fd>] ? mutex_lock+0x1d/0x50 Dec 12 22:47:04 localhost kernel: [<ffffffff8163ac7d>] unregister_kretprobes.part.22+0x4d/0x1b0 Dec 12 22:47:04 localhost kernel: [<ffffffff8163adf2>] unregister_kretprobes+0x12/0x20 Dec 12 22:47:04 localhost kernel: [<ffffffffa0341db3>] _stp_cleanup_and_exit+0x1f3/0x460 [stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674] Dec 12 22:47:04 localhost kernel: [<ffffffffa03423d2>] _stp_ctl_write_cmd+0x262/0xa50 [stap_4cfbbd6888a99ed7b1d9909bb425002b_4_6674] Dec 12 22:47:04 localhost kernel: [<ffffffff8128888c>] ? security_file_permission+0x2c/0xb0 Dec 12 22:47:04 localhost kernel: [<ffffffff811955fc>] vfs_write+0xac/0x180 Dec 12 22:47:04 localhost kernel: [<ffffffff81195942>] sys_write+0x52/0xa0 Dec 12 22:47:04 localhost kernel: [<ffffffff810628b0>] ? task_stopped_code+0x60/0x60 Dec 12 22:47:04 localhost kernel: [<ffffffff8163d719>] system_call_fastpath+0x16/0x1b Dec 12 18:12:15 localhost kernel: [<ffffffff810fefb7>] ftrace_module_notify+0x47/0x50 Dec 12 18:12:15 localhost kernel: [<ffffffff8163912d>] notifier_call_chain+0x4d/0x70 Dec 12 18:12:15 localhost kernel: [<ffffffff810882f8>] __blocking_notifier_call_chain+0x58/0x80 Dec 12 18:12:15 localhost kernel: [<ffffffff81088336>] blocking_notifier_call_chain+0x16/0x20 Dec 12 18:12:15 localhost kernel: [<ffffffff810c2a23>] sys_init_module+0x73/0x220 Seems to tell us there's a problem with ftrace module accounting. I got a similar bug report on LKML last week. I'll take a look at this tomorrow. Ah, this is a different bug than the one reported on LKML. The LKML one is an accounting error, this one is a modification error. Here's the important part: Dec 12 18:12:15 localhost kernel: ---[ end trace 9ef46351e53bbf80 ]--- Dec 12 18:12:15 localhost kernel: ftrace failed to modify [<ffffffffa0180000>] init_once+0x0/0x20 [fat] Dec 12 18:12:15 localhost kernel: actual: cc:bb:d2:4b:e1 For some reason, it failed to modify the init_once() function of the fat module. The code that was there had a breakpoint?? cc bb d2 4b e1 Looking closer at the call trace, this was loading the module. On module load, we convert the breakpoints into nops. As this is done before the code actually runs, it doesn't use breakpoints to modify it, it just does a direct modification. Now how did this get a breakpoint on that address before it was even registered with ftrace? Does systemtap add a kprobe on module load via a notifier? If so, we need to serialize this such that the ftrace notifier runs before other notifiers. Steven, thanks for looking into it. Yes, a few test cases put kprobes on module-init functions, thus have to be emplaced during the module-load notifier. (perf probe can do the same thing). Created attachment 663310 [details]
Set ftrace module notifier to max
Currently the ftrace module notifier is set to zero, which means that it will go behind any module notifier before it.
If another module notifier uses kprobes to set a trace at one of the locations that ftrace will use, kprobes will not know that location belongs to ftrace, and will insert a breakpoint. When the ftrace notifier is called, it will try to modify that location, and as it wont see what it expects to see. It will spit out a warning and disable itself. This will also disable kprobes from using any other location belonging to ftrace (function start).
The fix is simply to make the ftrace notifier the max priority such that it runs ahead of any other notifier that might use kprobes.
FWIW, the systemtap module_notifier runs at priority 1. Steven, has your patch been merged? Even if not, could we get it backported to rawhide? Josh reports it's upstream as commit c1bf08ac2, merged in v3.8-rc5; will test -rc7 now. This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle. Changing version to '19'. (As we did not run this process for some time, it could affect also pre-Fedora 19 development cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.) More information and reason for this action is here: https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19 |