Bug 742383
| Field | Value |
|---|---|
| Summary | panic on perf probe manipulation during perf record |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Status | CLOSED RAWHIDE |
| Last Closed | 2011-11-08 00:07:09 UTC |
| Reporter | Frank Ch. Eigler <fche> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, masami.hiramatsu.pt |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Doc Type | Bug Fix |
I've verified that this also occurs on the upstream kernel. When I captured the log on VirtualBox, it showed that the BUG was a kernel page fault:

```
[ 635.796100] BUG: unable to handle kernel paging request at ffff88984fd000ac
[ 635.797084] IP: [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[ 635.797084] PGD 0
[ 635.797084] Oops: 0000 [#1] SMP
[ 635.797084] CPU 1
[ 635.797084] Modules linked in: virtio_net pcspkr i2c_piix4 i2c_core [last unloaded: scsi_wait_scan]
[ 635.797084]
[ 635.797084] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc4-tip+ #20 innotek GmbH VirtualBox
[ 635.797084] RIP: 0010:[<ffffffff810b9f76>] [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[ 635.797084] RSP: 0018:ffff88004fd03cc8 EFLAGS: 00010086
[ 635.797084] RAX: ffff88984fd000ac RBX: ffff88004c13b000 RCX: 0000000000000000
```

Here is the disassembled code:

```
        list = this_cpu_ptr(pcpu_list);
ffffffff810b9f6d:   65 48 03 04 25 f0 c4    add    %gs:0xc4f0,%rax
ffffffff810b9f74:   00 00
 * list-traversal primitive must be guarded by rcu_read_lock().
 */
static inline void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h)
{
        struct hlist_node *first = h->first;
ffffffff810b9f76:   48 8b 10                mov    (%rax),%rdx
```

So it seems that perf was trying to add an event that had already been removed, and failed. I need to dig into this further:

- Does this also occur on non-.init functions?
- Does this also occur on functions in online (loaded) modules?

OK, I think this should be fixed on the kprobe-tracer side, so that it does not free a tracepoint while it is in use. The suggested reproduction method is not a normal operation and should be rejected with -EBUSY.

Thank you for reporting this :) I've sent a bugfix patch (plus 3 trivial typo fixes) here:
http://thread.gmane.org/gmane.linux.kernel/1199036
It prevents the removal of running kprobe events, which fixes this bug.

Thank you,

The fixes went upstream as 02ca1521ad404cf566e0075848f80d064c0a0503 and 44a56040a0037a845d5fa218dffde464579f0cab. Both were CC'd to stable, so they should make their way back into F15/F16 shortly.
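The disassembly places the fault on the first line of `hlist_add_head_rcu()`. The faulting instruction `mov (%rax),%rdx` is the load of `h->first`, and `%rax` holds the list head computed by `this_cpu_ptr(pcpu_list)`. The oops shows RAX equal to the faulting address `ffff88984fd000ac`.

The C below is a minimal userspace sketch, not kernel code. It models the hlist types and a non-RCU add-to-head operation; the `model_` names are hypothetical. It assumes the kernel layout in which `first` is the only member of `struct hlist_head`. Under that layout, a bad head pointer is dereferenced at offset 0 before anything is written.

```c
#include <assert.h>
#include <stddef.h>

/* Userspace model of the kernel's hlist types; RCU publication
 * (rcu_assign_pointer) is deliberately omitted in this sketch. */
struct hlist_node {
	struct hlist_node *next;
	struct hlist_node **pprev;
};

struct hlist_head {
	struct hlist_node *first; /* offset 0: what "mov (%rax),%rdx" loads */
};

/* Hypothetical non-RCU analogue of hlist_add_head_rcu(): the very first
 * access reads h->first, so a stale or bogus h faults right here. */
static void model_hlist_add_head(struct hlist_node *n, struct hlist_head *h)
{
	struct hlist_node *first = h->first;

	n->next = first;
	if (first)
		first->pprev = &n->next;
	h->first = n;
	n->pprev = &h->first;
}

/* Hypothetical helper: count the nodes reachable from the head. */
static size_t model_hlist_count(const struct hlist_head *h)
{
	size_t count = 0;

	for (const struct hlist_node *p = h->first; p; p = p->next)
		count++;
	return count;
}
```

For example, adding nodes `a` then `b` to an empty head leaves `h.first == &b` and `b.next == &a`. In the kernel, the same first load through a head pointer that no longer refers to valid per-CPU storage would fault. That is consistent with the conclusion above that perf was adding an event that had already been removed.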
Created attachment 525651: dmesg

Kernel 3.1.0-0.rc6.git0.0.fc17.x86_64, running on a 4-CPU KVM guest.

While testing kprobe / module-loading stability, I ran into a reproducible instant crash using perf probe:

```
# cd /lib/modules/`uname -r`/kernel/fs
# find . -name '*.ko' | while read nm
do
    readelf -s $nm | grep 'FUNC.* 6 ' | grep -v _module | awk '{print "'$nm' " $8}'
done | while read mod fn; do
    perf probe -m $mod --add $fn
done
```

This creates perf probes (kprobes) on the init functions of modules that are (mostly) not loaded. Now we start recording:

```
# perf record -e probe:\* -aR sh
sh-4.2# perf probe --del probe:\*
```

Inside the subshell, we drop all the kprobes. Now we exit:

```
sh-4.2# exit
```

Bang. Panic text attached.
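The -EBUSY rule proposed in the comments can be illustrated by replaying the reproduction steps against a toy registry. This is a hypothetical userspace model, not the upstream patch, and each function stands in for one step of the reproduction:

- `model_probe_add` corresponds to `perf probe --add`.
- `model_probe_enable` corresponds to `perf record` starting.
- `model_probe_disable` corresponds to `perf record` exiting.
- `model_probe_del` corresponds to `perf probe --del`.

A plain user count is assumed in place of whatever in-use check the kernel actually performs.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical userspace model of a probe-event table; NOT the kernel's
 * kprobe-tracer code. "users" stands in for the kernel's in-use check. */
#define MODEL_MAX_PROBES 8

struct model_probe {
	char name[32];
	bool registered;
	int users; /* active perf sessions using this event */
};

static struct model_probe model_probes[MODEL_MAX_PROBES];

/* Look up a registered probe by name; NULL if absent. */
static struct model_probe *model_find(const char *name)
{
	for (size_t i = 0; i < MODEL_MAX_PROBES; i++)
		if (model_probes[i].registered &&
		    strcmp(model_probes[i].name, name) == 0)
			return &model_probes[i];
	return NULL;
}

/* Analogue of "perf probe --add": register a new, unused event. */
static int model_probe_add(const char *name)
{
	if (model_find(name))
		return -EEXIST;
	for (size_t i = 0; i < MODEL_MAX_PROBES; i++) {
		struct model_probe *p = &model_probes[i];
		if (!p->registered) {
			snprintf(p->name, sizeof(p->name), "%s", name);
			p->registered = true;
			p->users = 0;
			return 0;
		}
	}
	return -ENOSPC;
}

/* Analogue of "perf record" starting to use the event. */
static int model_probe_enable(const char *name)
{
	struct model_probe *p = model_find(name);
	if (!p)
		return -ENOENT;
	p->users++;
	return 0;
}

/* Analogue of "perf record" exiting and releasing the event. */
static int model_probe_disable(const char *name)
{
	struct model_probe *p = model_find(name);
	if (!p)
		return -ENOENT;
	if (p->users == 0)
		return -EINVAL;
	p->users--;
	return 0;
}

/* Analogue of "perf probe --del": refuse to free an event still in use. */
static int model_probe_del(const char *name)
{
	struct model_probe *p = model_find(name);
	if (!p)
		return -ENOENT;
	if (p->users > 0)
		return -EBUSY; /* the reproduction's --del would stop here */
	p->registered = false;
	return 0;
}
```

In this model, deleting the probe from inside the recording subshell returns -EBUSY instead of freeing the event. The same deletion succeeds once the recording session has released it.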