Bug 742383
| Field | Value |
|---|---|
| Summary | panic on perf probe manipulation during perf record |
| Product | [Fedora] Fedora |
| Component | kernel |
| Version | rawhide |
| Status | CLOSED RAWHIDE |
| Severity | unspecified |
| Priority | unspecified |
| Hardware | Unspecified |
| OS | Unspecified |
| Reporter | Frank Ch. Eigler <fche> |
| Assignee | Kernel Maintainer List <kernel-maint> |
| QA Contact | Fedora Extras Quality Assurance <extras-qa> |
| CC | gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, masami.hiramatsu.pt |
| Doc Type | Bug Fix |
| Last Closed | 2011-11-08 00:07:09 UTC |

I've confirmed that this also occurs on the upstream kernel. When I captured the log on VirtualBox, it showed that the BUG was a kernel page fault.
[ 635.796100] BUG: unable to handle kernel paging request at ffff88984fd000ac
[ 635.797084] IP: [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[ 635.797084] PGD 0
[ 635.797084] Oops: 0000 [#1] SMP
[ 635.797084] CPU 1
[ 635.797084] Modules linked in: virtio_net pcspkr i2c_piix4 i2c_core [last unloaded: scsi_wait_scan]
[ 635.797084]
[ 635.797084] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc4-tip+ #20 innotek GmbH VirtualBox
[ 635.797084] RIP: 0010:[<ffffffff810b9f76>] [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[ 635.797084] RSP: 0018:ffff88004fd03cc8 EFLAGS: 00010086
[ 635.797084] RAX: ffff88984fd000ac RBX: ffff88004c13b000 RCX: 0000000000000000
Here is the disassembled code:
----
list = this_cpu_ptr(pcpu_list);
ffffffff810b9f6d: 65 48 03 04 25 f0 c4 add %gs:0xc4f0,%rax
ffffffff810b9f74: 00 00
* list-traversal primitive must be guarded by rcu_read_lock().
*/
static inline void hlist_add_head_rcu(struct hlist_node *n,
struct hlist_head *h)
{
struct hlist_node *first = h->first;
ffffffff810b9f76: 48 8b 10 mov (%rax),%rdx
----
So, it seems that perf was trying to add an event that had already been removed, and failed.
Hmm, I need to dig into this:
- Does this also occur on non-.init functions?
- Does this also occur on functions of online (loaded) modules?
OK, I think this should be fixed on the kprobe-tracer side so that it does not free a tracepoint while it is in use, because the suggested reproduction method is not a normal operation and should be rejected with -EBUSY. Thank you for reporting :)

I've sent a bugfix patch (and 3 trivial typo fixes) here.
http://thread.gmane.org/gmane.linux.kernel/1199036
It prevents removing running kprobe-events to fix this bug.
Thank you,

The fixes went upstream as 02ca1521ad404cf566e0075848f80d064c0a0503 and 44a56040a0037a845d5fa218dffde464579f0cab. Both were CC'd to stable, so they should make their way back into f15/f16 shortly.

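For illustration only, the idea described above (refuse to remove a kprobe event that is still in use, and return -EBUSY instead) can be modelled in user space roughly as follows. This is a minimal sketch, not the code from the upstream commits: `struct probe_event` and the `probe_event_*` helpers are hypothetical names, and the simple user counter is an assumption standing in for whatever state the kernel actually checks.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for a kprobe-based trace event. */
struct probe_event {
    const char *name;
    unsigned int users;  /* active consumers, e.g. running perf sessions */
    bool registered;     /* false once the event has been removed */
};

/* A consumer (e.g. "perf record") starts using the event. */
static void probe_event_enable(struct probe_event *ev)
{
    ev->users++;
}

/* A consumer stops using the event. */
static void probe_event_disable(struct probe_event *ev)
{
    if (ev->users > 0)
        ev->users--;
}

/*
 * Remove the event only if nobody is using it.
 * Returns 0 on success, -ENOENT if the event was already removed,
 * and -EBUSY if it still has users, so the caller cannot pull the
 * event out from under a running session.
 */
static int probe_event_remove(struct probe_event *ev)
{
    if (!ev->registered)
        return -ENOENT;
    if (ev->users > 0)
        return -EBUSY;
    ev->registered = false;
    return 0;
}
```

In this model, the `perf probe --del` step from the reproducer would get -EBUSY while `perf record` is still running, and would succeed only after the recording session has exited and released the event.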
Created attachment 525651
dmesg

kernel 3.1.0-0.rc6.git0.0.fc17.x86_64, running on 4cpu KVM

Testing kprobe / module loading stability, I ran into a reproducible insta-crash using perf probe:

# cd /lib/modules/`uname -r`/kernel/fs
# find . -name '*.ko' | while read nm
  do
    readelf -s $nm | grep 'FUNC.* 6 ' | grep -v _module | awk '{print "'$nm' " $8}'
  done | while read mod fn; do
    perf probe -m $mod --add $fn
  done

This creates perf-probes (kprobes) on (mostly) offline modules' init functions. Now we start recording:

# perf record -e probe:\* -aR sh
sh-4.2# perf probe --del probe:\*

Inside the subshell, we drop all the kprobes. Now we exit:

sh-4.2# exit

Bang. Panic text attached.