Bug 742383

Summary:

panic on perf probe manipulation during perf record

Product:

[Fedora] Fedora

Reporter:

Frank Ch. Eigler <fche>

Component:

kernel

Assignee:

Kernel Maintainer List <kernel-maint>

Status:

CLOSED RAWHIDE

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

unspecified

Docs Contact:

Priority:

unspecified

Version:

rawhide

CC:

gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda, masami.hiramatsu.pt

Target Milestone:

---

Target Release:

---

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

Bug Fix

Last Closed:

2011-11-08 00:07:09 UTC

Attachments:

Description	Flags
dmesg	none

Description Frank Ch. Eigler 2011-09-29 21:49:29 UTC

Created attachment 525651
dmesg

kernel 3.1.0-0.rc6.git0.0.fc17.x86_64, running on 4cpu KVM

Testing kprobe / module loading stability, I ran into a reproducible
insta-crash using perf probe:

# cd /lib/modules/`uname -r`/kernel/fs
# find . -name '*.ko' | while read nm; do
    readelf -s $nm | grep 'FUNC.* 6 ' | grep -v _module | awk '{print "'$nm' " $8}'
  done | while read mod fn; do
    perf probe -m $mod --add $fn
  done

This creates perf-probes (kprobes) on (mostly) offline modules' init functions.
Now we start recording:

# perf record -e probe:\* -aR sh
sh-4.2# perf probe --del probe:\*

Inside the subshell, we drop all the kprobes.  Now we exit:

sh-4.2# exit

Bang.  Panic text attached.
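
For readers following the reproducer, the symbol-selection step (the `grep 'FUNC.* 6 '` stage) can be sketched as a standalone filter that matches on columns instead of a loose regex. This is only an illustration: `list_section_funcs` is a hypothetical helper name, and it assumes the usual `readelf -s` column layout (Num, Value, Size, Type, Bind, Vis, Ndx, Name). Whether section index 6 holds the init code depends on the module, as the report's "(mostly)" suggests.

```shell
# Sketch only: list_section_funcs is a hypothetical helper, not part of
# perf or binutils.
# Reads `readelf -s` output on stdin and prints the names of FUNC symbols
# whose section index (Ndx column) equals $1, skipping any name that
# contains "_module" (mirroring the `grep -v _module` in the reproducer).
list_section_funcs() {
    awk -v ndx="$1" '$4 == "FUNC" && $7 == ndx && $8 !~ /_module/ { print $8 }'
}
```

It would be used in place of the two greps and the awk call, for example `readelf -s "$nm" | list_section_funcs 6`, with the module name added back by the caller.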

Comment 1 Masami Hiramatsu 2011-09-30 07:59:14 UTC

I've confirmed that this also occurs on the upstream kernel. The log I captured on VirtualBox shows that the BUG is a kernel page fault.

[  635.796100] BUG: unable to handle kernel paging request at ffff88984fd000ac
[  635.797084] IP: [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[  635.797084] PGD 0 
[  635.797084] Oops: 0000 [#1] SMP 
[  635.797084] CPU 1 
[  635.797084] Modules linked in: virtio_net pcspkr i2c_piix4 i2c_core [last unloaded: scsi_wait_scan]
[  635.797084] 
[  635.797084] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc4-tip+ #20 innotek GmbH VirtualBox
[  635.797084] RIP: 0010:[<ffffffff810b9f76>]  [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d
[  635.797084] RSP: 0018:ffff88004fd03cc8  EFLAGS: 00010086
[  635.797084] RAX: ffff88984fd000ac RBX: ffff88004c13b000 RCX: 0000000000000000

Here is the disassembled code:
----
        list = this_cpu_ptr(pcpu_list);
ffffffff810b9f6d:       65 48 03 04 25 f0 c4    add    %gs:0xc4f0,%rax
ffffffff810b9f74:       00 00
 * list-traversal primitive must be guarded by rcu_read_lock().
 */
static inline void hlist_add_head_rcu(struct hlist_node *n,
                                        struct hlist_head *h)
{
        struct hlist_node *first = h->first;
ffffffff810b9f76:       48 8b 10                mov    (%rax),%rdx
----

So it seems that perf tried to add an event that had already been removed, and the access faulted.
I need to dig into this further:
- Does it also occur on non-.init functions?
- Does it also occur on functions in online (loaded) modules?
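
The link between the register dump and the disassembly can be checked mechanically. The faulting instruction is `mov (%rax),%rdx`, so the address in RAX should equal the address in the "unable to handle kernel paging request" line. The following is a minimal sketch under that assumption; `check_fault_reg` is a hypothetical helper name, and it expects oops text in the format quoted in this comment.

```shell
# Sketch only: check_fault_reg is a hypothetical helper.
# Reads an oops log on stdin, picks out the faulting address and the RAX
# value, and reports whether the two are the same string.
check_fault_reg() {
    awk '
        /unable to handle kernel paging request at/ { fault = $NF }
        /RAX:/ {
            for (i = 1; i < NF; i++)
                if ($i == "RAX:") rax = $(i + 1)
        }
        END {
            # Appending "" forces a string comparison of the hex values.
            if (fault != "" && (fault "") == (rax ""))
                print "RAX matches fault address " fault
            else
                print "no match"
        }'
}
```

Feeding it the saved log, for example `check_fault_reg < dmesg.txt` (the file name is an assumption), confirms that the load through RAX is the access that faulted.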

Comment 2 Masami Hiramatsu 2011-10-03 02:30:21 UTC

OK, I think this should be fixed on the kprobe-tracer side, so that it does not free a tracepoint while it is in use. The suggested reproduction method is not a normal operation, and the removal should be rejected with -EBUSY.

Thank you for reporting :)

Comment 3 Masami Hiramatsu 2011-10-04 10:50:40 UTC

I've sent a bugfix patch (along with 3 trivial typo fixes) here:
http://thread.gmane.org/gmane.linux.kernel/1199036

It fixes this bug by preventing the removal of kprobe-events that are in use.

Thank you,

Comment 4 Josh Boyer 2011-11-08 00:07:09 UTC

The fixes went upstream as
02ca1521ad404cf566e0075848f80d064c0a0503

and

44a56040a0037a845d5fa218dffde464579f0cab

Both were CC'd to stable, so they should make their way back into f15/f16 shortly.