Created attachment 525651 [details] dmesg kernel 3.1.0-0.rc6.git0.0.fc17.x86_64, running on 4cpu KVM Testing kprobe / module loading stability, I ran into a reproducible insta-crash using perf probe: # cd /lib/modules/`uname -r`/kernel/fs # find . -name '*.ko' | while read nm do readelf -s $nm | grep 'FUNC.* 6 ' | grep -v _module | awk '{print "'$nm' " $8}' done | while read mod fn; do perf probe -m $mod --add $fn done This creates perf-probes (kprobes) on (mostly) offline modules' init functions. Now we start recording: # perf record -e probe:\* -aR sh sh-4.2# perf probe --del probe:\* Inside the subshell, we drop all the kprobes. Now we exit: sh-4.2# exit Bang. Panic text attached.
I've ensured that this occurs also on the upstream kernel. And when I got the log on VirtualBox, it said that the BUG was a kernel pagefault. [ 635.796100] BUG: unable to handle kernel paging request at ffff88984fd000ac [ 635.797084] IP: [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d [ 635.797084] PGD 0 [ 635.797084] Oops: 0000 [#1] SMP [ 635.797084] CPU 1 [ 635.797084] Modules linked in: virtio_net pcspkr i2c_piix4 i2c_core [last unloaded: scsi_wait_scan] [ 635.797084] [ 635.797084] Pid: 0, comm: kworker/0:0 Not tainted 3.1.0-rc4-tip+ #20 innotek GmbH VirtualBox [ 635.797084] RIP: 0010:[<ffffffff810b9f76>] [<ffffffff810b9f76>] perf_trace_add+0x5a/0x7d [ 635.797084] RSP: 0018:ffff88004fd03cc8 EFLAGS: 00010086 [ 635.797084] RAX: ffff88984fd000ac RBX: ffff88004c13b000 RCX: 0000000000000000 Here is the disassembled code: ---- list = this_cpu_ptr(pcpu_list); ffffffff810b9f6d: 65 48 03 04 25 f0 c4 add %gs:0xc4f0,%rax ffffffff810b9f74: 00 00 * list-traversal primitive must be guarded by rcu_read_lock(). */ static inline void hlist_add_head_rcu(struct hlist_node *n, struct hlist_head *h) { struct hlist_node *first = h->first; ffffffff810b9f76: 48 8b 10 mov (%rax),%rdx ---- So, it seems that perf was trying to add removed event and failed. Hmm, I need to dig this: - Does that also occur on not .init functions? - Does that also occur on online module functions?
OK, I think this should be fixed in kprobe-tracer side not to free a tracepoint if it is in use, because the suggested reproducing method is not a normal operation and should be rejected with -EBUSY. Thank you for reporting :)
I've sent a bugfix patch (and 3 trivial typo fixes) here. http://thread.gmane.org/gmane.linux.kernel/1199036 It prevents removing running kprobe-events to fix this bug. Thank you,
The fixes went upstream as 02ca1521ad404cf566e0075848f80d064c0a0503 and 44a56040a0037a845d5fa218dffde464579f0cab Both were CC'd to stable, so they should make their way back into f15/f16 shortly.