git systemtap, running full installcheck against kernel 3.12.0-0.rc4.git3.1.fc21.i686+PAE. For several of the test cases, we see the following kernel warning: [10654.780511] stap_33886239fd38f992928b591d8662874_21139: systemtap: 2.4/0.157, base: f8173000, memory: 37data/40text/0ctx/2058net/17alloc kb, probes: 1, unpriv-uid: 0 [10654.802256] ------------[ cut here ]------------ [10654.805976] WARNING: CPU: 1 PID: 21139 at kernel/sched/core.c:1582 wake_up_process+0x31/0x40() [10654.805976] Modules linked in: stap_33886239fd38f992928b591d8662874_21139(OF) binfmt_misc rpcsec_gss_krb5 auth_rpcgss xt_nat nfsv4 iptable_nat nf_conntrack_ipv4 dns_resolver nf_defrag_ipv4 nfs lockd nf_nat_ipv4 sunrpc nf_nat nf_conntrack cachefiles fscache ppdev microcode serio_raw parport_pc i2c_piix4 parport i2c_core ata_generic e1000 pata_acpi [last unloaded: stap_af74183a2b21519858cea49a8479ac2_20857] [10654.805976] CPU: 1 PID: 21139 Comm: stapio Tainted: GF W O 3.12.0-0.rc4.git3.1.fc21.i686+PAE #1 [10654.805976] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [10654.805976] 00000000 00000000 df387e44 c0a770c5 00000000 df387e74 c045f90e c0c2dde0 [10654.805976] 00000001 00005293 c0c3bcc3 0000062e c0497cd1 c0497cd1 cc43ec00 c3c92130 [10654.805976] 0000001f df387e84 c045f9d2 00000009 00000000 df387e90 c0497cd1 cc43ec00 [10654.805976] Call Trace: [10654.805976] [<c0a770c5>] dump_stack+0x4b/0x66 [10654.805976] [<c045f90e>] warn_slowpath_common+0x7e/0xa0 [10654.805976] [<c0497cd1>] ? wake_up_process+0x31/0x40 [10654.805976] [<c0497cd1>] ? wake_up_process+0x31/0x40 [10654.805976] [<c045f9d2>] warn_slowpath_null+0x22/0x30 [10654.805976] [<c0497cd1>] wake_up_process+0x31/0x40 [10654.805976] [<f81763a5>] utrace_reset+0x2b5/0x2c0 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<f8176938>] utrace_control+0xa8/0x390 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<c0a7ef52>] ? _raw_spin_unlock+0x22/0x30 [10654.805976] [<f81756dc>] ? utrace_attach_task+0x11c/0x280 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<f817c094>] _stp_ctl_write_cmd+0x1164/0x1480 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<f817bf3e>] ? _stp_ctl_write_cmd+0x100e/0x1480 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<f817af30>] ? __stp_tf_clone_worker+0xb0/0xb0 [stap_33886239fd38f992928b591d8662874_21139] [10654.805976] [<c05a0d5d>] vfs_write+0x9d/0x1b0 [10654.805976] [<c05bcb1f>] ? fget_light+0x2f/0x480 [10654.805976] [<c05a1409>] SyS_write+0x49/0x90 [10654.805976] [<c0a7fc17>] syscall_call+0x7/0xb [10654.805976] ---[ end trace 25903092d09c8ec4 ]---
I'm not seeing any kernel warnings when running full installcheck against kernel 3.12.0-0.rc5.git0.1.fc21.i686+PAE. Can you upgrade and try again?
[ 3105.994435] stap_4da7d47dfc8cfe53e7a2c8685a9358a_32190: systemtap: 2.4/0.157, base: f80d0000, memory: 17data/24text/1ctx/2058net/593alloc kb, probes: 5 [ 3118.837901] stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498: systemtap: 2.4/0.157, base: f821d000, memory: 29data/40text/0ctx/2058net/17alloc kb, probes: 4 [ 3119.166241] BUG: sleeping function called from invalid context at kernel/mutex.c:616 [ 3119.247899] in_atomic(): 1, irqs_disabled(): 0, pid: 32501, name: towers.x [ 3119.250971] no locks held by towers.x/32501. [ 3119.252635] CPU: 1 PID: 32501 Comm: towers.x Tainted: GF W O 3.12.0-0.rc7.git1.1.fc21.i686+PAE #1 [ 3119.256163] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 3119.258119] 00000000 00000000 c9397e14 c0a74cb9 00007ef5 c9397e3c c0491a35 c0c2c694 [ 3119.261447] 00000001 00000000 00007ef5 f3640228 d50f6a30 ea9ab620 00000001 c9397e88 [ 3119.264771] c0a77c53 f3640000 00000000 c9397e74 00000096 f36403e0 c04caf6b c9397e70 [ 3119.268122] Call Trace: [ 3119.269241] [<c0a74cb9>] dump_stack+0x4b/0x66 [ 3119.271198] [<c0491a35>] __might_sleep+0x165/0x210 [ 3119.273255] [<c0a77c53>] mutex_lock_nested+0x23/0x3b0 [ 3119.275138] [<c04caf6b>] ? trace_hardirqs_off+0xb/0x10 [ 3119.353239] [<c049a955>] ? local_clock+0x55/0x60 [ 3119.355553] [<c04cb9d8>] ? lock_release_holdtime.part.29+0xb8/0x100 [ 3119.358415] [<c053311a>] perf_event_release_kernel+0x2a/0x80 [ 3119.361041] [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.364569] [<f821f105>] _stp_perf_del+0x75/0xa0 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.367780] [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.371009] [<f82221a4>] _stp_perf_probe_cb+0x14/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.374247] [<f82223da>] __stp_utrace_task_finder_target_death+0x8a/0x100 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.377970] [<c0a7cb42>] ? _raw_spin_unlock+0x22/0x30 [ 3119.379797] [<f8220a00>] utrace_report_death+0xb0/0x290 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.459611] [<c05a0d72>] ? fput+0x62/0xa0 [ 3119.461558] [<c0462941>] do_exit+0x721/0xb90 [ 3119.463183] [<f821e3c8>] ? task_utrace_struct+0x28/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.466574] [<c04f6304>] ? __audit_syscall_entry+0x94/0xe0 [ 3119.468485] [<c0462e24>] do_group_exit+0x34/0xa0 [ 3119.470169] [<c0462ea6>] SyS_exit_group+0x16/0x20 [ 3119.471991] [<c0a7d807>] syscall_call+0x7/0xb
After upping my vm to 2GB ram, 2 vcpus, I'm seeing the above trace also when running the systemtap.base/perf.exp testcase.
OK, now I'm more confused. Here's a script: ==== global seen function non_atomic_context:long() %{ if (in_atomic() || irqs_disabled()) STAP_RETVALUE = 0; else STAP_RETVALUE = 1; %} probe process.begin, process.end, process.syscall, process.syscall.return { seen[pp()] = non_atomic_context() } probe end { foreach (pp+ in seen) { printf("%s - %d\n", pp, seen[pp]) } } ==== On 3.12.0-0.rc7.git0.1.fc21.x86_64, I see the following, which is the correct output - all the probe handlers are run in non-atomic context. ==== # stap -g ../context.stp -c 'cat /dev/null' process.begin - 1 process.end - 1 process.syscall - 1 process.syscall.return - 1 ==== When I run the same script on 3.12.0-0.rc7.git1.1.fc21.i686+PAE, I see the following - which means all the probe handers are being run in atomic context: ==== stap -g ../context.stp -c 'cat /dev/null' process.begin - 0 process.end - 0 process.syscall - 0 process.syscall.return - 0 ==== Something odd is going on.
I've run that same script on the following kernels: 3.7.9-104.fc17.i686.PAE 3.11.4-101.fc18.i686.PAE 3.11.6-200.fc19.i686.PAE and I get the correct answers. I'm wondering if 3.12.0-0.rc7.git1.1.fc21.i686+PAE is configured oddly.
Created attachment 820375 [details] potential patch Here's a patch I'm testing. It adds a 'task_work' handler to task death events if the tracepoint handler is in atomic context. The good news is that I no longer get the "sleeping" bug when I run the perf.exp test. The bad news is that I'm seeing some instability in the utrace_p5.exp test. I'm going to do further testing to see if that instability was present without this patch. I also don't know yet if we're getting that warning from other places since I haven't run the full testsuite yet.
commit bca2a6fc / PR16129