| Summary: | utrace_reset triggers WARNING: CPU: 1 PID: 21139 at kernel/sched/core.c:1582 | ||||||
|---|---|---|---|---|---|---|---|
| Product: | [Fedora] Fedora | Reporter: | Frank Ch. Eigler <fche> | ||||
| Component: | systemtap | Assignee: | Frank Ch. Eigler <fche> | ||||
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Fedora Extras Quality Assurance <extras-qa> | ||||
| Severity: | unspecified | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | rawhide | CC: | dsmith, fche, jistone, lberk, mjw, nathans, scox, wcohen | ||||
| Target Milestone: | --- | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | Bug Fix | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2014-04-07 00:56:00 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
|
Description
Frank Ch. Eigler
2013-10-13 14:32:26 UTC
I'm not seeing any kernel warnings when running full installcheck against kernel 3.12.0-0.rc5.git0.1.fc21.i686+PAE. Can you upgrade and try again? [ 3105.994435] stap_4da7d47dfc8cfe53e7a2c8685a9358a_32190: systemtap: 2.4/0.157, base: f80d0000, memory: 17data/24text/1ctx/2058net/593alloc kb, probes: 5 [ 3118.837901] stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498: systemtap: 2.4/0.157, base: f821d000, memory: 29data/40text/0ctx/2058net/17alloc kb, probes: 4 [ 3119.166241] BUG: sleeping function called from invalid context at kernel/mutex.c:616 [ 3119.247899] in_atomic(): 1, irqs_disabled(): 0, pid: 32501, name: towers.x [ 3119.250971] no locks held by towers.x/32501. [ 3119.252635] CPU: 1 PID: 32501 Comm: towers.x Tainted: GF W O 3.12.0-0.rc7.git1.1.fc21.i686+PAE #1 [ 3119.256163] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011 [ 3119.258119] 00000000 00000000 c9397e14 c0a74cb9 00007ef5 c9397e3c c0491a35 c0c2c694 [ 3119.261447] 00000001 00000000 00007ef5 f3640228 d50f6a30 ea9ab620 00000001 c9397e88 [ 3119.264771] c0a77c53 f3640000 00000000 c9397e74 00000096 f36403e0 c04caf6b c9397e70 [ 3119.268122] Call Trace: [ 3119.269241] [<c0a74cb9>] dump_stack+0x4b/0x66 [ 3119.271198] [<c0491a35>] __might_sleep+0x165/0x210 [ 3119.273255] [<c0a77c53>] mutex_lock_nested+0x23/0x3b0 [ 3119.275138] [<c04caf6b>] ? trace_hardirqs_off+0xb/0x10 [ 3119.353239] [<c049a955>] ? local_clock+0x55/0x60 [ 3119.355553] [<c04cb9d8>] ? lock_release_holdtime.part.29+0xb8/0x100 [ 3119.358415] [<c053311a>] perf_event_release_kernel+0x2a/0x80 [ 3119.361041] [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.364569] [<f821f105>] _stp_perf_del+0x75/0xa0 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.367780] [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.371009] [<f82221a4>] _stp_perf_probe_cb+0x14/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.374247] [<f82223da>] __stp_utrace_task_finder_target_death+0x8a/0x100 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.377970] [<c0a7cb42>] ? _raw_spin_unlock+0x22/0x30 [ 3119.379797] [<f8220a00>] utrace_report_death+0xb0/0x290 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.459611] [<c05a0d72>] ? fput+0x62/0xa0 [ 3119.461558] [<c0462941>] do_exit+0x721/0xb90 [ 3119.463183] [<f821e3c8>] ? task_utrace_struct+0x28/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498] [ 3119.466574] [<c04f6304>] ? __audit_syscall_entry+0x94/0xe0 [ 3119.468485] [<c0462e24>] do_group_exit+0x34/0xa0 [ 3119.470169] [<c0462ea6>] SyS_exit_group+0x16/0x20 [ 3119.471991] [<c0a7d807>] syscall_call+0x7/0xb After upping my vm to 2GB ram, 2 vcpus, I'm seeing the above trace also when running the systemtap.base/perf.exp testcase. OK, now I'm more confused. Here's a script:
====
global seen
function non_atomic_context:long()
%{
if (in_atomic() || irqs_disabled())
STAP_RETVALUE = 0;
else
STAP_RETVALUE = 1;
%}
probe process.begin, process.end, process.syscall, process.syscall.return
{
seen[pp()] = non_atomic_context()
}
probe end
{
foreach (pp+ in seen) {
printf("%s - %d\n", pp, seen[pp])
}
}
====
On 3.12.0-0.rc7.git0.1.fc21.x86_64, I see the following, which is the correct output - all the probe handlers are run in non-atomic context.
====
# stap -g ../context.stp -c 'cat /dev/null'
process.begin - 1
process.end - 1
process.syscall - 1
process.syscall.return - 1
====
When I run the same script on 3.12.0-0.rc7.git1.1.fc21.i686+PAE, I see the following - which means all the probe handers are being run in atomic context:
====
stap -g ../context.stp -c 'cat /dev/null'
process.begin - 0
process.end - 0
process.syscall - 0
process.syscall.return - 0
====
Something odd is going on.
I've run that same script on the following kernels: 3.7.9-104.fc17.i686.PAE 3.11.4-101.fc18.i686.PAE 3.11.6-200.fc19.i686.PAE and I get the correct answers. I'm wondering if 3.12.0-0.rc7.git1.1.fc21.i686+PAE is configured oddly. Created attachment 820375 [details]
potential patch
Here's a patch I'm testing. It adds a 'task_work' handler to task death events if the tracepoint handler is in atomic context.
The good news is that I no longer get the "sleeping" bug when I run the perf.exp test. The bad news is that I'm seeing some instability in the utrace_p5.exp test. I'm going to do further testing to see if that instability was present without this patch.
I also don't know yet if we're getting that warning from other places since I haven't run the full testsuite yet.
commit bca2a6fc / PR16129 |