Bug 1018577 - utrace_reset triggers WARNING: CPU: 1 PID: 21139 at kernel/sched/core.c:1582
Summary: utrace_reset triggers WARNING: CPU: 1 PID: 21139 at kernel/sched/core.c:1582
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: systemtap
Version: rawhide
Hardware: Unspecified
OS: Unspecified
unspecified
unspecified
Target Milestone: ---
Assignee: Frank Ch. Eigler
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2013-10-13 14:32 UTC by Frank Ch. Eigler
Modified: 2014-04-07 00:56 UTC (History)
8 users (show)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-04-07 00:56:00 UTC
Type: Bug
Embargoed:


Attachments (Terms of Use)
potential patch (5.16 KB, patch)
2013-11-06 13:46 UTC, David Smith
no flags Details | Diff

Description Frank Ch. Eigler 2013-10-13 14:32:26 UTC
git systemtap, running full installcheck against kernel
3.12.0-0.rc4.git3.1.fc21.i686+PAE.  For several of the test
cases, we see the following kernel warning:

[10654.780511] stap_33886239fd38f992928b591d8662874_21139: systemtap: 2.4/0.157, base: f8173000, memory: 37data/40text/0ctx/2058net/17alloc kb, probes: 1, unpriv-uid: 0
[10654.802256] ------------[ cut here ]------------
[10654.805976] WARNING: CPU: 1 PID: 21139 at kernel/sched/core.c:1582 wake_up_process+0x31/0x40()
[10654.805976] Modules linked in: stap_33886239fd38f992928b591d8662874_21139(OF) binfmt_misc rpcsec_gss_krb5 auth_rpcgss xt_nat nfsv4 iptable_nat nf_conntrack_ipv4 dns_resolver nf_defrag_ipv4 nfs lockd nf_nat_ipv4 sunrpc nf_nat nf_conntrack cachefiles fscache ppdev microcode serio_raw parport_pc i2c_piix4 parport i2c_core ata_generic e1000 pata_acpi [last unloaded: stap_af74183a2b21519858cea49a8479ac2_20857]
[10654.805976] CPU: 1 PID: 21139 Comm: stapio Tainted: GF       W  O 3.12.0-0.rc4.git3.1.fc21.i686+PAE #1
[10654.805976] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[10654.805976]  00000000 00000000 df387e44 c0a770c5 00000000 df387e74 c045f90e c0c2dde0
[10654.805976]  00000001 00005293 c0c3bcc3 0000062e c0497cd1 c0497cd1 cc43ec00 c3c92130
[10654.805976]  0000001f df387e84 c045f9d2 00000009 00000000 df387e90 c0497cd1 cc43ec00
[10654.805976] Call Trace:
[10654.805976]  [<c0a770c5>] dump_stack+0x4b/0x66
[10654.805976]  [<c045f90e>] warn_slowpath_common+0x7e/0xa0
[10654.805976]  [<c0497cd1>] ? wake_up_process+0x31/0x40
[10654.805976]  [<c0497cd1>] ? wake_up_process+0x31/0x40
[10654.805976]  [<c045f9d2>] warn_slowpath_null+0x22/0x30
[10654.805976]  [<c0497cd1>] wake_up_process+0x31/0x40
[10654.805976]  [<f81763a5>] utrace_reset+0x2b5/0x2c0 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<f8176938>] utrace_control+0xa8/0x390 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<c0a7ef52>] ? _raw_spin_unlock+0x22/0x30
[10654.805976]  [<f81756dc>] ? utrace_attach_task+0x11c/0x280 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<f817c094>] _stp_ctl_write_cmd+0x1164/0x1480 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<f817bf3e>] ? _stp_ctl_write_cmd+0x100e/0x1480 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<f817af30>] ? __stp_tf_clone_worker+0xb0/0xb0 [stap_33886239fd38f992928b591d8662874_21139]
[10654.805976]  [<c05a0d5d>] vfs_write+0x9d/0x1b0
[10654.805976]  [<c05bcb1f>] ? fget_light+0x2f/0x480
[10654.805976]  [<c05a1409>] SyS_write+0x49/0x90
[10654.805976]  [<c0a7fc17>] syscall_call+0x7/0xb
[10654.805976] ---[ end trace 25903092d09c8ec4 ]---

Comment 1 David Smith 2013-10-31 16:02:29 UTC
I'm not seeing any kernel warnings when running full installcheck against kernel 3.12.0-0.rc5.git0.1.fc21.i686+PAE. Can you upgrade and try again?

Comment 2 Frank Ch. Eigler 2013-10-31 17:57:29 UTC
[ 3105.994435] stap_4da7d47dfc8cfe53e7a2c8685a9358a_32190: systemtap: 2.4/0.157, base: f80d0000, memory: 17data/24text/1ctx/2058net/593alloc kb, probes: 5
[ 3118.837901] stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498: systemtap: 2.4/0.157, base: f821d000, memory: 29data/40text/0ctx/2058net/17alloc kb, probes: 4
[ 3119.166241] BUG: sleeping function called from invalid context at kernel/mutex.c:616
[ 3119.247899] in_atomic(): 1, irqs_disabled(): 0, pid: 32501, name: towers.x
[ 3119.250971] no locks held by towers.x/32501.
[ 3119.252635] CPU: 1 PID: 32501 Comm: towers.x Tainted: GF       W  O 3.12.0-0.rc7.git1.1.fc21.i686+PAE #1
[ 3119.256163] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 3119.258119]  00000000 00000000 c9397e14 c0a74cb9 00007ef5 c9397e3c c0491a35 c0c2c694
[ 3119.261447]  00000001 00000000 00007ef5 f3640228 d50f6a30 ea9ab620 00000001 c9397e88
[ 3119.264771]  c0a77c53 f3640000 00000000 c9397e74 00000096 f36403e0 c04caf6b c9397e70
[ 3119.268122] Call Trace:
[ 3119.269241]  [<c0a74cb9>] dump_stack+0x4b/0x66
[ 3119.271198]  [<c0491a35>] __might_sleep+0x165/0x210
[ 3119.273255]  [<c0a77c53>] mutex_lock_nested+0x23/0x3b0
[ 3119.275138]  [<c04caf6b>] ? trace_hardirqs_off+0xb/0x10
[ 3119.353239]  [<c049a955>] ? local_clock+0x55/0x60
[ 3119.355553]  [<c04cb9d8>] ? lock_release_holdtime.part.29+0xb8/0x100
[ 3119.358415]  [<c053311a>] perf_event_release_kernel+0x2a/0x80
[ 3119.361041]  [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.364569]  [<f821f105>] _stp_perf_del+0x75/0xa0 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.367780]  [<f8222190>] ? _stp_perf_init+0x160/0x160 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.371009]  [<f82221a4>] _stp_perf_probe_cb+0x14/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.374247]  [<f82223da>] __stp_utrace_task_finder_target_death+0x8a/0x100 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.377970]  [<c0a7cb42>] ? _raw_spin_unlock+0x22/0x30
[ 3119.379797]  [<f8220a00>] utrace_report_death+0xb0/0x290 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.459611]  [<c05a0d72>] ? fput+0x62/0xa0
[ 3119.461558]  [<c0462941>] do_exit+0x721/0xb90
[ 3119.463183]  [<f821e3c8>] ? task_utrace_struct+0x28/0x30 [stap_9b532e8c66bdb0009b39aa8b9f5b98f_32498]
[ 3119.466574]  [<c04f6304>] ? __audit_syscall_entry+0x94/0xe0
[ 3119.468485]  [<c0462e24>] do_group_exit+0x34/0xa0
[ 3119.470169]  [<c0462ea6>] SyS_exit_group+0x16/0x20
[ 3119.471991]  [<c0a7d807>] syscall_call+0x7/0xb

Comment 3 David Smith 2013-10-31 19:24:09 UTC
After upping my vm to 2GB ram, 2 vcpus, I'm seeing the above trace also when running the systemtap.base/perf.exp testcase.

Comment 4 David Smith 2013-11-01 15:39:09 UTC
OK, now I'm more confused. Here's a script:

====
global seen                                                                     
                                                                                
function non_atomic_context:long()                                              
%{                                                                              
        if (in_atomic() || irqs_disabled())                                     
                STAP_RETVALUE = 0;                                              
        else                                                                    
                STAP_RETVALUE = 1;                                              
%}                                                                              
                                                                                
probe process.begin, process.end, process.syscall, process.syscall.return       
{                                                                               
        seen[pp()] = non_atomic_context()                                       
}                                                                               
                                                                                
probe end                                                                       
{                                                                               
        foreach (pp+ in seen) {                                                 
                printf("%s - %d\n", pp, seen[pp])                               
        }                                                                       
}                                                                               
====

On 3.12.0-0.rc7.git0.1.fc21.x86_64, I see the following, which is the correct output - all the probe handlers are run in non-atomic context.

====
# stap -g ../context.stp -c 'cat /dev/null'
process.begin - 1
process.end - 1
process.syscall - 1
process.syscall.return - 1
====

When I run the same script on 3.12.0-0.rc7.git1.1.fc21.i686+PAE, I see the following - which means all the probe handers are being run in atomic context:

====
stap -g ../context.stp -c 'cat /dev/null'
process.begin - 0
process.end - 0
process.syscall - 0
process.syscall.return - 0
====

Something odd is going on.

Comment 5 David Smith 2013-11-01 19:54:55 UTC
I've run that same script on the following kernels:

3.7.9-104.fc17.i686.PAE
3.11.4-101.fc18.i686.PAE
3.11.6-200.fc19.i686.PAE

and I get the correct answers. I'm wondering if 3.12.0-0.rc7.git1.1.fc21.i686+PAE is configured oddly.

Comment 6 David Smith 2013-11-06 13:46:15 UTC
Created attachment 820375 [details]
potential patch

Here's a patch I'm testing. It adds a 'task_work' handler to task death events if the tracepoint handler is in atomic context.

The good news is that I no longer get the "sleeping" bug when I run the perf.exp test. The bad news is that I'm seeing some instability in the utrace_p5.exp test. I'm going to do further testing to see if that instability was present without this patch.

I also don't know yet if we're getting that warning from other places since I haven't run the full testsuite yet.

Comment 7 Frank Ch. Eigler 2014-04-07 00:56:00 UTC
commit bca2a6fc / PR16129


Note You need to log in before you can comment on or make changes to this bug.