Bug 1842866

Summary: [kernel-rt] BUG: scheduling while atomic: swapper/20/0/0x00000000
Product: Red Hat Enterprise Linux 8
Reporter: Qiao Zhao <qzhao>
Component: systemtap
Assignee: Frank Ch. Eigler <fche>
systemtap sub component: system-version
QA Contact: Martin Cermak <mcermak>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: bhu, fche, jbastian, lberk, mcermak, mjw, mnewsome, mstowell, qzhao, smakarov, williams
Version: 8.3
Keywords: Bugfix
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: 8.3
Hardware: Unspecified
OS: Unspecified
Fixed In Version: systemtap-4.1-1.el8
Doc Type: No Doc Update
Last Closed: 2020-11-04 03:59:08 UTC
Type: Bug
Bug Blocks: 1842946

Description Qiao Zhao 2020-06-02 09:19:16 UTC
Description of problem:

Running the "/kernel/tracepoints/operational" test case on the internal snapshot build kernel-rt-4.18.0-208.rt5.20.el8
produces the following call traces:

[  371.636172] CPU: 20 PID: 0 Comm: swapper/20 Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-208.rt5.20.el8.x86_64 #1 
[  371.636174] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 09/17/2019 
[  371.636176] Call Trace: 
[  371.636181]  <IRQ> 
[  371.636183] ------------[ cut here ]------------ 
[  371.636184] DEBUG_LOCKS_WARN_ON(val > preempt_count()) 
[  371.636191]  dump_stack+0x5c/0x80 
[  371.636195] WARNING: CPU: 13 PID: 0 at kernel/sched/core.c:3368 preempt_count_sub+0x5a/0x90 
[  371.636198]  ? start_secondary+0x6b/0x200 
[  371.636199] Modules linked in: 
[  371.636205]  __schedule_bug.cold.82+0x87/0x94 
[  371.636206]  stap_33223652d84959c48943fb67b12d7b2_48355(OE) 
[  371.636213]  __schedule+0x61f/0x850 
[  371.636214]  rpcsec_gss_krb5 
[  371.636217]  ? _raw_spin_lock+0x13/0x40 
[  371.636218]  auth_rpcgss 
[  371.636222]  schedule+0x39/0xd0 
[  371.636223]  nfsv4 dns_resolver nfs 
[  371.636228]  rt_spin_lock_slowlock_locked+0x10e/0x2b0 
[  371.636230]  lockd 
[  371.636233]  rt_spin_lock_slowlock+0x50/0x80 
[  371.636234]  grace fscache 
[  371.636245]  stp_print_flush+0x4a/0x220 [stap_33223652d84959c48943fb67b12d7b2_48355] 
[  371.636246]  sunrpc 
[  371.636253]  ? tick_sched_do_timer+0x70/0x70 
[  371.636254]  edac_mce_amd 
[  371.636260]  ? enter_real_tracepoint_probe_9+0x12f/0x260 [stap_33223652d84959c48943fb67b12d7b2_48355] 
[  371.636261]  ses 
[  371.636267]  ? raise_softirq_irqoff+0x8f/0xe0 
[  371.636268]  crct10dif_pclmul 
[  371.636271]  ? raise_softirq+0x1c/0x30 
[  371.636272]  enclosure 
[  371.636279]  ? update_process_times+0x1d/0x50 
[  371.636280]  crc32_pclmul 
[  371.636283]  ? tick_sched_handle+0x22/0x60 
[  371.636284]  ghash_clmulni_intel 
[  371.636288]  ? tick_sched_timer+0x43/0x80 
[  371.636289]  pcspkr 
[  371.636292]  ? __hrtimer_run_queues+0x11d/0x3b0 
[  371.636293]  ipmi_ssif 
[  371.636296]  ? hrtimer_interrupt+0x10a/0x220 
[  371.636297]  ipmi_si 
[  371.636300]  ? ktime_get+0x36/0xa0 
[  371.636301]  sp5100_tco 
[  371.636306]  ? smp_apic_timer_interrupt+0x9d/0x220 
[  371.636307]  ipmi_devintf 
[  371.636310]  ? apic_timer_interrupt+0xf/0x20 
[  371.636311]  acpi_cpufreq 
[  371.636315]  </IRQ> 
[  371.636315]  ccp hpwdt 
[  371.636323]  ? cpuidle_enter_state+0xc9/0x4a0 
[  371.636324]  hpilo 
[  371.636327]  ? cpuidle_enter_state+0xa6/0x4a0 
[  371.636328]  acpi_tad ipmi_msghandler 
[  371.636332]  ? cpuidle_enter+0x2c/0x40 
[  371.636333]  wmi 
[  371.636338]  ? do_idle+0x2d9/0x340 
[  371.636339]  acpi_power_meter k10temp 
[  371.636343]  ? cpu_startup_entry+0x46/0x50 
[  371.636345]  i2c_piix4 
[  371.636348]  ? start_secondary+0x1a8/0x200 
[  371.636349]  ip_tables 
[  371.636355]  ? secondary_startup_64+0xb7/0xc0 
[  371.636356]  xfs libcrc32c sd_mod sg crc32c_intel serio_raw nvme nvme_core mgag200 drm_vram_helper ttm i2c_algo_bit drm_kms_helper syscopyarea smartpqi sysfillrect sysimgblt scsi_transport_sas fb_sys_fops drm tg3 uas usb_storage dm_mirror dm_region_hash dm_log dm_mod [last unloaded: stap_c1d846b479eb29fd918e965a441f1a5_47502] 
[  371.636376] CPU: 13 PID: 0 Comm: swapper/13 Kdump: loaded Tainted: G        W  OE    --------- -  - 4.18.0-208.rt5.20.el8.x86_64 #1 
[  371.636377] Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 09/17/2019 
[  371.636380] RIP: 0010:preempt_count_sub+0x5a/0x90 
[  371.636384] Code: b1 93 72 c3 e8 c7 c8 34 00 85 c0 74 f6 8b 15 cd cb f2 01 85 d2 75 ec 48 c7 c6 5e ec 68 8e 48 c7 c7 93 69 67 8e e8 f0 33 fd ff <0f> 0b c3 84 d2 75 c9 e8 9a c8 34 00 85 c0 74 c9 8b 05 a0 cb f2 01 
[  371.636386] RSP: 0018:ffff907a9e083e38 EFLAGS: 00010082 
[  371.636388] RAX: 0000000000000000 RBX: ffffa3404064f000 RCX: 0000000000000001 
[  371.636391] RDX: 0000000000000001 RSI: ffffffff8f368eea RDI: 0000000000000046 
[  371.636393] RBP: ffff90729bcbb040 R08: ffffffff8f368ec0 R09: 000000000002bc00 
[  371.636393] R10: 00000101be2567d6 R11: 0000000000000000 R12: ffffa3404031fdc8 
[  371.636395] R13: ffffffff8d750c80 R14: 000000000000000f R15: ffff907a9e0ac440 
[  371.636396] FS:  0000000000000000(0000) GS:ffff907a9e080000(0000) knlGS:0000000000000000 
[  371.636397] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[  371.636399] CR2: 0000557ec6470870 CR3: 0000000fb680e000 CR4: 00000000003406e0 
[  371.636401] Call Trace: 
[  371.636402]  <IRQ> 
[  371.636406]  enter_real_tracepoint_probe_9+0x1aa/0x260 [stap_33223652d84959c48943fb67b12d7b2_48355] 
[  371.636410]  ? raise_softirq_irqoff+0x8f/0xe0 
[  371.636411]  ? raise_softirq+0x1c/0x30 
[  371.636412]  ? trigger_load_balance+0x3c/0x220 
[  371.636414]  ? tick_sched_do_timer+0x70/0x70 
[  371.636417]  ? update_process_times+0x47/0x50 
[  371.636418]  ? tick_sched_handle+0x22/0x60 
[  371.636420]  ? tick_sched_timer+0x43/0x80 
[  371.636422]  ? __hrtimer_run_queues+0x11d/0x3b0 
[  371.636425]  ? hrtimer_interrupt+0x10a/0x220 
[  371.636426]  ? ktim[  381.319050] BUG: scheduling while atomic: swapper/20/0/0x00000000 
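
Note on reading the trace: the output of two CPUs is interleaved above (the CPU 20 stack dump and the CPU 13 "Modules linked in:" list alternate line by line). The following is a minimal Python sketch, not part of the original report, for pulling only the stack-frame lines out of such an excerpt. The regular expression and the helper name extract_frames are assumptions based on the frame format visible in this log, so treat it as a starting point rather than a general dmesg parser.

```python
import re

# Assumed frame format, taken from the lines in this report:
#   [  371.636191]  dump_stack+0x5c/0x80
#   [  371.636253]  ? tick_sched_do_timer+0x70/0x70
#   [  371.636245]  stp_print_flush+0x4a/0x220 [module_name]
FRAME_RE = re.compile(
    r"^\[\s*\d+\.\d+\]\s+"                  # dmesg timestamp
    r"(\?\s+)?"                             # optional "?" (unreliable frame)
    r"([\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"    # symbol+offset/size
    r"(?:\s+\[(\S+)\])?\s*$"                # optional [module]
)


def extract_frames(log_text):
    """Return (symbol, reliable, module) for every stack-frame line.

    Lines that are not frames (module names, register dumps, banners)
    are skipped. `module` is None for frames in the core kernel.
    """
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line)
        if match:
            unreliable, symbol, module = match.groups()
            frames.append((symbol, unreliable is None, module))
    return frames


if __name__ == "__main__":
    # Short excerpt copied from the trace in this report.
    sample = (
        "[  371.636228]  rt_spin_lock_slowlock_locked+0x10e/0x2b0\n"
        "[  371.636230]  lockd\n"
        "[  371.636233]  rt_spin_lock_slowlock+0x50/0x80\n"
        "[  371.636234]  grace fscache\n"
        "[  371.636245]  stp_print_flush+0x4a/0x220 "
        "[stap_33223652d84959c48943fb67b12d7b2_48355]\n"
        "[  371.636253]  ? tick_sched_do_timer+0x70/0x70\n"
    )
    for symbol, reliable, module in extract_frames(sample):
        marker = " " if reliable else "?"
        print(marker, symbol, module or "")
```

Applied to the CPU 20 portion of the log, this leaves the chain stp_print_flush, rt_spin_lock_slowlock, rt_spin_lock_slowlock_locked, schedule, __schedule, __schedule_bug, without the module names printed in between.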

Full log:
http://lab-04.rhts.eng.pek2.redhat.com/beaker/logs/recipes/8372+/8372858/console.log
Job: https://beaker.engineering.redhat.com/recipes/8372858#task111146558

Additional info:
Checked the 8.2 GA kernel test results; no such call trace occurs there.

Comment 3 Frank Ch. Eigler 2020-06-03 13:17:09 UTC
Thanks for the one-liner fix. I will merge it upstream, and it should make its way into 8.3.

Comment 6 Jeff Bastian 2020-06-26 17:58:07 UTC
*** Bug 1851258 has been marked as a duplicate of this bug. ***

Comment 13 errata-xmlrpc 2020-11-04 03:59:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (systemtap bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4801