Bug 1958724

Summary: NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
Product: Red Hat Enterprise Linux 9 Reporter: Qiao Zhao <qzhao>
Component: kernelAssignee: Jerome Marchand <jmarchan>
kernel sub component: Ftrace QA Contact: Qiao Zhao <qzhao>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: crwood, williams
Version: 9.0Keywords: Triaged
Target Milestone: beta   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-08-20 08:28:56 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Qiao Zhao 2021-05-10 02:33:00 UTC
Description of problem:

Running ftrace tier test trigger this panic: (case ftrace-stress-test is the last one, but not sure this case cause the panic)

[  266.096450] LTP: starting ftrace_regression01 (ftrace_regression01.sh)
[  266.382835] LTP: starting ftrace_regression02 (ftrace_regression02.sh)
[  266.837985] LTP: starting ftrace-stress-test (ftrace_stress_test.sh 90)
[  267.146863] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[  269.529412] nop_test_accept flag set to 0: we accept. Now cat trace_options to see the result
[  271.732463] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[  274.233153] nop_test_accept flag set to 0: we accept. Now cat trace_options to see the result
[  277.084742] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[  280.866461] nop_test_accept flag set to 0: we accept. Now cat trace_options to see the result
[  280.866530] Ring buffer clock went backwards: 283531425012 -> 280866529715
[  286.673401] nop_test_accept flag set to 1: we accept. Now cat trace_options to see the result
[  288.423084] nop_test_accept flag set to 0: we accept. Now cat trace_options to see the result
[  298.596095] NMI watchdog: Watchdog detected hard LOCKUP on cpu 4
[  298.596099] Modules linked in: rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace nfs_ssc fscache rfkill sunrpc intel_rapl_msr intel_rapl_common vfat fat sb_edac x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm iTCO_wdt iTCO_vendor_support dcdbas irqbypass rapl intel_cstate intel_uncore ipmi_ssif pcspkr mgag200 i2c_algo_bit drm_kms_helper cdc_ether mxm_wmi usbnet syscopyarea sysfillrect sysimgblt fb_sys_fops cec mii lpc_ich mei_me mei ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter drm fuse ip_tables xfs sr_mod cdrom sd_mod t10_pi ahci libahci crct10dif_pclmul bnx2x crc32_pclmul libata ghash_clmulni_intel tg3 megaraid_sas mdio libcrc32c crc32c_intel wmi dm_mirror dm_region_hash dm_log dm_mod
[  298.596187] CPU: 4 PID: 49987 Comm: sh Kdump: loaded Tainted: G               X --------- ---  5.12.0-1.el9.x86_64 #1
[  298.596189] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.11.0 11/02/2019
[  298.596190] RIP: 0010:native_queued_spin_lock_slowpath.part.0+0x42/0x190
[  298.596191] Code: 0f ba 2a 08 0f 92 c1 8b 02 0f b6 c9 c1 e1 08 30 e4 09 c8 a9 00 01 ff ff 0f 85 ed 00 00 00 85 c0 74 0e 8b 02 84 c0 74 08 f3 90 <8b> 02 84 c0 75 f8 b8 01 00 00 00 66 89 02 c3 8b 37 b8 00 02 00 00
[  298.596193] RSP: 0018:ffff95824795fab0 EFLAGS: 00000002
[  298.596196] RAX: 0000000000300101 RBX: 0000000000000002 RCX: 0000000000000000
[  298.596197] RDX: ffffffff9c8932c8 RSI: 0000000000000000 RDI: ffffffff9c8932c8
[  298.596198] RBP: ffffffff9c8932c8 R08: 0000004375628667 R09: ffff8928c126a000
[  298.596200] R10: 8000000000000063 R11: ffff8928c1240081 R12: 0000004375628667
[  298.596201] R13: 00000000000003e8 R14: 0000000000000f54 R15: 00000000000003d5
[  298.596202] FS:  00007f556b6e4740(0000) GS:ffff892de7c80000(0000) knlGS:0000000000000000
[  298.596204] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  298.596205] CR2: 00005587bf3e88d0 CR3: 0000000685fb8005 CR4: 00000000001706e0
[  298.596206] Call Trace:
[  298.596207]  trace_clock_global+0x75/0x80
[  298.596208]  __rb_reserve_next+0x1f0/0x440
[  298.596209]  ring_buffer_lock_reserve+0x11d/0x340
[  298.596211]  trace_buffer_lock_reserve+0x10/0x50
[  298.596212]  __trace_graph_return+0x1f/0x80
[  298.596213]  trace_graph_return+0x8f/0xd0
[  298.596214]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596215]  ftrace_return_to_handler+0x8b/0xf0
[  298.596216]  ? flush_ldt+0x30/0x30
[  298.596217]  ? tick_nohz_handler+0x1/0x90
[  298.596218]  return_to_handler+0x15/0x30
[  298.596220]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596221]  ? __text_poke+0x2d4/0x440
[  298.596222]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596223]  ? text_poke_bp_batch+0x112/0x1f0
[  298.596224]  ? __x64_sys_setreuid16+0x30/0x30
[  298.596225]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596227]  ? text_poke_flush+0x44/0x50
[  298.596228]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596229]  ? text_poke_queue+0x26/0x70
[  298.596230]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596231]  ? ftrace_replace_code+0x117/0x170
[  298.596232]  ? ftrace_modify_all_code+0xb8/0x150
[  298.596233]  ? cache_mod+0x290/0x290
[  298.596234]  ? ftrace_run_update_code+0x13/0x70
[  298.596236]  ? ftrace_shutdown.part.0+0x106/0x1a0
[  298.596237]  ? unregister_ftrace_function+0x35/0x60
[  298.596238]  ? stack_trace_sysctl+0x8b/0x90
[  298.596239]  ? proc_sys_call_handler+0x141/0x260
[  298.596240]  ? new_sync_write+0x11c/0x1b0
[  298.596241]  ? vfs_write+0x1b8/0x270
[  298.596242]  ? ksys_write+0x5f/0xe0
[  298.596243]  ? do_syscall_64+0x33/0x40
[  298.596245]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae
[  298.596246] Kernel panic - not syncing: Hard LOCKUP
[  298.596247] CPU: 4 PID: 49987 Comm: sh Kdump: loaded Tainted: G               X --------- ---  5.12.0-1.el9.x86_64 #1
[  298.596249] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 2.11.0 11/02/2019
[  298.596250] Call Trace:
[  298.596251]  <NMI>
[  298.596252]  dump_stack+0x64/0x7c
[  298.596253]  panic+0xfb/0x2d7
[  298.596255]  nmi_panic.cold+0xc/0xc
[  298.596256]  watchdog_overflow_callback.cold+0x5c/0x70
[  298.596257]  __perf_event_overflow+0x52/0xf0
[  298.596258]  handle_pmi_common+0x200/0x310
[  298.596259]  ? do_kernel_range_flush+0x30/0x30
[  298.596260]  ? prepare_ftrace_return+0x5c/0x80
[  298.596262]  ? intel_pmu_handle_irq+0xf3/0x210
[  298.596263]  ? icl_set_topdown_event_period+0x100/0x100
[  298.596264]  ? function_graph_enter+0xe7/0x140
[  298.596265]  ? icl_set_topdown_event_period+0x100/0x100
[  298.596266]  ? prepare_ftrace_return+0x5c/0x80
[  298.596267]  ? ftrace_graph_caller+0x6b/0xa0
[  298.596268]  ? intel_pmu_reset+0x1c0/0x1c0
[  298.596269]  ? prepare_ftrace_return+0x5c/0x80
[  298.596271]  ? ftrace_graph_caller+0x6b/0xa0
[  298.596272]  intel_pmu_handle_irq+0xf3/0x210
[  298.596273]  perf_event_nmi_handler+0x28/0x50
[  298.596274]  nmi_handle+0x58/0x100
[  298.596275]  default_do_nmi+0x42/0x130
[  298.596276]  exc_nmi+0x11f/0x150
[  298.596277]  end_repeat_nmi+0x16/0x55
[  298.596278] RIP: 0010:native_queued_spin_lock_slowpath.part.0+0x42/0x190
[  298.596280] Code: 0f ba 2a 08 0f 92 c1 8b 02 0f b6 c9 c1 e1 08 30 e4 09 c8 a9 00 01 ff ff 0f 85 ed 00 00 00 85 c0 74 0e 8b 02 84 c0 74 08 f3 90 <8b> 02 84 c0 75 f8 b8 01 00 00 00 66 89 02 c3 8b 37 b8 00 02 00 00
[  298.596281] RSP: 0018:ffff95824795fab0 EFLAGS: 00000002
[  298.596283] RAX: 0000000000300101 RBX: 0000000000000002 RCX: 0000000000000000
[  298.596285] RDX: ffffffff9c8932c8 RSI: 0000000000000000 RDI: ffffffff9c8932c8
[  298.596286] RBP: ffffffff9c8932c8 R08: 0000004375628667 R09: ffff8928c126a000
[  298.596287] R10: 8000000000000063 R11: ffff8928c1240081 R12: 0000004375628667
[  298.596288] R13: 00000000000003e8 R14: 0000000000000f54 R15: 00000000000003d5
[  298.596290]  ? native_queued_spin_lock_slowpath.part.0+0x42/0x190
[  298.596291]  ? native_queued_spin_lock_slowpath.part.0+0x42/0x190
[  298.596292]  </NMI>
[  298.596293]  trace_clock_global+0x75/0x80
[  298.596294]  __rb_reserve_next+0x1f0/0x440
[  298.596295]  ring_buffer_lock_reserve+0x11d/0x340
[  298.596296]  trace_buffer_lock_reserve+0x10/0x50
[  298.596297]  __trace_graph_return+0x1f/0x80
[  298.596299]  trace_graph_return+0x8f/0xd0
[  298.596300]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596301]  ftrace_return_to_handler+0x8b/0xf0
[  298.596302]  ? flush_ldt+0x30/0x30
[  298.596303]  ? tick_nohz_handler+0x1/0x90
[  298.596304]  return_to_handler+0x15/0x30
[  298.596306]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596307]  ? __text_poke+0x2d4/0x440
[  298.596308]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596309]  ? text_poke_bp_batch+0x112/0x1f0
[  298.596310]  ? __x64_sys_setreuid16+0x30/0x30
[  298.596312]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596313]  ? text_poke_flush+0x44/0x50
[  298.596314]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596315]  ? text_poke_queue+0x26/0x70
[  298.596316]  ? ftrace_graph_caller+0xa0/0xa0
[  298.596317]  ? ftrace_replace_code+0x117/0x170
[  298.596318]  ? ftrace_modify_all_code+0xb8/0x150
[  298.596320]  ? cache_mod+0x290/0x290
[  298.596321]  ? ftrace_run_update_code+0x13/0x70
[  298.596322]  ? ftrace_shutdown.part.0+0x106/0x1a0
[  298.596323]  ? unregister_ftrace_function+0x35/0x60
[  298.596324]  ? stack_trace_sysctl+0x8b/0x90
[  298.596325]  ? proc_sys_call_handler+0x141/0x260
[  298.596326]  ? new_sync_write+0x11c/0x1b0
[  298.596327]  ? vfs_write+0x1b8/0x270
[  298.596328]  ? ksys_write+0x5f/0xe0
[  298.596330]  ? do_syscall_64+0x33/0x40
[  298.596331]  ? entry_SYSCALL_64_after_hwframe+0x44/0xae


Host -> https://beaker.engineering.redhat.com/view/dell-per730-01.dell2.lab.eng.bos.redhat.com
Job  -> https://beaker.engineering.redhat.com/jobs/5351554
nvr  -> 5.12.0-1.el9.x86_64

:  vmcore files:
:      http://ibm-x3250m4-03.rhts.eng.pek2.redhat.com/vmcore/qzhao/5.12.0-1.el9.x86_64/5351554/dell-per730-01.dell2.lab.eng.bos.redhat.com/10.16.218.153-2021-05-08-08:57:45/vmcore
:      http://ibm-x3250m4-03.rhts.eng.pek2.redhat.com/vmcore/qzhao/5.12.0-1.el9.x86_64/5351554/dell-per730-01.dell2.lab.eng.bos.redhat.com/10.16.218.153-2021-05-08-08:57:45/vmcore-dmesg.txt

Version-Release number of selected component (if applicable):


How reproducible:
once

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info: