Bug 1976610

Summary: NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
Product: Red Hat Enterprise Linux 8
Reporter: Zhang Yi <yizhan>
Component: kernel
Assignee: Phil Auld <pauld>
kernel sub component: Scheduler
QA Contact: Chunyu Hu <chuhu>
Status: CLOSED INSUFFICIENT_DATA
Docs Contact:
Severity: unspecified
Priority: unspecified
CC: aquadri, aubaker, bhu, cye, dmontald, jaeshin, jwesterd, nsharma, pauld, spanjikk, vagrawal, xiliang
Version: 8.5
Keywords: Triaged
Target Milestone: beta
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-09-25 18:45:02 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Zhang Yi 2021-06-27 14:26:59 UTC
Description of problem:
NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!

Version-Release number of selected component (if applicable):
4.18.0-318.el8.aarch64

How reproducible:
100%

Steps to Reproduce:
1. blktests: use_siw=1 ./check nvmeof-mp/001

Actual results:


Expected results:


Additional info:
So far I have only reproduced this issue on ampere-hr330a-10.khw4.lab.eng.bos.redhat.com

beaker job: https://beaker.engineering.redhat.com/jobs/5509556

[  105.285571] run blktests nvmeof-mp/001 at 2021-06-27 10:12:52
[  105.384907] null_blk: module loaded
[  105.526537] TECH PREVIEW: Software iWARP Driver may not be fully supported.
               Please review provided documentation for limitations.
[  105.540786] SoftiWARP attached
[  105.682538] nvmet: adding nsid 1 to subsystem nvme-test
[  105.704233] iwpm_register_pid: Unable to send a nlmsg (client = 2)
[  105.711098] nvmet_rdma: enabling port 1 (10.19.241.178:7777)
[  105.786564] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: address resolved (0): status 0 id 000000002c455cfb
[  105.786862] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: route resolved  (2): status 0 id 000000002c455cfb
[  105.787462] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: connect request (4): status 0 id 000000005dc82777
[  105.787469] nvmet_rdma:nvmet_rdma_find_get_device: nvmet_rdma: added enP2p1s0_siw.
[  105.787777] nvmet_rdma:nvmet_rdma_create_queue_ib: nvmet_rdma: nvmet_rdma_create_queue_ib: max_cqe= 4096 max_sge= 6 sq_size = 129 cm_id= 000000005dc82777
[  105.881576] nvmet_rdma:nvmet_rdma_cm_handler: nvmet_rdma: established (9): status 0 id 000000005dc82777
[  105.881587] nvme_rdma:nvme_rdma_cm_handler: nvme nvme0: established (9): status 0 id 000000002c455cfb
[  105.881892] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.889924] nvmet:nvmet_start_keep_alive_timer: nvmet: ctrl 1 start keep-alive timer for 15 secs
[  105.889928] nvmet: creating controller 1 for subsystem nvme-test for NQN nqn.2014-08.org.nvmexpress:uuid:76f99c99-aff6-4be5-aac8-947b549ab692.
[  105.902793] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.910783] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.918843] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.926833] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.934882] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.942871] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.950915] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.958904] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  105.966971] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #20!!!
[  106.001489] nvme nvme0: creating 32 I/O queues.
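
For triage, the value after "handler #" can be decoded into softirq names. This is a minimal sketch, assuming the value is the hexadecimal pending-softirq bitmask (as printed by the upstream tick-sched code) and that bit positions follow the softirq enum order in include/linux/interrupt.h. Both assumptions should be checked against the exact kernel source for the builds in this report; the names SOFTIRQ_NAMES and decode_softirq_mask are illustrative only.

```python
# Softirq vector names, assumed to be in the kernel's enum order
# (include/linux/interrupt.h); list index == bit position in the mask.
SOFTIRQ_NAMES = [
    "HI", "TIMER", "NET_TX", "NET_RX", "BLOCK",
    "IRQ_POLL", "TASKLET", "SCHED", "HRTIMER", "RCU",
]


def decode_softirq_mask(handler):
    """Return the softirq names whose bits are set in a 'handler #XX' value.

    The value is assumed to be a hexadecimal bitmask, so "#20" is 0x20.
    """
    mask = int(handler.lstrip("#"), 16)
    return [name for bit, name in enumerate(SOFTIRQ_NAMES) if mask & (1 << bit)]


print(decode_softirq_mask("#20"))  # ['IRQ_POLL']
print(decode_softirq_mask("#08"))  # ['NET_RX']
```

Under these assumptions, handler #20 in the log above would correspond to pending IRQ_POLL softirq work.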

Comment 1 Frank Liang 2021-07-21 08:07:26 UTC
Found a similar error in AWS testing (RHEL-9.0.0-20210720.1). It occurs after the cpuhotplug test, but does not reproduce every time.

INFO:error found!
INFO:This is a new exception!
INFO:[Wed Jul 21 00:32:22 2021] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!

Comment 7 John Westerdale 2022-05-05 13:45:17 UTC
Found on Fedora 34 as well, just before a TBT3 dock NIC drops off. The machine is a T480s running Fedora 34.

The message is interleaved in /var/log/messages, adjacent to the dock NIC drop-off (PCIe -> TBT3 -> USB NIC). The dock NIC might be a red herring, though.

Linux jwesterd-f34 5.17.5-100.fc34.x86_64 #1 SMP PREEMPT Thu Apr 28 16:02:54 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux

[51734.006279] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51749.240662] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51759.400461] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51803.105090] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51809.003556] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51889.035919] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
[51895.770343] usb 8-1: USB disconnect, device number 2

I can add more if it would help.
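
When collecting more data for a report like this, it may help to summarize how often the warning fires and with which handler value. The following is a rough sketch, assuming dmesg-style lines with a bracketed timestamp in seconds, as in the excerpts in this bug; the regular expression and the summarize_nohz helper are hypothetical and not part of any existing tool.

```python
import re
from collections import Counter

# Matches lines such as:
# [51734.006279] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!
NOHZ_RE = re.compile(
    r"^\[\s*(\d+\.\d+)\] NOHZ tick-stop error: .*handler #([0-9a-fA-F]+)!!!"
)


def summarize_nohz(lines):
    """Count warnings per handler value and list the gaps (in seconds) between them."""
    stamps = []
    handlers = Counter()
    for line in lines:
        match = NOHZ_RE.match(line)
        if match:
            stamps.append(float(match.group(1)))
            handlers[match.group(2).lower()] += 1
    # Gap between each warning and the one before it.
    gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
    return handlers, gaps


sample = [
    "[51734.006279] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!",
    "[51749.240662] NOHZ tick-stop error: Non-RCU local softirq work is pending, handler #08!!!",
    "[51895.770343] usb 8-1: USB disconnect, device number 2",
]
handlers, gaps = summarize_nohz(sample)
print(dict(handlers))
print(gaps)
```

Lines that are not NOHZ warnings, such as the USB disconnect, are ignored, so a full /var/log/messages or dmesg capture can be passed in unfiltered.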