Bug 1671126

Summary: NMI watchdog ineffective due to mismerge
Product: Red Hat Enterprise Linux 7
Reporter: Crystal Wood <crwood>
Component: kernel-rt
Assignee: Crystal Wood <crwood>
kernel-rt sub component: Other
QA Contact: Tiefu <tieli>
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
CC: bhu, crwood, lgoncalv, qzhao, williams
Version: 7.7
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 08.rt56.9kernel-rt-3.10.0-1066.el7
Doc Type: No Doc Update
Last Closed: 2019-08-06 12:36:27 UTC
Type: Bug
Bug Blocks: 1655694    

Description Crystal Wood 2019-01-30 19:51:13 UTC
There is an extra "return" added by merge commit 7be8efa0a0bacd464 in watchdog_overflow_callback() that prevents an NMI watchdog-detected lockup from ever being reported.
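To illustrate why one stray "return" is enough to silence the detector, here is a minimal, self-contained C sketch. It is not the kernel's watchdog_overflow_callback(); the helpers cpu_is_stuck() and report_hard_lockup(), the counter, and the printed message are hypothetical stand-ins, assuming only that the real callback follows the same "check, then report" shape.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for kernel state; these are not real kernel APIs. */
static bool cpu_stuck = true;    /* pretend this CPU has hard-locked */
static int lockups_reported = 0; /* how many reports were emitted */

static bool cpu_is_stuck(void)
{
	return cpu_stuck;
}

static void report_hard_lockup(void)
{
	lockups_reported++;
	printf("report_hard_lockup: illustrative message only\n");
}

/* Mismerged shape: the stray return makes the check below unreachable. */
static void overflow_callback_mismerged(void)
{
	return; /* extra "return" introduced by the merge */

	if (cpu_is_stuck())
		report_hard_lockup();
}

/* Intended shape, with the stray return removed: every NMI runs the check. */
static void overflow_callback_fixed(void)
{
	if (cpu_is_stuck())
		report_hard_lockup();
}

/* Usage: the mismerged callback never reports; the fixed one does. */
static void demo(void)
{
	overflow_callback_mismerged();
	assert(lockups_reported == 0);
	overflow_callback_fixed();
	assert(lockups_reported == 1);
}
```

The compiler accepts the unreachable code silently by default, which is one reason a mismerge like this can survive a build without any warning.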

Comment 9 Tiefu 2019-05-14 10:56:18 UTC
[Tiefu Li 14 May 2019]
Following the principle of "does anything break" testing, I can't see any difference in behaviour between the problematic version and the fixed version.
Please refer to the steps below:
1. Ensure that the test environment is running a kernel that contains the bug:
[root@hp-dl380eg8-01 ~]# uname -r
3.10.0-976.rt56.930.el7.x86_64
2. Ensure that the NMI watchdog is enabled:
[root@hp-dl380eg8-01 ~]# cat /proc/sys/kernel/nmi_watchdog
1   (means the watchdog is on)
3. Observe the NMI counts in /proc/interrupts:
[root@hp-dl380eg8-01 ~]# grep NMI /proc/interrupts
 NMI:          9          6          6          6          6          7          7          7          9          7          7          7        174          6          6          6          6          6          6          7          8          6          6          6   Non-maskable interrupts
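Step 3 only checks the NMI row by eye. A minimal C sketch of the same check, assuming the row layout shown above (the "NMI:" label, one count per CPU, then a text description), counts the per-CPU columns and reports whether every CPU has taken at least one NMI. The names parse_nmi_row() and struct nmi_summary are hypothetical, and the sketch takes the row as a string rather than reading /proc/interrupts itself.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Summary of one "NMI:" row from /proc/interrupts (hypothetical type). */
struct nmi_summary {
	int cpus;            /* number of per-CPU count columns */
	unsigned long total; /* sum of all per-CPU counts */
	bool all_nonzero;    /* true if every CPU saw at least one NMI */
};

/* Parse a row such as " NMI:  9  6 ...  Non-maskable interrupts".
 * Returns false if the row does not start with the "NMI:" label. */
static bool parse_nmi_row(const char *row, struct nmi_summary *out)
{
	const char *p = row;

	while (isspace((unsigned char)*p))
		p++;
	if (strncmp(p, "NMI:", 4) != 0)
		return false;
	p += 4;

	out->cpus = 0;
	out->total = 0;
	out->all_nonzero = true;

	for (;;) {
		char *end;
		unsigned long n;

		while (isspace((unsigned char)*p))
			p++;
		if (!isdigit((unsigned char)*p))
			break; /* reached the trailing description or end of row */
		n = strtoul(p, &end, 10);
		p = end;
		out->cpus++;
		out->total += n;
		if (n == 0)
			out->all_nonzero = false;
	}
	return true;
}
```

On the row above this yields 24 CPUs, a total of 327 NMIs, and no zero column. Note that nonzero counts only show that NMIs are being delivered; they do not exercise the lockup-reporting path itself.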

Second, I installed the fixed kernel on the server and tested again.
Below is my step-by-step procedure:
1. Installation:
[root@hp-dl380eg8-01 ~]# wget http://download.eng.pek2.redhat.com/brewroot/packages/kernel-rt/3.10.0/1010.rt56.968.el7/x86_64/kernel-rt-3.10.0-1010.rt56.968.el7.x86_64.rpm
[root@hp-dl380eg8-01 ~]# rpm -ihv kernel-rt-3.10.0-1010.rt56.968.el7.x86_64.rpm
[root@hp-dl380eg8-01 ~]#  grubby --default-kernel
/boot/vmlinuz-3.10.0-1010.rt56.968.el7.x86_64
[root@hp-dl380eg8-01 ~]# rhts-reboot
Connection to hp-dl380eg8-01.rhts.eng.pek2.redhat.com closed by remote host.

2. After rebooting the server:
[root@hp-dl380eg8-01 ~]# uname -r
3.10.0-1010.rt56.968.el7.x86_64
[root@hp-dl380eg8-01 ~]# cat /proc/sys/kernel/nmi_watchdog
1
[root@hp-dl380eg8-01 ~]# grep NMI /proc/interrupts
 NMI:         10          7          7          7          7          8          8          8         10          8          8          8        209          7          7          7          7          7          7          8          9          7          7          7   Non-maskable interrupts
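The second run confirms by eye that grubby --default-kernel points at the new kernel and that uname -r reports the same release after the reboot. A minimal C sketch of that consistency check, assuming the default-kernel path always has the form /boot/vmlinuz-<release> as in the output above: kernel_matches_default() and running_release() are hypothetical helpers, and the latter uses the POSIX uname() call to obtain the string that uname -r prints.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>
#include <sys/utsname.h>

/* Prefix that grubby --default-kernel printed in the session above. */
static const char VMLINUZ_PREFIX[] = "/boot/vmlinuz-";

/* True if the default-kernel path names exactly the given release.
 * Hypothetical helper; assumes the "/boot/vmlinuz-<release>" layout. */
static bool kernel_matches_default(const char *default_path, const char *release)
{
	size_t plen = sizeof(VMLINUZ_PREFIX) - 1;

	if (strncmp(default_path, VMLINUZ_PREFIX, plen) != 0)
		return false;
	return strcmp(default_path + plen, release) == 0;
}

/* Copy the running kernel release (what `uname -r` prints) into buf.
 * Returns false if uname() fails or buf has no room. */
static bool running_release(char *buf, size_t len)
{
	struct utsname u;

	if (len == 0 || uname(&u) != 0)
		return false;
	strncpy(buf, u.release, len - 1);
	buf[len - 1] = '\0';
	return true;
}
```

If the default-kernel path is captured from command output, strip its trailing newline first, or the exact comparison will fail.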

Comment 11 errata-xmlrpc 2019-08-06 12:36:27 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:2043