Bug 633196 - testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!
Summary: testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.5.z
Hardware: All
OS: Linux
Priority: low
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Don Zickus
QA Contact: Han Pingtian
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-09-13 08:11 UTC by Eryu Guan
Modified: 2018-11-14 15:51 UTC (History)
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2011-07-21 10:25:05 UTC
Target Upstream Version:
Embargoed:




Links
Red Hat Product Errata RHSA-2011:1065
  Private: 0 | Priority: normal | Status: SHIPPED_LIVE
  Summary: Important: Red Hat Enterprise Linux 5.7 kernel security and bug fix update
  Last Updated: 2011-07-21 09:21:37 UTC

Description Eryu Guan 2010-09-13 08:11:29 UTC
Description of problem:
When booting RHEL 5.5 on a specific host, I got this message:
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!

The host is hp-dl385g7-01.rhts.eng.bos.redhat.com.

More test results are available here:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=174249&type=single


Version-Release number of selected component (if applicable):
kernel-2.6.18-194.15.1.el5

How reproducible:
always

Steps to Reproduce:
1. Boot RHEL 5 on host hp-dl385g7-01.rhts.eng.bos.redhat.com.
2. Check dmesg.
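A small sketch of the "Check dmesg" step, assuming the dmesg output has already been captured as text (for example with `dmesg > boot.log`). It scans for the warning line quoted in this report and extracts the CPU number and the two NMI counts. The function name `find_stuck_nmi` is hypothetical and is not part of any existing tool.

```python
import re

# Matches the boot-time warning quoted in this report, e.g.
# "testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!"
STUCK_RE = re.compile(r"CPU#(\d+): NMI appears to be stuck \((\d+)->(\d+)\)!")


def find_stuck_nmi(dmesg_text):
    """Return a list of (cpu, count_before, count_after) tuples, one per warning found."""
    return [tuple(int(g) for g in m.groups()) for m in STUCK_RE.finditer(dmesg_text)]


# Excerpt from the "Actual results" section of this report.
sample = """AMD Opteron(tm) Processor 6128 stepping 01
Brought up 16 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!
"""
print(find_stuck_nmi(sample))  # [(0, 62, 62)]
```

An empty result means the warning did not appear, which matches the expected result for a fixed kernel.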
  
Actual results:
AMD Opteron(tm) Processor 6128 stepping 01
Brought up 16 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 1424.953 MHz processor.

Expected results:
No such warning
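The two numbers in the warning are the CPU's NMI count before and after the boot-time watchdog self-test. As we understand the 2.6.18-era x86 code, the kernel snapshots each CPU's NMI count, waits roughly ten timer ticks, and reports the CPU as stuck if the count advanced by 5 or fewer. Under that reading, "(62->62)" means no watchdog NMIs arrived at all during the test window. The sketch below is a simplified model of that comparison, not kernel code; the function name `nmi_appears_stuck` and the default threshold of 5 are assumptions.

```python
def nmi_appears_stuck(count_before: int, count_after: int, threshold: int = 5) -> bool:
    """Return True if a CPU's NMI count did not advance enough during the test window.

    Simplified model: the default threshold of 5 is an assumption based on
    2.6.18-era x86 nmi.c, not a value confirmed by this report.
    """
    return count_after - count_before <= threshold


# Counts quoted in this report: no watchdog NMIs were received.
print(nmi_appears_stuck(62, 62))    # True
print(nmi_appears_stuck(177, 177))  # True
# In this model, a watchdog firing once per tick for ten ticks advances by 10.
print(nmi_appears_stuck(62, 72))    # False
```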

Additional info:
Bug 500892 is a similar issue for RHEL 4.

Comment 1 Jacob Hunt 2010-12-24 00:03:53 UTC
I have a system with the same issue. I get the following when the server boots up:

AMD Opteron(tm) Processor 6174 stepping 01
Brought up 24 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (177->177)!
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2200.011 MHz processor.


Information:

RHEL 5.4
kernel 2.6.18-164.el5
HP ProLiant BL465c G7

Comment 2 Don Zickus 2011-01-03 14:25:06 UTC
This may be due to the BIOS using the same performance counters that the NMI watchdog uses.  HP has suggested the following steps to disable some BIOS monitoring so that the NMI watchdog can work.  [This only affects AMD G7s AFAIK]

(when the BIOS loads during a restart)
- Press "F9" during POST to go into RBSU
- Hit "control-a"
- you will then see a new "service options" menu
- go into it, and disable the following:
1) memory pre-failure notification
2) processor power utilization monitoring

If this works, I will mark this bug as a duplicate of another bug I am working on that addresses this issue.

Cheers,
Don

Comment 7 RHEL Program Management 2011-03-21 22:49:23 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 9 Jarod Wilson 2011-03-28 18:37:17 UTC
Patch(es) available in kernel-2.6.18-252.el5
You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5
Detailed testing feedback is always welcomed.

Comment 11 Han Pingtian 2011-05-16 03:31:08 UTC
Verified with 2.6.18-256.el5PAE.

Comment 12 errata-xmlrpc 2011-07-21 10:25:05 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-1065.html

