Description of problem:
When booting RHEL 5.5 on a specific host, I get this message:

testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!

The host is hp-dl385g7-01.rhts.eng.bos.redhat.com. More test results are here:
http://rhts.redhat.com/cgi-bin/rhts/jobs.cgi?id=174249&type=single

Version-Release number of selected component (if applicable):
kernel-2.6.18-194.15.1.el5

How reproducible:
Always

Steps to Reproduce:
1. Boot RHEL 5 on host hp-dl385g7-01.rhts.eng.bos.redhat.com.
2. Check dmesg.

Actual results:
AMD Opteron(tm) Processor 6128 stepping 01
Brought up 16 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (62->62)!
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 1424.953 MHz processor.

Expected results:
No such warning.

Additional info:
Bug 500892 is a similar one for RHEL 4.
I have a system with the same issue. I get the following when the server boots:

AMD Opteron(tm) Processor 6174 stepping 01
Brought up 24 CPUs
testing NMI watchdog ... <4>WARNING: CPU#0: NMI appears to be stuck (177->177)!
time.c: Using 14.318180 MHz WALL HPET GTOD HPET/TSC timer.
time.c: Detected 2200.011 MHz processor.

Information:
RHEL 5.4, kernel 2.6.18-164.el5
HP ProLiant BL465c G7
This may be due to the BIOS using the same performance counters that the NMI watchdog uses. HP has suggested the following steps to disable some BIOS monitoring so that the NMI watchdog can work. [This only affects AMD G7s, AFAIK.]

When the BIOS loads during a restart:
- Press "F9" during POST to enter RBSU.
- Press "Ctrl-A". A new "service options" menu appears.
- Enter it and disable the following:
  1) Memory pre-failure notification
  2) Processor power utilization monitoring

If this works, I will close this bug as a duplicate of another bug I am working on that addresses this issue.

Cheers,
Don
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Patch(es) available in kernel-2.6.18-252.el5 You can download this test kernel (or newer) from http://people.redhat.com/jwilson/el5 Detailed testing feedback is always welcomed.
Verified with 2.6.18-256.el5PAE.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2011-1065.html