Bug 683097

Summary: lots of unknown NMIs type 31 and 21 with 2.6.32-119.el6
Product: Red Hat Enterprise Linux 6
Reporter: Nate Straz <nstraz>
Component: kernel
Assignee: Frank Arnold <farnold>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: unspecified
Priority: unspecified
Version: 6.1
CC: dzickus, peterm, ypu
Target Milestone: rc
Keywords: Regression
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Clone Of: 674281
Last Closed: 2011-03-17 21:26:02 UTC
Bug Depends On: 674281

Description Nate Straz 2011-03-08 15:00:30 UTC
I'm seeing this with a kernel based on -119.el6. I'm working on a git bisect.

+++ This bug was initially created as a clone of Bug #674281 +++

Description of problem:
After upgrading my Rawhide boxes to 2.6.38-rc kernels, dmesg is filled with the following messages every few seconds:

[  491.782610] Uhhuh. NMI received for unknown reason 31 on CPU 1. 
[  491.782617] Do you have a strange power saving mode enabled? 
[  491.782622] Dazed and confused, but trying to continue 
[  541.524621] Uhhuh. NMI received for unknown reason 21 on CPU 0.

Both reason 21 and reason 31 appear. This did not happen with .37 kernels on the same machines.
The hardware is a fairly standard Dell OptiPlex GX620 with a Pentium 4 CPU.
The Smolt profile of one machine is here:
http://smolts.org/show?uuid=pub_7d62f4b4-f2f3-4c70-9a04-0aca1f270345

The NMI messages disappear after running "echo 0 > /proc/sys/kernel/nmi_watchdog" (as suggested by powertop).
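The workaround above can be wrapped in a small bash sketch that also counts the unknown-NMI messages, so you can check whether they stop. It assumes bash and root privileges for the write. `disable_nmi_watchdog` and `count_unknown_nmis` are hypothetical helper names. The control-file path is the one quoted in this report. A write to /proc/sys does not persist across reboots.

```bash
#!/usr/bin/env bash
# Sketch of the workaround from this report: echo 0 > /proc/sys/kernel/nmi_watchdog.
# Assumptions: bash, root for the write; helper names are hypothetical.
set -euo pipefail

# Write 0 to the NMI watchdog control file. The path can be overridden
# (for example with a scratch file) so the function can be tried without root.
disable_nmi_watchdog() {
    local ctl="${1:-/proc/sys/kernel/nmi_watchdog}"
    echo "nmi_watchdog before: $(cat "$ctl")"
    echo 0 > "$ctl"
    echo "nmi_watchdog after: $(cat "$ctl")"
}

# Count lines such as "Uhhuh. NMI received for unknown reason 31 on CPU 1."
# read from stdin. grep -c prints 0 and exits 1 when nothing matches,
# so "|| true" keeps that case from being treated as a failure.
count_unknown_nmis() {
    grep -c 'NMI received for unknown reason' || true
}
```

Usage: as root, run `disable_nmi_watchdog`, then run `dmesg | count_unknown_nmis` twice, a few minutes apart. The ring buffer keeps the older messages, so the thing to watch is whether the count keeps growing, not whether it reaches zero.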

Version-Release number of selected component (if applicable):
2.6.38-0.rc2.git7.1.fc15.x86_64

How reproducible:
Boot the computer.

--- Additional comment from zdzichu@irc.pl on 2011-02-16 08:50:54 EST ---

May be related to: https://lkml.org/lkml/2011/2/16/106

[tip:perf/urgent] perf, x86: P4 PMU: Fix spurious NMI messages

--- Additional comment from updates@fedoraproject.org on 2011-02-25 15:08:06 EST ---

Package kernel-2.6.38-0.rc6.git4.1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 updates-testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.38-0.rc6.git4.1.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.38-0.rc6.git4.1.fc15
then log in and leave karma (feedback).

--- Additional comment from updates@fedoraproject.org on 2011-02-28 11:41:50 EST ---

Package kernel-2.6.38-0.rc6.git6.1.fc15:
* should fix your issue,
* was pushed to the Fedora 15 updates-testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing kernel-2.6.38-0.rc6.git6.1.fc15'
as soon as you are able to, then reboot.
Please go to the following url:
https://admin.fedoraproject.org/updates/kernel-2.6.38-0.rc6.git6.1.fc15
then log in and leave karma (feedback).

--- Additional comment from updates@fedoraproject.org on 2011-03-04 04:55:39 EST ---

kernel-2.6.38-0.rc6.git6.1.fc15 has been pushed to the Fedora 15 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 2 Nate Straz 2011-03-10 03:46:19 UTC
I finished my git bisect and it ended here:

Bisecting: 0 revisions left to test after this (roughly 0 steps)
[a83bd5ca7110d1739e6f268455369245216e6e21] [x86] perf: P4 PMU - Fix unflagged overflows handling

I was testing on a Dell PowerEdge 850 with an Intel(R) Celeron(R) CPU 2.53GHz.

Comment 4 Joy Pu 2011-03-14 05:27:14 UTC
We also hit this problem, including NMIs reported with unknown reason 00, in a RHEL 6.1 guest running kernel 2.6.32-118.el6. The guest failed to boot in our test:
2011-03-14 10:54:52: Uhhuh. NMI received for unknown reason 31 on CPU 0.
2011-03-14 10:54:52: Uhhuh. NMI received for unknown reason 00 on CPU 1.
2011-03-14 10:54:52: Do you have a strange power saving mode enabled?
2011-03-14 10:54:52: Dazed and confused, but trying to continue
2011-03-14 10:54:52: Uhhuh. NMI received for unknown reason 00 on CPU 1.
2011-03-14 10:54:52: Do you have a strange power saving mode enabled?
2011-03-14 10:54:52: Dazed and confused, but trying to continue
2011-03-14 10:54:52: Do you have a strange power saving mode enabled?
2011-03-14 10:54:52: Dazed and confused, but trying to continue
2011-03-14 10:54:52: Uhhuh. NMI received for unknown reason 21 on CPU 0.
2011-03-14 10:54:52: Do you have a strange power saving mode enabled?
2011-03-14 10:54:52: Dazed and confused, but trying to continue
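The "reason" in these messages is a byte printed in hex. This bash sketch assumes the byte follows the upstream 2.6.32 x86 NMI handler in arch/x86/kernel/traps.c. In that handler the byte is read from system control port 0x61, and only on CPU 0, so NMIs on other CPUs always show reason 00. An NMI counts as known only if one of three things is true:

- bit 7 (0x80, SERR/memory parity) is set,
- bit 6 (0x40, IOCHK) is set, or
- a registered handler, such as the perf or watchdog code, claims it.

Under that assumption, 31, 21 and 00 all decode as unknown. Reasons 31 and 21 differ only in bit 0x10, which does not identify an NMI source. That is consistent with an NMI that no handler claimed rather than a reported hardware error. `classify_nmi_reason` is a hypothetical helper, not a kernel or RHEL tool.

```bash
#!/usr/bin/env bash
# Sketch: classify an NMI "reason" byte as printed in the messages above (hex).
# Assumption: bit meanings follow the upstream 2.6.32 x86 handler, where only
# 0x80 (SERR) and 0x40 (IOCHK) name an NMI source. Hypothetical helper.
classify_nmi_reason() {
    local reason=$((16#$1))   # the kernel prints the byte as two hex digits
    local out=""
    if (( reason & 0x80 )); then out+="SERR "; fi
    if (( reason & 0x40 )); then out+="IOCHK "; fi
    if [[ -z $out ]]; then
        # Neither source bit is set; the kernel reports this as "unknown reason"
        # unless some NMI handler claims the NMI.
        out="unknown"
    fi
    printf '%s: %s\n' "$1" "${out% }"   # strip the trailing space
}
```

For example, `for r in 31 21 00; do classify_nmi_reason "$r"; done` prints `31: unknown`, `21: unknown` and `00: unknown`.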

Comment 5 Don Zickus 2011-03-17 21:26:02 UTC

*** This bug has been marked as a duplicate of bug 688547 ***