Bug 688711

Summary: Receiving periodic swarms of 'kernel:Uhhuh. NMI received for unknown reason 31|00|21 on CPU N.'
Product: Red Hat Enterprise Linux 6
Reporter: Barry Marson <bmarson>
Component: kernel
Assignee: Don Zickus <dzickus>
Status: CLOSED DUPLICATE
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 6.1
CC: arozansk, higkoohk, prarit
Target Milestone: rc
Target Release: 6.1
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-03-18 15:11:58 UTC
Attachments:
cpuinfo file from bigi system (flags: none)

Description Barry Marson 2011-03-17 19:24:02 UTC
Created attachment 486086
cpuinfo file from bigi system

Description of problem:
The bigi testbed, which runs the SPECsfs NFS workload, is receiving messages of the form ...

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Dazed and confused, but trying to continue

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Uhhuh. NMI received for unknown reason 00 on CPU 1.

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Do you have a strange power saving mode enabled?

Message from syslogd@bigi at Mar 16 17:17:21 ...
 kernel:Dazed and confused, but trying to continue

...

in bursts with the 2.6.32-122 kernel. Tests with the -118 kernel show no problem. The cpuinfo for this server, an HP DL580 G2 (Intel), has been attached. I have tried disabling Hyper-Threading in the BIOS, but the messages still appear. Disabling nmi_watchdog in /proc/sys/kernel makes them go away.
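Because the messages arrive in bursts across several CPUs, a per-CPU summary of a captured syslog may be easier to attach than the raw output. The following is a minimal Python sketch, not part of the original report: it assumes the message text matches the excerpts above, treats the reason as an opaque two-character code, and uses hypothetical names (`NMI_RE`, `tally_unknown_nmis`).

```python
import re
from collections import Counter

# Hypothetical pattern, based only on the syslog excerpts in this report,
# e.g. "kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0."
NMI_RE = re.compile(
    r"NMI received for unknown reason ([0-9a-fA-F]{2}) on CPU (\d+)"
)


def tally_unknown_nmis(lines):
    """Count unknown-NMI messages per (reason code, CPU number) pair.

    Lines that do not match (e.g. "Dazed and confused ...") are ignored.
    """
    counts = Counter()
    for line in lines:
        m = NMI_RE.search(line)
        if m:
            counts[(m.group(1).lower(), int(m.group(2)))] += 1
    return counts


if __name__ == "__main__":
    sample = [
        " kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.",
        " kernel:Do you have a strange power saving mode enabled?",
        " kernel:Uhhuh. NMI received for unknown reason 00 on CPU 1.",
    ]
    for (reason, cpu), n in sorted(tally_unknown_nmis(sample).items()):
        print(f"reason {reason} CPU {cpu}: {n}")
```

In practice the input would be the lines of the syslog file covering a SPECsfs run; the tally says nothing about the cause of the NMIs, only how they are distributed.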

I have talked to dzickus about this ...


Version-Release number of selected component (if applicable):
2.6.32-122

How reproducible:
Every time

Steps to Reproduce:
1. Run SPECsfs on the bigi testbed (reproduces reliably).
Comment 2 Prarit Bhargava 2011-03-18 12:38:37 UTC
bmarson, we need more logs than what you have above.

P.

Comment 3 Don Zickus 2011-03-18 15:11:58 UTC
Barry,

I thought you were the first one to notice this yesterday; turns out yours is already the third BZ filed for it. :-)

I hate P4 boxes.

*** This bug has been marked as a duplicate of bug 688547 ***

Comment 4 higkoo 2013-11-13 05:28:18 UTC
I have seen:
kernel:Uhhuh. NMI received for unknown reason 31 on CPU 0.
kernel:Uhhuh. NMI received for unknown reason 21 on CPU 0.
kernel:Do you have a strange power saving mode enabled?
kernel:Dazed and confused, but trying to continue

I have tried:
adding 'nmi_watchdog=0 pcie_aspm=off nohpet' to the kernel parameters
switching to an older kernel
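To confirm that a machine actually booted with the parameters listed in this comment, one can inspect the kernel command line, which Linux exposes in /proc/cmdline. The sketch below is a hedged illustration, not something from the report: `WORKAROUND_PARAMS`, `missing_workaround_params`, and `read_cmdline` are hypothetical names, and it only checks for exact whitespace-separated tokens.

```python
# Parameters taken from this comment; the check does not imply they fix the issue.
WORKAROUND_PARAMS = ("nmi_watchdog=0", "pcie_aspm=off", "nohpet")


def missing_workaround_params(cmdline):
    """Return the listed parameters that are absent from a kernel command line.

    Matching is on exact whitespace-separated tokens.
    """
    present = set(cmdline.split())
    return [p for p in WORKAROUND_PARAMS if p not in present]


def read_cmdline(path="/proc/cmdline"):
    """Read the running kernel's command line (Linux only)."""
    with open(path) as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical sample; on a live system pass read_cmdline() instead.
    sample = "ro root=/dev/mapper/vg-root nmi_watchdog=0 nohpet"
    missing = missing_workaround_params(sample)
    print("all present" if not missing else "missing: " + ", ".join(missing))
```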

Result:
I am using the older kernel 2.6.32-131.21.1 (the default is 2.6.32-358.23.2).