Bug 918889

Summary: NMI received for unknown reason
Product: [Fedora] Fedora
Reporter: calvin <calvin>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 17
CC: calvin, gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Target Milestone: ---   
Target Release: ---   
Hardware: i686   
OS: Linux   
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2013-08-01 02:25:33 UTC
Type: Bug
Attachments (description: flags):
dmesg output: none
3.7.3-101.fc17.i686.PAE dmesg output: none
3.7.3-101.fc17.i686.PAE dmesg output @ 2238.094027: none

Description calvin 2013-03-07 07:18:12 UTC
Created attachment 706385 [details]
dmesg output

Description of problem:
The following messages show up, and the system becomes unresponsive and locks up for long periods of time. It eventually recovers.

 kernel:[4423634.265647] Do you have a strange power saving mode enabled?
 kernel:[4423634.265647] Dazed and confused, but trying to continue
 kernel:[4423634.302053] Uhhuh. NMI received for unknown reason 21 on CPU 0.
 kernel:[4423634.302053] Do you have a strange power saving mode enabled?
 kernel:[4423634.302053] Dazed and confused, but trying to continue
 kernel:[4423634.350024] Uhhuh. NMI received for unknown reason 31 on CPU 0.

Version-Release number of selected component (if applicable):
3.6.11-1.fc17.i686.PAE

How reproducible:
Run an I/O intensive process. We have Hadoop (TaskTracker and DataNode) running on this machine and the OS locks up when running I/O intensive tasks.

Additional info:

Seems related to https://bugzilla.redhat.com/show_bug.cgi?id=688547.

Hardware info (2 processors):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Xeon(TM) CPU 3.06GHz
stepping        : 9
microcode       : 0x2d
cpu MHz         : 3066.590
cache size      : 512 KB
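
For triage, the NMI messages quoted in the description can be pulled out of a saved dmesg log and tallied by reason code and CPU. The following is a minimal sketch, not something taken from the attachments: the function names and the sample text are illustrative. It also assumes that the reason value is printed in hexadecimal and reflects the status byte read from I/O port 0x61, where bits 0x80 (SERR) and 0x40 (IOCHK) mark the sources the x86 kernel knows how to report. Neither bit is set in 0x21 or 0x31, which would be consistent with the "unknown reason" wording.

```python
import re
from collections import Counter

# Matches lines such as:
#   kernel:[4423634.302053] Uhhuh. NMI received for unknown reason 21 on CPU 0.
NMI_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+Uhhuh\. NMI received for unknown reason "
    r"(?P<reason>[0-9a-fA-F]{2}) on CPU (?P<cpu>\d+)\."
)

# Assumption: bit masks for the NMI sources reported via port 0x61.
NMI_REASON_SERR = 0x80
NMI_REASON_IOCHK = 0x40


def parse_nmi_events(text):
    """Return a list of (timestamp, reason, cpu) tuples found in text."""
    events = []
    for m in NMI_RE.finditer(text):
        # The reason code is interpreted as hexadecimal (assumption above).
        events.append((float(m["ts"]), int(m["reason"], 16), int(m["cpu"])))
    return events


def has_known_source(reason):
    """True if the SERR or IOCHK bit is set in the reason byte."""
    return bool(reason & (NMI_REASON_SERR | NMI_REASON_IOCHK))


# Sample copied from the description of this bug.
SAMPLE = """\
 kernel:[4423634.265647] Dazed and confused, but trying to continue
 kernel:[4423634.302053] Uhhuh. NMI received for unknown reason 21 on CPU 0.
 kernel:[4423634.302053] Do you have a strange power saving mode enabled?
 kernel:[4423634.350024] Uhhuh. NMI received for unknown reason 31 on CPU 0.
"""

if __name__ == "__main__":
    events = parse_nmi_events(SAMPLE)
    # Count how often each (reason, cpu) pair occurs.
    counts = Counter((reason, cpu) for _, reason, cpu in events)
    for (reason, cpu), n in sorted(counts.items()):
        print(f"reason 0x{reason:02x} cpu {cpu}: {n} event(s), "
              f"known source: {has_known_source(reason)}")
```

Running the same parser over a full dmesg attachment would show whether the reason codes and the affected CPU stay constant during a lockup.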

Comment 1 Josh Boyer 2013-03-07 12:35:38 UTC
Does this happen with the 3.7.x updates?

Comment 2 calvin 2013-03-07 15:41:13 UTC
This also happens with 3.7.3-101.fc17.i686.PAE. 

I've attached dmesg output from another machine with similar specs.

Comment 3 calvin 2013-03-07 15:41:47 UTC
Created attachment 706698 [details]
3.7.3-101.fc17.i686.PAE dmesg output

Comment 4 Josh Boyer 2013-03-07 15:51:51 UTC
Both of these attachments are well into things going bad.  The kernel is already tainted with W, which means the first thing that went wrong isn't captured here.  Do you happen to have logs that show the first time an error occurred?

Comment 5 calvin 2013-03-07 16:06:23 UTC
I do have logs from the first time this error occurred, a few hours earlier, but the kernel already has the W flag set there:

[ 2680.100002] Pid: 14, comm: migration/1 Tainted: G        W    3.7.3-101.fc17.i686.PAE #1 Sun Microsystems     Sun Fire(tm) V60 370-6037               /SE7501WV2S

I've attached the dmesg output (for brevity, I removed the first 3300 lines, which repeated the same message over and over).
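
To find the earliest point at which the kernel became tainted, the "Tainted:" field of each oops header line, like the one quoted in this comment, can be extracted and checked for the W flag. The following is a minimal sketch; the function name and the meanings table are illustrative, and the table covers only the two flags that appear in this report (the kernel defines more).

```python
import re

# Meanings for the two flags seen in this report only.
TAINT_MEANINGS = {
    "G": "no proprietary module loaded",
    "W": "a kernel warning was issued earlier",
}

# Captures the run of single capital letters and spaces after "Tainted: ".
# "Not tainted" lines do not match because of the capital T and the colon.
TAINT_RE = re.compile(r"Tainted: ((?:[A-Z] *)+)")


def taint_flags(line):
    """Return the list of taint flag letters found in an oops header line."""
    m = TAINT_RE.search(line)
    if m is None:
        return []
    return [c for c in m.group(1) if c != " "]


# Line copied from comment 5 of this bug (hardware string shortened).
LINE = ("[ 2680.100002] Pid: 14, comm: migration/1 Tainted: G        W    "
        "3.7.3-101.fc17.i686.PAE #1")

if __name__ == "__main__":
    for flag in taint_flags(LINE):
        print(flag, "-", TAINT_MEANINGS.get(flag, "not covered by this sketch"))
```

Applying `taint_flags` to every header line in a log, in order, and stopping at the first one that contains W would narrow down where the initial warning has to be, even when that warning itself was truncated.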

Comment 6 calvin 2013-03-07 16:07:15 UTC
Created attachment 706711 [details]
3.7.3-101.fc17.i686.PAE dmesg output @ 2238.094027

Comment 7 Fedora End Of Life 2013-07-03 23:36:02 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 8 Fedora End Of Life 2013-08-01 02:25:39 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.