Bug 770931

Summary: [Hardware Error]: Machine check events logged
Product: Fedora
Reporter: Eddie Lania <eddie>
Component: kernel
Assignee: Kernel Maintainer List <kernel-maint>
Status: CLOSED NOTABUG
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high
Priority: unspecified
Version: 16
CC: gansalmon, itamar, jonathan, kernel-maint, madhu.chinakonda
Hardware: x86_64
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2011-12-30 22:05:06 UTC
Attachments:
Overheating MCE warnings (flags: none)

Description Eddie Lania 2011-12-30 10:34:46 UTC
Description of problem: During boot, the following message appears in the messages log:

[Hardware Error]: Machine check events logged


Version-Release number of selected component (if applicable):

kernel-3.1.6-1.fc16.x86_64

How reproducible: At every boot


Steps to Reproduce:
1. Use kernel-3.1.6-1.fc16.x86_64
2. Boot system
3. Observe /var/log/messages
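Step 3 can be automated with a small script. The sketch below is a hypothetical helper, not part of any Fedora tool: it assumes the log is plain syslog text and simply returns every line containing the exact notice quoted above. The function names `find_mce_notices` and `scan_log_file` are illustrative only.

```python
# Hypothetical helper: locate the kernel's machine-check notice in syslog text.
MCE_NOTICE = "[Hardware Error]: Machine check events logged"


def find_mce_notices(log_text: str) -> list[str]:
    """Return every log line that contains the machine-check notice."""
    return [line for line in log_text.splitlines() if MCE_NOTICE in line]


def scan_log_file(path: str = "/var/log/messages") -> list[str]:
    """Read a log file (reading /var/log/messages usually needs root) and scan it."""
    with open(path, errors="replace") as f:
        return find_mce_notices(f.read())
```

Running `scan_log_file()` after each boot shows whether the notice reappears. It only reports that the kernel logged machine checks; the notice itself does not say what the underlying hardware error was.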
  
Actual results: Kernel errors are present.


Expected results: No kernel errors are present.


Additional info:


http://www.smolts.org/client/show/?uuid=pub_ddab803e-5d9c-4cd6-8854-b8e37853031a

Comment 1 Dave Jones 2011-12-30 19:38:28 UTC
Unless there's an indication that some driver is causing the machine checks, there's nothing we can do here. As the message says, most of the time these are hardware problems of some kind.

Comment 2 Eddie Lania 2011-12-30 20:53:16 UTC
Created attachment 550094
Overheating MCE warnings

Actually, there was an earlier issue where a running process made the CPU get very hot, but that has been solved. The messages in the attachment come from the messages log.

Is it possible that information about this error is stored somewhere and needs to be cleared to stop the machine check error at start-up?

Like the IML (Integrated Management Log) on HP servers?

Comment 3 Dave Jones 2011-12-30 22:01:26 UTC
No. Machine checks get logged as they occur, so this isn't a replay of old events.

As the message states, this is very likely a cooling problem with the hardware, and not something we can fix or work around in software.