Bug 1044732

Summary: mcelogd fails to start at boot
Product: Red Hat Enterprise Linux 6
Reporter: David Jones <david.jones74>
Component: mcelog
Assignee: Prarit Bhargava <prarit>
Status: CLOSED NOTABUG
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: unspecified
Version: 6.5
Target Milestone: rc
Hardware: x86_64
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-12-19 00:17:56 UTC

Description David Jones 2013-12-18 21:33:51 UTC
Description of problem:
mcelogd fails to start at boot, with the message:
"AMD Processor family 16: Please load edac_mce_amd module"

How reproducible:
Reboot system. Always happens.

Additional info:
I have another server with identical hardware running RHEL 6.2, and mcelogd starts there without problems. This one is running 6.5. I checked lsmod, and the edac_mce_amd module is loaded.

Comment 1 Prarit Bhargava 2013-12-19 00:17:56 UTC
>AMD Processor family 16: Please load edac_mce_amd module

David,

The above message is correct. To get the "best" ECC error information on AMD systems, you should use the edac_mce_amd module rather than mcelogd.

P.

Comment 2 David Jones 2013-12-19 14:41:11 UTC
Does that mean I should disable the service? 

I believe this is still a bug. The error message gives the impression that the service depends on the module being loaded. 

Why is mcelog enabled by default, instead of edac, on a system that doesn't support it? This is very misleading. 

Until now, I knew nothing about either of these services, and I spent a lot of time searching for information without really finding anything. 

And if the two are incompatible, why are they both running on my CentOS 6.2 server?

$ edac-ctl --status
edac-ctl: drivers are loaded

$ service mcelogd status
/dev/mcelog not active
Checking for mcelog
mcelog is running

But on 6.5:

$ edac-ctl --status
edac-ctl: drivers not loaded

$ lsmod | grep edac
edac_core      46581   0
edac_mce_amd   14705   0

So it appears that the modules are loaded. The edac-ctl manpage shows a load option, but:

$ edac-ctl --load
Unknown option: load
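The `lsmod | grep edac` check above can also be scripted. What follows is a minimal sketch. It assumes only that lsmod reads its data from `/proc/modules`, where each line begins with the module name followed by a space. The helper name `check_edac_modules` and its optional file argument are hypothetical. They exist so the check can run against a saved copy of the file. The sketch only reports whether edac_core and edac_mce_amd are listed. It does not explain why `edac-ctl --status` disagrees with lsmod.

```shell
#!/bin/sh
# Minimal sketch: report whether the EDAC modules named in this bug are
# listed in /proc/modules (the file lsmod reads).
# check_edac_modules is a hypothetical helper; the optional first argument
# lets it read a saved copy of /proc/modules instead of the live file.
check_edac_modules() {
    modules_file="${1:-/proc/modules}"
    for mod in edac_core edac_mce_amd; do
        # Each /proc/modules line begins with the module name and a space,
        # so "^name " avoids matching modules that merely share a prefix.
        if grep -q "^${mod} " "$modules_file"; then
            echo "${mod}: loaded"
        else
            echo "${mod}: not loaded"
        fi
    done
}

check_edac_modules "$@"
```

Run with no arguments, it checks the live `/proc/modules`. Matching on the name plus a trailing space keeps, for example, `edac_core` from matching a differently named module that starts with the same text.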