Red Hat Bugzilla – Bug 185762
Problems with EDAC module during first boot
Last modified: 2018-10-19 16:43:36 EDT
If we're using this bug to track EDAC issues, also see bug 182137 comment 17.
If this is an AMI BIOS, please raise the issue with AMI and Intel. According to
their linux-kernel posting on a case that looks a lot like your report, this is
a BIOS interaction problem: the BIOS hides devices from us arbitrarily when an
SMI occurs. Intel indicates it will be working with BIOS vendors on the
general issue. Until then, disabling EDAC and not having any EDAC support on the
platform is the only immediately safe option.
*** Bug 183352 has been marked as a duplicate of this bug. ***
*** Bug 174891 has been marked as a duplicate of this bug. ***
I've folded all these bugs together, as they are all triggered by the same
underlying issue: the BIOS SMI code steals the device from us and hides it.
I'll attach the proposed (and upstream) fix in a moment. Basically, if the BIOS
has hidden the device we don't unhide it but tell the user to go chat to their
Created attachment 129099 [details]
Created attachment 131220 [details]
Patch from upstream 2.6.17 rebased for 2.6.9-36.1
QE ack for 4.5.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update
committed in stream U5 build 42.13. A test kernel with this patch is available
Gary, Mark Gross's answer didn't get into this BZ; I only noticed it now while
accessing Issue Tracker. Sorry about that.
Please try loading the edac_mc module with the panic_on_ue=0 option (either by
specifying it when loading the module or by adding it as a module option in the
modutils configuration) and please paste the complete dmesg output here.
According to comment #26, I checked the RPMs and that option is there, so the
patch appears to be correctly applied. Using this option will keep the
machine from panicking, so we can capture the complete dmesg.
To make my last comment clear: the panic_on_ue option (on the edac_mc module)
is needed so we can get all kernel messages and check what's happening. The
force_function_unhide option is the one added by the patch (which comment #26
asserts is present in the e752x_edac module on Jason's kernel).
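For illustration only, here is a sketch of how the two options could be split
across the two modules in modprobe.conf. The /etc/modprobe.conf location is an
assumption for a 2.6.9-era system, and the option names are taken from the
comments in this bug rather than checked against the shipped modules. The
unhide line is left commented out, since it is said elsewhere in this bug that
it should not be set.

```
# /etc/modprobe.conf (path assumed; adjust for your system)

# panic_on_ue is an option of the core edac_mc module.
# 0 = do not panic on an uncorrectable error, so dmesg can be collected.
options edac_mc panic_on_ue=0

# force_function_unhide is the option added by the patch to e752x_edac.
# Shown only to make clear which module it belongs to; left disabled.
# options e752x_edac force_function_unhide=1
```

After reloading the modules, the complete dmesg output can be captured and
pasted here.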
(In reply to comment #33)
> running the test with modprobe.conf option line:
> options e7552x_edac fouce_function_unhide=1 panic_on_ue=0
> results in no messages and no crashes. (looking at edac_mc.c it looks like
> there isn't any messages that will get logged.
Please note that "panic_on_ue" is an edac_mc module option.
> I looked in the /proc/mc
> directory but didn't find any inodes.
Known problem; I'm working on it.
force_unhide should not be set. If the problem only occurs when force_unhide is
set, this is a BIOS bug and the kernel change is not needed.
Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2007:9073-03.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.