Bug 185762 - Problems with EDAC module during first boot
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
4.0
i686 Linux
Priority: medium
Severity: medium
Assigned To: Alan Cox
Brian Brock
Duplicates: 174891 183352
Depends On:
Blocks: 198694 200936
Reported: 2006-03-17 13:30 EST by Linda Wang
Modified: 2010-10-22 00:40 EDT
CC: 10 users

Fixed In Version: RHBA-2007-0304
Doc Type: Bug Fix
Last Closed: 2007-05-07 20:47:04 EDT


Attachments
Upstream fix (1.57 KB, patch)
2006-05-15 13:17 EDT, Alan Cox
Patch from upstream 2.6.17 rebased for 2.6.9-36.1 (1.54 KB, patch)
2006-06-20 15:48 EDT, Gary Case

Comment 1 Jason Baron 2006-03-17 13:39:06 EST
If we're using this bug to track EDAC issues, also see bug 182137 comment 17.
Comment 2 Alan Cox 2006-04-24 09:10:53 EDT
If this is an AMI BIOS, please raise the issue with AMI and Intel. According to
their linux-kernel posting about that case (which looks a lot like your report),
this is a BIOS interaction problem: the BIOS arbitrarily hides devices from us
when an SMI occurs. Intel indicate they will be working with BIOS vendors on the
general issue. Until then, disabling EDAC, and so having no EDAC support on the
platform, is the only immediately safe option.

Comment 3 Alan Cox 2006-05-15 12:59:01 EDT
*** Bug 183352 has been marked as a duplicate of this bug. ***
Comment 4 Alan Cox 2006-05-15 13:05:09 EDT
*** Bug 174891 has been marked as a duplicate of this bug. ***
Comment 5 Alan Cox 2006-05-15 13:15:09 EDT
I've folded all these bugs together because they are all triggered by the same
underlying issue: the BIOS SMI code steals the device from us and hides it.
I'll attach the proposed (and upstream) fix in a moment. Basically, if the BIOS
has hidden the device we don't unhide it, but tell the user to take it up with
their BIOS vendor.

Comment 6 Alan Cox 2006-05-15 13:17:14 EDT
Created attachment 129099 [details]
Upstream fix
Comment 13 Gary Case 2006-06-20 15:49:00 EDT
Created attachment 131220 [details]
Patch from upstream 2.6.17 rebased for 2.6.9-36.1
Comment 18 Jay Turner 2006-08-25 14:16:59 EDT
QE ack for 4.5.
Comment 19 RHEL Product and Program Management 2006-09-07 15:26:45 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 22 Jason Baron 2006-09-21 20:33:43 EDT
Committed in stream U5, build 42.13. A test kernel with this patch is available
from http://people.redhat.com/~jbaron/rhel4/
Comment 29 Aristeu Rozanski 2006-10-12 14:23:09 EDT
Gary, Mark Gross's answer didn't make it into Bugzilla; I only noticed it now by
accessing Issue Tracker, sorry about that.
Please try loading the edac_mc module with the panic_on_ue=0 option (either by
specifying it at load time or by adding it as a module option in the modutils
configuration) and paste the complete dmesg output here.
Thanks
Comment 31 Aristeu Rozanski 2006-10-13 11:45:10 EDT
Following up on comment #26: I checked the RPMs and that option is there, so the
patch appears to be correctly applied. Using this option avoids the machine
panic, so we can capture the complete dmesg output.
Comment 32 Aristeu Rozanski 2006-10-13 11:51:05 EDT
To make my last comment clear: the panic_on_ue option (of the edac_mc module) is
needed so that we can get all the kernel messages and see what's happening. The
force_function_unhide option is the one added by the patch (comment #26 confirms
it is present in the e752x_edac module of Jason's kernel).
Comment 34 Aristeu Rozanski 2006-10-16 09:10:26 EDT
(In reply to comment #33)
(...)
> running the test with modprobe.conf option line:
> options e7552x_edac fouce_function_unhide=1 panic_on_ue=0
> results in no messages and no crashes.  (looking at edac_mc.c it looks like
> there isn't any messages that will get logged.
Please note that "panic_on_ue" is an edac_mc module option, not an e752x_edac one.

> I looked in the /proc/mc
> directory but didn't find any inodes.
Known problem; I'm working on it.
Comment 37 Alan Cox 2006-11-14 10:53:55 EST
force_function_unhide should not be set. If the problem only occurs when
force_function_unhide is set, this is a BIOS bug and the kernel change is not
needed.
Comment 38 Red Hat Bugzilla 2007-02-08 14:42:32 EST
Bug report changed to ON_QA status by Errata System.
A QE request has been submitted for advisory RHBA-2007:9073-03.
http://errata.devel.redhat.com/errata/showrequest.cgi?advisory=4730
Comment 40 Red Hat Bugzilla 2007-05-07 20:47:04 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2007-0304.html
