Bug 185762
| Summary: | Problems with EDAC module during first boot | ||||||||
|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 4 | Reporter: | Linda Wang <lwang> | ||||||
| Component: | kernel | Assignee: | Alan Cox <alan> | ||||||
| Status: | CLOSED ERRATA | QA Contact: | Brian Brock <bbrock> | ||||||
| Severity: | medium | Docs Contact: | |||||||
| Priority: | medium | ||||||||
| Version: | 4.0 | CC: | arozansk, jbaron, jburke, jfeeney, jturner, mark.gross, ppokorny, rhentosh, tburke, wwlinuxengineering | ||||||
| Target Milestone: | --- | ||||||||
| Target Release: | --- | ||||||||
| Hardware: | i686 | ||||||||
| OS: | Linux | ||||||||
| Whiteboard: | |||||||||
| Fixed In Version: | RHBA-2007-0304 | Doc Type: | Bug Fix | ||||||
| Doc Text: | Story Points: | --- | |||||||
| Clone Of: | Environment: | ||||||||
| Last Closed: | 2007-05-08 00:47:04 UTC | Type: | --- | ||||||
| Regression: | --- | Mount Type: | --- | ||||||
| Documentation: | --- | CRM: | |||||||
| Verified Versions: | Category: | --- | |||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||
| Embargoed: | |||||||||
| Bug Depends On: | |||||||||
| Bug Blocks: | 198694, 200936 | ||||||||
| Attachments: | 
 | ||||||||
| 
        
Comment 1 Jason Baron 2006-03-17 18:39:06 UTC
        
If this is an AMI BIOS, please raise the issue with AMI and Intel. According to their linux-kernel posting (which looks a lot like your report), this is a BIOS interaction problem: they hide devices under us arbitrarily when an SMI occurs. Intel indicate they will be working with BIOS vendors on the general issue. Until then, disabling EDAC and not having any EDAC support on the platform is the only immediate safe option.

*** Bug 183352 has been marked as a duplicate of this bug. ***

*** Bug 174891 has been marked as a duplicate of this bug. ***

I've folded all these bugs together as they are all triggered by the same underlying issue, where the BIOS SMI code steals the device from us and hides it. I'll attach the proposed (and upstream) fix in a moment. Basically, if the BIOS has hidden the device we don't unhide it but tell the user to go chat to their BIOS vendor.

Created attachment 129099 [details]
Upstream fix
Created attachment 131220 [details]
Patch from upstream 2.6.17 rebased for 2.6.9-36.1
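
For readers without access to the attachments, the behaviour described for the fix can be sketched roughly as follows. This is an illustration only, not the attached patch: the register layout, the bit position, and the names `DEV0_FUN1_VISIBLE`, `probe_result` and `edac_probe_sketch` are assumptions made for the example, and the hardware register is simulated with a plain variable. The override flag is modelled on the force_function_unhide module option mentioned later in this report.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: one bit of a "device present" register says whether the
 * error-reporting PCI function is visible. The bit position is
 * illustrative and not taken from this bug report. */
#define DEV0_FUN1_VISIBLE (1u << 5)

enum probe_result { PROBE_OK = 0, PROBE_NODEV = -1 };

/*
 * Decide whether the driver may attach.
 *
 * devpres               simulated register value (updated on forced unhide)
 * force_function_unhide administrator override, off by default
 *
 * If the BIOS has hidden the function and no override is given, the
 * probe is refused and the user is pointed at the BIOS vendor instead
 * of the driver unhiding the device behind the BIOS's back.
 */
enum probe_result edac_probe_sketch(uint8_t *devpres, bool force_function_unhide)
{
    if (!(*devpres & DEV0_FUN1_VISIBLE)) {
        if (!force_function_unhide) {
            fprintf(stderr,
                    "error registers hidden by BIOS; "
                    "contact your BIOS vendor\n");
            return PROBE_NODEV;
        }
        /* Override requested: unhide the function explicitly. */
        *devpres |= DEV0_FUN1_VISIBLE;
    }
    return PROBE_OK;
}
```

With the override off, a hidden device leaves the simulated register untouched and the probe fails; only an explicit override changes the register.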
QE ack for 4.5.

This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

committed in stream U5 build 42.13. A test kernel with this patch is available from http://people.redhat.com/~jbaron/rhel4/

Gary, Mark Gross' answer didn't get into BZ#; I only noticed it now by accessing Issue Tracker, sorry about that. Please try to load the edac_mc module with the panic_on_ue=0 option (either by specifying it when loading or by adding a module option to the modutils configuration) and please paste the complete dmesg output here. Thanks

According to comment #26, I checked the RPMs and that option is there, so the patch appears to be correctly applied. The use of this option will avoid the machine panic so we can have the complete dmesg.

To make my last comment clear: the panic_on_ue option (on the edac_mc module) is needed so we can get all kernel messages to check what's happening.
The force_function_unhide option is the one added by the patch (which comment #26 asserts to be on the module e752x_edac on Jason's kernel).

(In reply to comment #33)
(...)
> running the test with modprobe.conf option line:
> options e7552x_edac fouce_function_unhide=1 panic_on_ue=0
> results in no messages and no crashes. (looking at edac_mc.c it looks like
> there isn't any messages that will get logged.

Please notice that the "panic_on_ue" option is an edac_mc module option.

> I looked in the /proc/mc
> directory but didn't find any inodes.

known problem, I'm working on it

force_unhide should not be set. If the problem only occurs when force_unhide is set, this is a BIOS bug and the kernel change is not needed.

Bug report changed to ON_QA status by Errata System. A QE request has been submitted for advisory RHBA-2007:9073-03. http://errata.devel.redhat.com/errata/showrequest.cgi?advisory=4730

An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2007-0304.html |