Red Hat Bugzilla – Bug 463872
[LTC 6.0 FEAT] 201264:EDAC Support
Last modified: 2010-10-18 15:17:09 EDT
=Comment: #0=================================================
Emily J. Ratliff <emilyr@us.ibm.com> - 2008-09-24 13:53 EDT

1. Feature Overview:
Feature Id: [201264]
a. Name of Feature: EDAC Support
b. Feature Description
RAS Enhancements

Require EDAC support for all chipsets in IBM systems (including proprietary). Also would prefer the ability to disable EDAC drivers during install for platforms that don't require them.

The majority of IBM Intel and AMD based servers do memory and CPU predictive failure analysis (PFA) in the BIOS with the help of the BMC. When EDAC drivers are loaded they will poll for memory errors once a second. The EDAC drivers may read the same hardware status registers that the BIOS is using for PFA. This can easily lead to interference between EDAC and the BIOS if EDAC reads and clears the registers before the BIOS gets a chance to do so and can potentially render the BIOS's PFA mechanism dysfunctional. Therefore, a mechanism (black/white listing or perhaps something more robust) for disabling EDAC on platforms that already do PFA in firmware would be ideal.

2. Feature Details:
Sponsor: xSeries
Architectures: x86 x86_64
Arch Specificity: Both
Affects Installer: Yes
Affects Kernel Modules: Yes
Delivery Mechanism: Request Red Hat development assistance
Category: Kernel
Request Type: Kernel - Enhancement from Upstream
d. Upstream Acceptance: In Progress
Sponsor Priority 1
f. Severity: High
IBM Confidential: no
Code Contribution: IBM code
g. Component Version Target: The EDAC code for some IBM chipsets has been started. The target at this point is probably 2.6.27. Additionally, there are components to this feature request that require code from the distros (ability to disable/enable EDAC based on the platform type).

3. Business Case
Need to provide a mechanism for customers to be able to detect and report platform errors from Linux.

4. Primary contact at Red Hat:
John Jarvis jjarvis@redhat.com

5. Primary contacts at Partner:
Project Management Contact: Monte Knutson, mknutson@us.ibm.com, 877-894-1495
Technical contact(s):
Kevin Stansell, kstansel@us.ibm.com
Chris McDermott, mcdermoc@us.ibm.com
IBM Manager: Deneen T. Dock, deneen@us.ibm.com
If you want to do disabling of the EDAC code based on whether or not the firmware in the platform supports other methods, this should really be done via DMI matching in the kernel code itself.
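To make the DMI-matching suggestion concrete, here is a minimal userspace sketch of the table lookup it implies. In the kernel this would normally be a `struct dmi_system_id` table checked with `dmi_check_system()`; the sketch below only imitates that substring matching in plain C. The function name `firmware_handles_pfa`, the `pfa_platform` struct, and the vendor/product strings are all made up for illustration and are not a real list of PFA-capable systems.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table entry: substrings expected in the DMI strings. */
struct pfa_platform {
    const char *vendor;   /* substring of the DMI system vendor */
    const char *product;  /* substring of the DMI product name; NULL = any */
};

/* Illustrative entries only (assumption, not taken from the bug report). */
static const struct pfa_platform pfa_platforms[] = {
    { "ExampleVendor", "ExampleServer 100" },
    { "ExampleVendor", "ExampleServer 200" },
};

/*
 * Returns true when the given DMI strings match a platform that is
 * listed as doing PFA in firmware, i.e. one where EDAC should stay off.
 */
bool firmware_handles_pfa(const char *vendor, const char *product)
{
    size_t i;

    if (vendor == NULL || product == NULL)
        return false;

    for (i = 0; i < sizeof(pfa_platforms) / sizeof(pfa_platforms[0]); i++) {
        const struct pfa_platform *p = &pfa_platforms[i];

        if (strstr(vendor, p->vendor) == NULL)
            continue;
        if (p->product == NULL || strstr(product, p->product) != NULL)
            return true;
    }
    return false;
}
```

A driver's init path would call something like this once and refuse to register when it returns true. Keeping the list in kernel code means the installer does not need platform knowledge of its own.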
(In reply to comment #4)
> ------- Comment From notting@redhat.com 2008-10-03 13:29:02 EDT-------
> If you want to do disabling of the EDAC code based on whether or not the
> firmware in the platform supports other methods, this should really be done via
> DMI matching in the kernel code itself.
>

Yes, agreed. However, this is slightly more complicated than just DMI matching. Since the BIOS setup can provide an option for disabling PFAs, there needs to be a way to dynamically determine whether or not the BIOS is _currently_ handling PFA (through SMIs, typically). There are potentially race conditions that can occur if both BIOS and Linux are handling errors simultaneously.
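The two-stage decision described here (static platform match, then a runtime check of the BIOS setting) could be sketched roughly as follows. This is only an outline of the logic: the thread does not establish how the current BIOS PFA state could actually be read, so the probe is modeled as a hypothetical callback, and `edac_may_load`, `pfa_state`, and the three stub probes are invented names.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical result of asking the platform whether BIOS PFA is active. */
enum pfa_state { PFA_DISABLED, PFA_ENABLED, PFA_UNKNOWN };

/* Hypothetical probe; how to implement it is the open question here. */
typedef enum pfa_state (*pfa_probe_fn)(void);

/*
 * Decide whether EDAC may load.
 * platform_listed: result of a static (DMI-style) match on the platform.
 * probe: runtime query of the BIOS setting, or NULL if none exists.
 */
bool edac_may_load(bool platform_listed, pfa_probe_fn probe)
{
    if (!platform_listed)
        return true;            /* firmware is not known to do PFA */
    if (probe == NULL)
        return false;           /* cannot tell: stay out of the BIOS's way */
    /* Load only when the BIOS positively reports PFA as switched off. */
    return probe() == PFA_DISABLED;
}

/* Stub probes for illustration; a real probe is not defined in this bug. */
enum pfa_state probe_reports_disabled(void) { return PFA_DISABLED; }
enum pfa_state probe_reports_enabled(void)  { return PFA_ENABLED; }
enum pfa_state probe_reports_unknown(void)  { return PFA_UNKNOWN; }
```

The sketch treats "unknown" the same as "enabled", on the assumption that leaving EDAC unloaded is safer than racing the BIOS for the same status registers.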
It's unclear to me where this code should be - it's not as if userspace would have any better idea what BIOS option has been set. Can this be read from SMBIOS or similar?
Chris, can you please help with an answer to Bill's question in comment 3?
Max has been looking at this issue. I'll have him respond.
Is this bug still active? I just had some partners in APAC ask about it.
Yes, this BZ is still active.
Assigning this to Peter Bogdanovic at IBM.
IBM System x has ceased further EDAC driver development.
------- Comment From sglass@us.ibm.com 2009-12-08 20:34 EDT-------
This was quit in devtrack so doing the same here.