=Comment: #0=================================================
Emily J. Ratliff <emilyr.com> - 2008-09-24 13:53 EDT
1. Feature Overview:
Feature Id: [201264]
a. Name of Feature: EDAC Support
b. Feature Description
RAS enhancements require EDAC support for all chipsets in IBM systems (including proprietary ones).
We would also like the ability to disable EDAC drivers during install on platforms that don't
require them. The majority of IBM Intel- and AMD-based servers do memory and CPU predictive
failure analysis (PFA) in the BIOS with the help of the BMC. When EDAC drivers are loaded, they
poll for memory errors once a second and may read the same hardware status registers
that the BIOS uses for PFA. If EDAC reads and clears those registers before the BIOS gets a
chance to do so, the two interfere, which can render the BIOS's PFA mechanism dysfunctional.
Therefore, a mechanism (black/white listing or perhaps something more robust) for disabling EDAC
on platforms that already do PFA in firmware would be ideal.
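As a rough illustration of the black-listing idea above, an installer could emit modprobe `blacklist` lines for the EDAC modules when the platform is on a list of machines known to do PFA in firmware. This is only a sketch under stated assumptions: the platform list, the vendor/product strings, the choice of module names, and the function name `edac_blacklist_lines` are all hypothetical and not part of this request.

```python
# Hypothetical list of platforms whose firmware already performs PFA.
# The vendor/product strings are placeholders, not real model names.
FIRMWARE_PFA_PLATFORMS = {
    ("ExampleVendor", "Example Server A"),
    ("ExampleVendor", "Example Server B"),
}

# Illustrative EDAC module names; the real set depends on the kernel build.
EDAC_MODULES = ["edac_core", "i5000_edac", "e752x_edac"]


def edac_blacklist_lines(vendor, product,
                         platforms=FIRMWARE_PFA_PLATFORMS,
                         modules=EDAC_MODULES):
    """Return modprobe.d 'blacklist' lines if firmware on this platform does PFA."""
    if (vendor.strip(), product.strip()) not in platforms:
        return []  # platform not listed: leave EDAC enabled
    return ["blacklist %s" % name for name in modules]
```

The returned lines would be written to a file under /etc/modprobe.d/ by the installer. Note that a modprobe blacklist only suppresses automatic, alias-based loading, so an administrator could still load a driver explicitly.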
2. Feature Details:
Sponsor: xSeries
Architectures:
x86
x86_64
Arch Specificity: Both
Affects Installer: Yes
Affects Kernel Modules: Yes
Delivery Mechanism: Request Red Hat development assistance
Category: Kernel
Request Type: Kernel - Enhancement from Upstream
d. Upstream Acceptance: In Progress
Sponsor Priority: 1
f. Severity: High
IBM Confidential: no
Code Contribution: IBM code
g. Component Version Target: The EDAC code for some IBM chipsets has been started. The target at
this point is probably 2.6.27. Additionally, there are components to this feature request that
require code from the distros (ability to disable/enable EDAC based on the platform type).
3. Business Case
Need to provide a mechanism for customers to be able to detect and report platform errors from Linux.
4. Primary contact at Red Hat:
John Jarvis
jjarvis
5. Primary contacts at Partner:
Project Management Contact:
Monte Knutson, mknutson.com, 877-894-1495
Technical contact(s):
Kevin Stansell, kstansel.com
Chris McDermott, mcdermoc.com
IBM Manager:
Deneen T. Dock, deneen.com
If you want to do disabling of the EDAC code based on whether or not the firmware in the platform supports other methods, this should really be done via DMI matching in the kernel code itself.
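To make the DMI-matching idea concrete, here is a small userspace sketch of a match table. It is not the kernel implementation (in-kernel code would presumably use the facilities in `linux/dmi.h`, such as `dmi_check_system()`); it only models the table lookup. The table contents, the substring-matching rule, and the function names are assumptions made for illustration. The strings are read from /sys/class/dmi/id, the sysfs view of the DMI data.

```python
import os

DMI_DIR = "/sys/class/dmi/id"  # sysfs view of SMBIOS/DMI strings on Linux

# Hypothetical match table: every (field, substring) pair in an entry must match.
# The strings below are placeholders, not real product names.
EDAC_DISABLE_TABLE = [
    {"sys_vendor": "ExampleVendor", "product_name": "Example Server"},
]


def read_dmi(field, dmi_dir=DMI_DIR):
    """Return one DMI string, or '' if it cannot be read."""
    try:
        with open(os.path.join(dmi_dir, field)) as f:
            return f.read().strip()
    except OSError:
        return ""


def dmi_matches(entry, lookup=read_dmi):
    """True if every field named in the entry contains the expected substring."""
    return all(want in lookup(field) for field, want in entry.items())


def firmware_may_handle_pfa(table=EDAC_DISABLE_TABLE, lookup=read_dmi):
    """True if any table entry matches this machine's DMI strings."""
    return any(dmi_matches(entry, lookup) for entry in table)
```

Passing a custom `lookup` makes the table testable without real hardware, for example `firmware_may_handle_pfa(lookup=lambda field: "ExampleVendor Example Server")`.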
(In reply to comment #4)
> ------- Comment From notting 2008-10-03 13:29:02 EDT-------
> If you want to do disabling of the EDAC code based on whether or not the
> firmware in the platform supports other methods, this should really be done via
> DMI matching in the kernel code itself.
>
Yes, agreed. However, this is slightly more complicated than just DMI matching. Since the BIOS setup can provide an option for disabling PFA, there needs to be a way to determine dynamically whether the BIOS is _currently_ handling PFA (typically through SMIs). Race conditions can potentially occur if both the BIOS and Linux handle errors simultaneously.
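The interference can be shown with a toy model. This is a sketch, not a description of real hardware: it assumes a single read-and-clear error counter, one error per tick, an EDAC poller that runs every tick, and a BIOS that checks periodically (a simplification, since BIOS handling is typically SMI-driven). The class and function names are hypothetical.

```python
class StatusRegister:
    """Toy read-and-clear error counter standing in for a hardware status register."""

    def __init__(self):
        self.count = 0

    def record_error(self):
        self.count += 1

    def read_and_clear(self):
        seen, self.count = self.count, 0
        return seen


def errors_seen_by_bios(ticks, bios_period, edac_polling):
    """One error occurs per tick; return how many errors the BIOS observes."""
    reg = StatusRegister()
    bios_total = 0
    for tick in range(1, ticks + 1):
        reg.record_error()
        if edac_polling:
            reg.read_and_clear()  # EDAC polls every tick and clears first
        if tick % bios_period == 0:
            bios_total += reg.read_and_clear()
    return bios_total
```

In this model, `errors_seen_by_bios(10, 5, edac_polling=False)` returns 10, while the same call with `edac_polling=True` returns 0: the BIOS never sees an error, so its PFA threshold would never be reached.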
It's unclear to me where this code should be - it's not as if userspace would have any better idea what BIOS option has been set. Can this be read from SMBIOS or similar?