Description of problem:

Dell has realized that PCI AER support is going to be in RHEL5.5. Their new Dell PowerEdge 11G systems require HEST FIRMWARE FIRST support, which was added to the upstream kernel after the initial PCI AER snapshot. Without this support, PCI AER errors reported on domains other than 0 would not be handled correctly on their hardware.

Additionally, add a PCI AER on/off switch for those users who may experience problems with PCI AER.

Version-Release number of selected component (if applicable):

2.6.18-180.el5
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Upstream patch (for RHKL reviewers):

commit 0584396157ad2d008e2cc76b4ed6254151183a25
Author: Matt Domsch <Matt_Domsch>
Date:   Mon Nov 2 11:51:24 2009 -0600

    PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

    Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated. This
    correctly handles PCI-X bridges, PCIe root ports and endpoints, and
    prints debug messages when invalid/reserved types are found in the
    HEST. PCI devices not in domain/segment 0 are not represented in
    HEST, thus will be ignored.

    Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
    to every PCIe root port for which BIOS reports it should, via ACPI
    _OSC.

    However, _OSC alone is insufficient for newer BIOSes. Part of ACPI
    4.0 is the new APEI (ACPI Platform Error Interfaces), which is a way
    for OS and BIOS to handshake over which errors for which components
    each will handle. One table in ACPI 4.0 is the Hardware Error Source
    Table (HEST), where BIOS can define that errors for certain PCIe
    devices (or all devices) should be handled by BIOS ("Firmware First
    mode"), rather than be handled by the OS.

    Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST,
    so that it may manage such errors, log them to the System Event Log,
    and possibly take other actions. The aer driver should honor this,
    and not attach itself to devices noted as such.

    Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
    registers when respecting Firmware First mode. Platform firmware is
    expected to manage these, and if changes to them are allowed, it
    could break that firmware's behavior.

    The HEST parsing code may be replaced in the future by a more
    feature-rich implementation. This patch provides the minimum needed
    to prevent breakage until that implementation is available.

    Reviewed-by: Kenji Kaneshige <kaneshige.kenji.com>
    Reviewed-by: Hidetoshi Seto <seto.hidetoshi.com>
    Signed-off-by: Matt Domsch <Matt_Domsch>
    Signed-off-by: Jesse Barnes <jbarnes>
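For reviewers who want a feel for the check the commit message describes, here is a rough user-space sketch of the decision. It is not the code from the patch. The struct layouts and the names hest_aer_entry, pci_dev_id and aer_firmware_first are hypothetical and heavily simplified. The type values (6, 7, 8) and flag bits are modeled on the ACPI 4.0 HEST definitions as I understand them and should be treated as assumptions. The sketch only illustrates the rules stated above: devices outside segment 0 are ignored, reserved entry types are skipped, and a matching entry with the Firmware First flag means the OS driver should stay away.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed HEST entry types for PCIe AER sources (modeled on ACPI 4.0). */
enum hest_type {
    HEST_AER_ROOT_PORT = 6,
    HEST_AER_ENDPOINT  = 7,
    HEST_AER_BRIDGE    = 8
};

/* Assumed flag bits in an AER HEST entry. */
#define HEST_FIRMWARE_FIRST 0x1 /* firmware handles errors for this source */
#define HEST_GLOBAL         0x2 /* entry applies to every device of this type */

/* Hypothetical, simplified view of one AER entry in the HEST. */
struct hest_aer_entry {
    uint16_t type;
    uint8_t  flags;
    uint8_t  bus, device, function;
};

/* Hypothetical description of the PCI device being probed. */
struct pci_dev_id {
    uint16_t domain;
    uint8_t  bus, device, function;
    enum hest_type kind; /* which HEST entry type this device corresponds to */
};

static bool entry_matches(const struct hest_aer_entry *e,
                          const struct pci_dev_id *d)
{
    if (e->type != (uint16_t)d->kind)
        return false;
    if (e->flags & HEST_GLOBAL)
        return true; /* global entries match every device of this type */
    return e->bus == d->bus && e->device == d->device &&
           e->function == d->function;
}

/*
 * Returns true when firmware claims error handling for the device, in
 * which case an AER driver should neither attach to it nor modify its
 * AER registers.
 */
bool aer_firmware_first(const struct hest_aer_entry *tbl, size_t n,
                        const struct pci_dev_id *d)
{
    size_t i;

    /* HEST does not represent devices outside domain/segment 0. */
    if (d->domain != 0)
        return false;

    for (i = 0; i < n; i++) {
        /* Skip invalid/reserved types; the real patch logs a debug message. */
        if (tbl[i].type < HEST_AER_ROOT_PORT || tbl[i].type > HEST_AER_BRIDGE)
            continue;
        if (entry_matches(&tbl[i], d))
            return (tbl[i].flags & HEST_FIRMWARE_FIRST) != 0;
    }
    return false;
}
```

In this model a caller would probe each device once and skip AER setup whenever aer_firmware_first() returns true. The first matching entry decides the result, which is a simplification chosen for the sketch.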
Created attachment 378815 [details] RHEL5 fix for this issue [1/2]
Created attachment 378816 [details] RHEL5 fix for this issue [2/2]
Partners -- Please wait for *Snapshot 1* to test this feature. Thanks!
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2010-0178.html