Bug 547762

Summary: PCI AER: HEST FIRMWARE FIRST support
Product: Red Hat Enterprise Linux 5
Reporter: Prarit Bhargava <prarit>
Component: kernel
Assignee: Prarit Bhargava <prarit>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: high
Version: 5.5
CC: charles_rose, cward, emcnabb, hjia, jfeeney, jwilson, martinez, matt_domsch, rlerch, sghosh, shyam_iyer
Target Milestone: rc
Target Release: 5.5
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2010-03-30 07:24:20 UTC
Bug Blocks: 496328    
Attachments:
Description                           Flags
RHEL5 fix for this issue [1/2]        none
RHEL5 fix for this issue [2/2]        none

Description Prarit Bhargava 2009-12-15 16:21:46 UTC
Description of problem:

Dell is aware that PCI AER support will be included in RHEL 5.5. Their new Dell PowerEdge 11G systems require HEST Firmware First support, which was added to the upstream kernel after the initial PCI AER snapshot was taken.

Without this support, PCI AER errors reported on domains other than 0 would not be handled correctly on their hardware.

Additionally, add a PCI AER on/off switch for users who experience problems with PCI AER.

Version-Release number of selected component (if applicable): 2.6.18-180.el5

Comment 2 RHEL Program Management 2009-12-15 16:52:28 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Prarit Bhargava 2009-12-16 18:19:32 UTC
Upstream patch (for RHKL reviewers):

commit 0584396157ad2d008e2cc76b4ed6254151183a25
Author: Matt Domsch <Matt_Domsch>
Date:   Mon Nov 2 11:51:24 2009 -0600

    PCI: PCIe AER: honor ACPI HEST FIRMWARE FIRST mode

    Feedback from Hidetoshi Seto and Kenji Kaneshige incorporated.  This
    correctly handles PCI-X bridges, PCIe root ports and endpoints, and
    prints debug messages when invalid/reserved types are found in the
    HEST.  PCI devices not in domain/segment 0 are not represented in
    HEST, thus will be ignored.

    Today, the PCIe Advanced Error Reporting (AER) driver attaches itself
    to every PCIe root port for which BIOS reports it should, via ACPI
    _OSC.

    However, _OSC alone is insufficient for newer BIOSes.  Part of ACPI
    4.0 is the new APEI (ACPI Platform Error Interfaces) which is a way
    for OS and BIOS to handshake over which errors for which components
    each will handle.  One table in ACPI 4.0 is the Hardware Error Source
    Table (HEST), where BIOS can define that errors for certain PCIe
    devices (or all devices), should be handled by BIOS ("Firmware First
    mode"), rather than be handled by the OS.

    Dell PowerEdge 11G server BIOS defines Firmware First mode in HEST, so
    that it may manage such errors, log them to the System Event Log, and
    possibly take other actions.  The aer driver should honor this, and
    not attach itself to devices noted as such.

    Furthermore, Kenji Kaneshige reminded us to disallow changing the AER
    registers when respecting Firmware First mode.  Platform firmware is
    expected to manage these, and if changes to them are allowed, it could
    break that firmware's behavior.

    The HEST parsing code may be replaced in the future by a more
    feature-rich implementation.  This patch provides the minimum needed
    to prevent breakage until that implementation is available.

    Reviewed-by: Kenji Kaneshige <kaneshige.kenji.com>
    Reviewed-by: Hidetoshi Seto <seto.hidetoshi.com>
    Signed-off-by: Matt Domsch <Matt_Domsch>
    Signed-off-by: Jesse Barnes <jbarnes>
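The Firmware First decision described in the commit message above can be sketched in userspace C. This is not the upstream or RHEL5 patch: the structures, flag macros, and function names (`hest_firmware_first`, `aer_should_attach`, `aer_write_allowed`) are hypothetical simplifications. Only the three ACPI 4.0 AER error-source types (root port 6, endpoint 7, PCI-X bridge 8) and the "firmware first" and "global" flags are modeled. Following the commit message, per-device entries are treated as describing segment 0 only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified HEST error-source types for PCIe/PCI-X AER (ACPI 4.0 values). */
enum hest_type {
    HEST_AER_ROOT_PORT = 6,
    HEST_AER_ENDPOINT  = 7,
    HEST_AER_BRIDGE    = 8,
};

#define HEST_FLAG_FIRMWARE_FIRST 0x1u /* BIOS handles errors for this source */
#define HEST_FLAG_GLOBAL         0x2u /* entry covers every device of its type */

/* Hypothetical, reduced AER entry. bus/device/function are used only
 * when HEST_FLAG_GLOBAL is clear; this sketch assumes they refer to
 * segment 0, as the commit message describes. */
struct hest_aer_entry {
    enum hest_type type;
    uint8_t flags;
    uint8_t bus, device, function;
};

enum dev_kind { DEV_ROOT_PORT, DEV_ENDPOINT, DEV_PCIX_BRIDGE };

/* Hypothetical stand-in for the kernel's struct pci_dev. */
struct dev_info {
    int domain;
    uint8_t bus, device, function;
    enum dev_kind kind;
};

/* Does a global entry of type t cover a device of kind k? */
static bool hest_type_covers(enum hest_type t, enum dev_kind k)
{
    switch (t) {
    case HEST_AER_ROOT_PORT: return k == DEV_ROOT_PORT;
    case HEST_AER_ENDPOINT:  return k == DEV_ENDPOINT;
    case HEST_AER_BRIDGE:    return k == DEV_PCIX_BRIDGE;
    }
    return false; /* reserved/unknown type: never matches */
}

/* True if BIOS has claimed AER handling for dev ("Firmware First").
 * Matching entries are applied in table order, so a later match
 * overrides an earlier one. */
static bool hest_firmware_first(const struct hest_aer_entry *hest, size_t n,
                                const struct dev_info *dev)
{
    bool ff = false;

    assert(n == 0 || hest != NULL);
    for (size_t i = 0; i < n; i++) {
        const struct hest_aer_entry *e = &hest[i];
        bool match;

        if (e->flags & HEST_FLAG_GLOBAL)
            match = hest_type_covers(e->type, dev->kind);
        else /* per-device entry: segment 0 only */
            match = dev->domain == 0 &&
                    e->bus == dev->bus &&
                    e->device == dev->device &&
                    e->function == dev->function;
        if (match)
            ff = (e->flags & HEST_FLAG_FIRMWARE_FIRST) != 0;
    }
    return ff;
}

/* The OS AER service attaches only if _OSC granted control AND HEST
 * does not mark the device Firmware First. */
static bool aer_should_attach(bool osc_granted,
                              const struct hest_aer_entry *hest, size_t n,
                              const struct dev_info *dev)
{
    return osc_granted && !hest_firmware_first(hest, n, dev);
}

/* Refuse changes to AER registers that platform firmware owns. */
static int aer_write_allowed(const struct hest_aer_entry *hest, size_t n,
                             const struct dev_info *dev)
{
    return hest_firmware_first(hest, n, dev) ? -EIO : 0;
}
```

For example, a table holding one global Firmware First root-port entry plus one per-device Firmware First entry for bus 3, device 0, function 0 keeps the AER driver off every root port and off 0000:03:00.0. An endpoint with the same bus/device/function in segment 1 is never matched by that per-device entry, so the OS keeps handling it whenever _OSC allows.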

Comment 8 Prarit Bhargava 2009-12-16 18:43:03 UTC
Created attachment 378815
RHEL5 fix for this issue [1/2]

Comment 9 Prarit Bhargava 2009-12-16 18:43:33 UTC
Created attachment 378816
RHEL5 fix for this issue [2/2]

Comment 14 Marizol Martinez 2010-02-12 20:09:49 UTC
Partners -- Please wait for *Snapshot 1* to test this feature. Thanks!

Comment 17 errata-xmlrpc 2010-03-30 07:24:20 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html