
Bug 798382

Summary: error parsing HEST for firmware_first
Product: Red Hat Enterprise Linux 5
Component: kernel
Version: 5.8
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Reporter: Stuart Hayes <stuart_hayes>
Assignee: Lenny Szubowicz <lszubowi>
QA Contact: Red Hat Kernel QE team <kernel-qe>
CC: shiyer, wwlinuxengineering
Target Milestone: rc
Hardware: All
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-12-06 14:57:47 UTC
Attachments: proposed patch (flags: none)

Description Stuart Hayes 2012-02-28 18:51:17 UTC
Description of problem:

When a PCI device is set up in pci_setup_device(), one of the things that function does is set pdev->aer_firmware_first to 1 if the ACPI HEST table indicates that the system firmware is going to take care of errors on that device.

The function that actually parses the HEST is acpi_hest_firmware_first() in drivers/acpi/hest.c, which loops through each HEST entry and checks it.  However, this function fails to update a pointer on each iteration, so it treats every entry as the same type of HEST entry as the first one. As a result, the table is not parsed correctly (except for the first entry).

This is causing some error reporting registers to get enabled when they shouldn't.
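
To illustrate the class of bug described above, here is a minimal sketch in user-space C. It is not the kernel code and not the attached patch: the entry types, entry sizes, and all function names (entry_size, entry_type, count_type_buggy, count_type_fixed, build_demo_table) are hypothetical, and the layout is deliberately simpler than the real ACPI HEST. It only shows how reading the type through a pointer that is never advanced makes every entry look like the first one.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified entry types; NOT the real ACPI HEST layout. */
enum { TYPE_A = 0, TYPE_B = 1 };

/* Size in bytes of an entry of the given type (made-up sizes). */
static size_t entry_size(uint16_t type)
{
    return type == TYPE_A ? 4 : 8;
}

/* Read the 16-bit type field stored at the start of an entry. */
static uint16_t entry_type(const uint8_t *p)
{
    uint16_t t;
    memcpy(&t, p, sizeof t);
    return t;
}

/* Buggy walk: the header pointer is set once and never advanced,
 * so every iteration sees the type of the first entry. */
static int count_type_buggy(const uint8_t *table, int n, uint16_t wanted)
{
    const uint8_t *hdr = table;
    int i, hits = 0;

    for (i = 0; i < n; i++) {
        if (entry_type(hdr) == wanted)
            hits++;
        /* missing: hdr += entry_size(entry_type(hdr)); */
    }
    return hits;
}

/* Fixed walk: advance the pointer by the size of the current entry
 * before looking at the next one. */
static int count_type_fixed(const uint8_t *table, int n, uint16_t wanted)
{
    const uint8_t *cur = table;
    int i, hits = 0;

    for (i = 0; i < n; i++) {
        uint16_t type = entry_type(cur);

        if (type == wanted)
            hits++;
        cur += entry_size(type);
    }
    return hits;
}

/* Fill buf (at least 16 bytes) with three entries: A, B, A.
 * Returns the number of entries written. */
static int build_demo_table(uint8_t *buf)
{
    const uint16_t types[] = { TYPE_A, TYPE_B, TYPE_A };
    size_t off = 0;
    int i;

    memset(buf, 0, 16);
    for (i = 0; i < 3; i++) {
        memcpy(buf + off, &types[i], sizeof types[i]);
        off += entry_size(types[i]);
    }
    return 3;
}
```

With the demo table (A, B, A), the buggy walk never finds the TYPE_B entry and reports three TYPE_A entries, while the fixed walk finds one TYPE_B and two TYPE_A entries.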


Version-Release number of selected component (if applicable):

RHEL 5.7 -- 2.6.18-308.el5 kernel


How reproducible:

every time, if you have a setup that is susceptible to the issue


Steps to Reproduce:
1. Install a PCI card that sits behind a bridge (I am using a QLogic QLE2462 Fibre Channel card). Note that the BIOS sets the device control register (offset 8 in the PCI Express capability structure) to 0x4814, meaning correctable, non-fatal, and unsupported request error reporting are all disabled.
2. Boot into RHEL 5.7.
3. Use lspci -vvv (or -xxx) to see that this register was changed to 0x481f (because the qla2xxx driver calls pci_enable_pcie_error_reporting(), and aer_firmware_first for this device was 0).
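
For readers comparing the two register values in the steps above, the following is a small sketch that decodes the four error reporting enable bits in the low nibble of the PCI Express Device Control register. The bit positions follow the PCI Express specification; the macro and function names (DEVCTL_*, devctl_error_bits, devctl_newly_enabled) are made up for this example and are not kernel identifiers.

```c
#include <stdint.h>

/* Error reporting enable bits of the PCIe Device Control register.
 * Macro names are local to this example. */
#define DEVCTL_CERE   0x0001  /* Correctable Error Reporting Enable */
#define DEVCTL_NFERE  0x0002  /* Non-Fatal Error Reporting Enable */
#define DEVCTL_FERE   0x0004  /* Fatal Error Reporting Enable */
#define DEVCTL_URRE   0x0008  /* Unsupported Request Reporting Enable */

/* Return only the four error reporting enable bits of a register value. */
static uint16_t devctl_error_bits(uint16_t devctl)
{
    return (uint16_t)(devctl & 0x000f);
}

/* Return the error reporting bits that are set in `after`
 * but were clear in `before`. */
static uint16_t devctl_newly_enabled(uint16_t before, uint16_t after)
{
    return (uint16_t)(devctl_error_bits(after) & ~devctl_error_bits(before));
}
```

Applied to the values in this report, 0x4814 has only the fatal error reporting bit set, and the change to 0x481f additionally sets the correctable, non-fatal, and unsupported request bits.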
  
Actual results:
the device control register (offset 8 in the pci express capability structure for the qlogic card) is changed from 0x4814 to 0x481f when the qla2xxx driver loads

Expected results:
the device control register should be left at 0x4814

Additional info:
I have a trivial patch to fix this; I will attach it to this bug.

Comment 1 Stuart Hayes 2012-02-28 19:13:21 UTC
Created attachment 566382
proposed patch

Here's a patch that I've tested.

Comment 2 Lenny Szubowicz 2013-12-06 14:57:47 UTC
This Bugzilla has been reviewed by Red Hat and is not planned on being
addressed in Red Hat Enterprise Linux 5, and therefore is being closed.

If this bug is critical to production systems, please contact your Red
Hat support representative and provide a sufficient business
justification in order to re-open it.

                               -Lenny.