
Bug 484943

Summary: [Stratus 5.4 bug] PCI hot unplug can leak MSI descriptors causing fallback to legacy interrupts
Product: Red Hat Enterprise Linux 5
Reporter: Robert Manchek <robert.manchek>
Component: kernel
Assignee: Jim Paradis <jparadis>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Docs Contact:
Priority: high
Version: 5.3
CC: andriusb, peterm
Target Milestone: beta
Keywords: OtherQA
Target Release: 5.4
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2009-09-02 08:26:48 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 459515, 483701, 485920
Attachments:
  Backport of fix from 2.6.21 kernel. (Flags: none)

Description Robert Manchek 2009-02-10 19:41:36 UTC
Created attachment 331460
Backport of fix from 2.6.21 kernel.

Description of problem:

Hot-unplugging a PCI device that has MSI or MSI-X enabled can leak MSI descriptors in the kernel because cleanup is not performed correctly.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Bring up a PCI device that uses MSI or MSI-X.
2. Surprise-break the hardware (electrically disconnect the PCI device).
3. Call pci_remove_bus_device.
  
Actual results:

After a few cycles, the device falls back to using legacy interrupts on probe.

Expected results:

The loop should run indefinitely with no change in device operation.

Additional info:

This happens because pci_disable_msi and pci_disable_msix read the device's config space (which is no longer available) before cleaning up, and bail out if the MSI capability is not present, so the descriptors are never freed.

A fix for this was introduced between kernel versions 2.6.20 and 2.6.21.

Comment 1 RHEL Program Management 2009-02-16 15:05:50 UTC
Updating PM score.

Comment 2 Jim Paradis 2009-02-17 15:53:30 UTC
Patch posted to rhkernel-list

Comment 3 Don Zickus 2009-03-04 20:02:06 UTC
in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 5 Chris Ward 2009-06-14 23:20:35 UTC
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!

Comment 6 Robert Manchek 2009-06-30 20:45:08 UTC
Tested this patch as part of 2.6.18-152.el5 alpha kernel.

The diff looks functionally identical to our original patch.  Also ran for ten device removal cycles and everything looks ok.

Comment 8 errata-xmlrpc 2009-09-02 08:26:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html