Created attachment 331460
Backport of fix from 2.6.21 kernel.

Description of problem:
Hot-unplugging a PCI device that has MSI or MSI-X enabled can leak MSI descriptors in the kernel because cleanup is not done properly.

Version-Release number of selected component (if applicable):

How reproducible:
100%

Steps to Reproduce:
1. Bring up a PCI device that uses MSI or MSI-X.
2. Surprise-break the hardware (electrically disconnect the PCI device).
3. Call pci_remove_bus_device.

Actual results:
After a few cycles, the device falls back to legacy interrupts on probe.

Expected results:
The loop should run forever with no change to device operation.

Additional info:
This happens because pci_disable_msi and pci_disable_msix read the device's config space (which is no longer available) before cleaning up, and bail out if the MSI capability is not present. A fix for this was introduced between 2.6.20 and 2.6.21.
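To illustrate the failure mode described above, here is a minimal userspace sketch in C. It is not the kernel code and not the attached patch: every name in it (mock_dev, mock_probe, mock_disable_msi_buggy, mock_disable_msi_fixed, run_cycles, MAX_MSI_DESCS) is hypothetical, and the "fixed" variant only assumes that cleanup is driven by software state instead of a config-space read, which may differ from what the 2.6.21 change actually does.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_MSI_DESCS 4   /* hypothetical size of the descriptor pool */

/* Hypothetical stand-in for a PCI device; not struct pci_dev. */
struct mock_dev {
    bool present;      /* false after surprise removal: config reads fail */
    bool msi_enabled;  /* software-side record that MSI was set up */
    int  desc;         /* index of the allocated descriptor, -1 if none */
};

static bool pool[MAX_MSI_DESCS];  /* true = descriptor in use */

static int alloc_desc(void)
{
    for (int i = 0; i < MAX_MSI_DESCS; i++) {
        if (!pool[i]) {
            pool[i] = true;
            return i;
        }
    }
    return -1;  /* pool exhausted */
}

static void free_desc(int i)
{
    if (i >= 0)
        pool[i] = false;
}

/* Returns true if MSI was obtained, false on fallback to legacy IRQs. */
static bool mock_probe(struct mock_dev *d)
{
    d->present = true;
    d->desc = alloc_desc();
    d->msi_enabled = (d->desc >= 0);
    return d->msi_enabled;
}

/* Models a config-space capability lookup: fails once the device is gone. */
static bool mock_find_msi_cap(const struct mock_dev *d)
{
    return d->present;
}

/* Buggy shape: consults config space first and bails out before cleanup. */
static void mock_disable_msi_buggy(struct mock_dev *d)
{
    if (!mock_find_msi_cap(d))
        return;  /* descriptor is never freed: this is the leak */
    free_desc(d->desc);
    d->desc = -1;
    d->msi_enabled = false;
}

/* Assumed fix shape: clean up based on software state only. */
static void mock_disable_msi_fixed(struct mock_dev *d)
{
    if (!d->msi_enabled)
        return;
    free_desc(d->desc);
    d->desc = -1;
    d->msi_enabled = false;
}

/* Runs probe / surprise-remove cycles; returns how many probes got MSI. */
static int run_cycles(int cycles, bool fixed)
{
    int msi_ok = 0;

    for (int i = 0; i < MAX_MSI_DESCS; i++)
        pool[i] = false;

    for (int i = 0; i < cycles; i++) {
        struct mock_dev d;

        if (mock_probe(&d))
            msi_ok++;
        d.present = false;  /* electrical disconnect */
        if (fixed)
            mock_disable_msi_fixed(&d);
        else
            mock_disable_msi_buggy(&d);
    }
    return msi_ok;
}
```

In this model, run_cycles(10, false) obtains MSI only until the pool is exhausted and then falls back on every later probe, while run_cycles(10, true) obtains MSI on every cycle, which mirrors the actual and expected results reported above.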
Updating PM score.
Patch posted to rhkernel-list
in kernel-2.6.18-133.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However, feel free to provide a comment indicating that this fix has been verified.
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~ RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner! If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
Tested this patch as part of the 2.6.18-152.el5 alpha kernel. The diff looks functionally identical to our original patch. Also ran ten device removal cycles, and everything looks OK.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html