Red Hat Bugzilla – Bug 484943
[Stratus 5.4 bug] PCI hot unplug can leak MSI descriptors causing fallback to legacy interrupts
Last modified: 2009-09-02 04:26:48 EDT
Created attachment 331460
Backport of fix from 2.6.21 kernel.
Description of problem:
Hot-unplugging a PCI device that has MSI or MSI-X enabled can leak MSI descriptors in the kernel because the MSI cleanup is not done properly.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Bring up a PCI device that uses MSI or MSI-X.
2. Surprise-break the hardware (electrically disconnect the PCI device).
3. Call pci_remove_bus_device.
Actual results:
After a few cycles, the device falls back to legacy interrupts on probe.
Expected results:
The loop should run indefinitely with no change in device operation.
Additional info:
This happens because pci_disable_msi and pci_disable_msix read the device's config space (which is no longer available) before cleaning up, and bail out if the MSI capability is not present.
A fix for this was introduced between kernel 2.6.20 and 2.6.21.
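The leak can be illustrated with a small user-space simulation. This is only a sketch of the failure pattern described above, not the kernel code or the attached patch: struct fake_dev, the fixed-size descriptor pool (MSI_POOL_SIZE) and all function names are hypothetical stand-ins, and it assumes that a config-space read on a disconnected device simply shows no MSI capability.

```c
#include <assert.h>
#include <stdbool.h>

#define MSI_POOL_SIZE 4   /* hypothetical number of available MSI descriptors */

/* Hypothetical stand-in for a PCI device; not the kernel's struct pci_dev. */
struct fake_dev {
    bool present;      /* false once the device is electrically disconnected */
    bool msi_enabled;  /* software state: a descriptor is held for this device */
};

static int msi_free_descs = MSI_POOL_SIZE;

/* Simulated config-space lookup: a missing device shows no MSI capability. */
static bool has_msi_cap(const struct fake_dev *dev)
{
    return dev->present;
}

/* Probe path: take a descriptor if one is free, otherwise report that the
 * caller must fall back to legacy interrupts. */
static bool enable_msi(struct fake_dev *dev)
{
    if (msi_free_descs == 0)
        return false;
    msi_free_descs--;
    dev->msi_enabled = true;
    return true;
}

/* Return the descriptor to the pool, based on software state only. */
static void release_desc(struct fake_dev *dev)
{
    if (dev->msi_enabled) {
        msi_free_descs++;
        dev->msi_enabled = false;
        assert(msi_free_descs <= MSI_POOL_SIZE);
    }
}

/* Leaky cleanup: consults the hardware first and bails out when the
 * capability cannot be found, so the descriptor is never released. */
void disable_msi_leaky(struct fake_dev *dev)
{
    if (!has_msi_cap(dev))
        return;
    release_desc(dev);
}

/* Cleanup that does not depend on the device still being reachable. */
void disable_msi_fixed(struct fake_dev *dev)
{
    release_desc(dev);
}

/* Run probe / surprise removal / cleanup cycles and return how many
 * probes were able to use MSI. */
int run_cycles(int cycles, void (*disable)(struct fake_dev *))
{
    int msi_probes = 0;

    msi_free_descs = MSI_POOL_SIZE;
    for (int i = 0; i < cycles; i++) {
        struct fake_dev dev = { .present = true, .msi_enabled = false };

        if (enable_msi(&dev))
            msi_probes++;
        dev.present = false;  /* surprise removal */
        disable(&dev);        /* cleanup during device removal */
    }
    return msi_probes;
}
```

With the leaky cleanup, run_cycles(10, disable_msi_leaky) gets MSI only for the first MSI_POOL_SIZE probes and falls back afterwards; with disable_msi_fixed every probe gets MSI.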
Updating PM score.
Patch posted to rhkernel-list
You can download this test kernel from http://people.redhat.com/dzickus/el5
Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so. However feel free
to provide a comment indicating that this fix has been verified.
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~
RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!
If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
Tested this patch as part of 2.6.18-152.el5 alpha kernel.
The diff looks functionally identical to our original patch. Also ran ten device removal cycles and everything looks OK.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.