Bug 484943 - [Stratus 5.4 bug] PCI hot unplug can leak MSI descriptors causing fallback to legacy interrupts
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 5.4
Assigned To: Jim Paradis
QA Contact: Red Hat Kernel QE team
Keywords: OtherQA
Depends On:
Blocks: 459515 483701 485920
 
Reported: 2009-02-10 14:41 EST by Robert Manchek
Modified: 2009-09-02 04:26 EDT (History)

Doc Type: Bug Fix
Last Closed: 2009-09-02 04:26:48 EDT

Attachments:
Backport of fix from 2.6.21 kernel. (1.31 KB, patch)
2009-02-10 14:41 EST, Robert Manchek

Description Robert Manchek 2009-02-10 14:41:36 EST
Created attachment 331460
Backport of fix from 2.6.21 kernel.

Description of problem:

Hot-unplugging a PCI device that has MSI or MSI-X enabled can leak MSI descriptors in the kernel because cleanup is not done properly.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Bring up a PCI device that uses MSI or MSI-X.
2. Surprise-remove the hardware (electrically disconnect the PCI device).
3. Call pci_remove_bus_device().
  
Actual results:

After a few cycles, the device falls back to legacy interrupts on probe.

Expected results:

The loop should run indefinitely with no change in device operation.

Additional info:

This happens because pci_disable_msi() and pci_disable_msix() read the device's config space (which is no longer accessible after removal) before cleaning up, and return early without freeing anything if the MSI or MSI-X capability is not found.

A fix for this was introduced upstream between kernels 2.6.20 and 2.6.21.
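
The failure mode described above can be modeled in user space. The following is only a minimal sketch, not the attached patch and not kernel code: struct model_dev, the four-entry vector pool (NR_VECTORS), and every model_* function are hypothetical names invented for illustration. It assumes the fix amounts to trusting software state cached at enable time instead of re-reading config space during teardown, which may differ from what the actual backport does.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NR_VECTORS 4 /* hypothetical size of the MSI vector/descriptor pool */

/* Hypothetical stand-in for the per-device state the kernel tracks. */
struct model_dev {
    bool present;     /* false once the device is electrically gone */
    bool msi_enabled; /* software state cached when MSI was enabled */
    int  vector;      /* index into the pool, or -1 if none held */
};

static bool vector_in_use[NR_VECTORS];

static int alloc_vector(void)
{
    for (int i = 0; i < NR_VECTORS; i++) {
        if (!vector_in_use[i]) {
            vector_in_use[i] = true;
            return i;
        }
    }
    return -1; /* pool exhausted */
}

static void free_vector(struct model_dev *dev)
{
    assert(dev->vector >= 0 && dev->vector < NR_VECTORS);
    vector_in_use[dev->vector] = false;
    dev->vector = -1;
    dev->msi_enabled = false;
}

/* Models a config-space lookup: it finds nothing once the device is gone. */
static bool config_has_msi_cap(const struct model_dev *dev)
{
    return dev->present;
}

/* Probe path: returns true for MSI, false for fallback to legacy interrupts. */
static bool model_enable_msi(struct model_dev *dev)
{
    int v = alloc_vector();
    if (v < 0)
        return false;
    dev->vector = v;
    dev->msi_enabled = true;
    return true;
}

/* Buggy teardown: consults the (absent) hardware first and bails out. */
static void model_disable_msi_buggy(struct model_dev *dev)
{
    if (!config_has_msi_cap(dev))
        return; /* the vector is never released: this is the leak */
    if (dev->msi_enabled)
        free_vector(dev);
}

/* Assumed shape of a fix: rely only on the cached software state. */
static void model_disable_msi_fixed(struct model_dev *dev)
{
    if (dev->msi_enabled)
        free_vector(dev);
}

/* Runs probe / surprise-removal cycles; returns how many probes got MSI. */
int run_cycles(int cycles, bool use_fixed_teardown)
{
    int msi_probes = 0;

    memset(vector_in_use, 0, sizeof(vector_in_use));
    for (int i = 0; i < cycles; i++) {
        struct model_dev dev = { .present = true, .msi_enabled = false, .vector = -1 };

        if (model_enable_msi(&dev))
            msi_probes++;
        dev.present = false; /* surprise removal */
        if (use_fixed_teardown)
            model_disable_msi_fixed(&dev);
        else
            model_disable_msi_buggy(&dev);
    }
    return msi_probes;
}
```

With the hypothetical pool of four vectors, run_cycles(10, false) gets MSI on only the first four probes and falls back afterwards, while run_cycles(10, true) gets MSI on all ten.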
Comment 1 RHEL Product and Program Management 2009-02-16 10:05:50 EST
Updating PM score.
Comment 2 Jim Paradis 2009-02-17 10:53:30 EST
Patch posted to rhkernel-list
Comment 3 Don Zickus 2009-03-04 15:02:06 EST
in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 5 Chris Ward 2009-06-14 19:20:35 EDT
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
Comment 6 Robert Manchek 2009-06-30 16:45:08 EDT
Tested this patch as part of 2.6.18-152.el5 alpha kernel.

The diff looks functionally identical to our original patch.  Also ran for ten device removal cycles and everything looks ok.
Comment 8 errata-xmlrpc 2009-09-02 04:26:48 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
