Bug 484943 - [Stratus 5.4 bug] PCI hot unplug can leak MSI descriptors causing fallback to legacy interrupts
Summary: [Stratus 5.4 bug] PCI hot unplug can leak MSI descriptors causing fallback to...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.3
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 5.4
Assignee: Jim Paradis
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 459515 483701 485920
Reported: 2009-02-10 19:41 UTC by Robert Manchek
Modified: 2009-09-02 08:26 UTC
CC List: 2 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-09-02 08:26:48 UTC
Target Upstream Version:
Embargoed:


Attachments
Backport of fix from 2.6.21 kernel. (1.31 KB, patch)
2009-02-10 19:41 UTC, Robert Manchek
no flags


Links
System: Red Hat Product Errata
ID: RHSA-2009:1243
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.4 kernel security and bug fix update
Last Updated: 2009-09-01 08:53:34 UTC

Description Robert Manchek 2009-02-10 19:41:36 UTC
Created attachment 331460
Backport of fix from 2.6.21 kernel.

Description of problem:

Hot-unplugging a PCI device that has MSI or MSI-X enabled can leak MSI descriptors in the kernel, because the MSI/MSI-X cleanup is skipped when the device is removed.

Version-Release number of selected component (if applicable):


How reproducible:

100%

Steps to Reproduce:
1. Bring up a PCI device that uses MSI or MSI-X.
2. Surprise-remove the hardware (electrically disconnect the PCI device).
3. Call pci_remove_bus_device() on the device.
4. Reconnect and re-probe the device, then repeat from step 2.
  
Actual results:

After a few cycles, the device falls back to legacy (INTx) interrupts when it is probed.

Expected results:

The remove/probe loop should run indefinitely with no change in device operation.

Additional info:

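This happens because pci_disable_msi() and pci_disable_msix() read the device's config space before cleaning up, and bail out if they do not find the MSI (or MSI-X) capability there. After a surprise removal the config space is no longer accessible, so the capability lookup fails, the functions return early, and the MSI descriptors are never freed.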

A fix for this was merged upstream between 2.6.20 and 2.6.21, so it first appears in 2.6.21.

Comment 1 RHEL Program Management 2009-02-16 15:05:50 UTC
Updating PM score.

Comment 2 Jim Paradis 2009-02-17 15:53:30 UTC
Patch posted to rhkernel-list

Comment 3 Don Zickus 2009-03-04 20:02:06 UTC
in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.

Comment 5 Chris Ward 2009-06-14 23:20:35 UTC
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!

Comment 6 Robert Manchek 2009-06-30 20:45:08 UTC
Tested this patch as part of the 2.6.18-152.el5 alpha kernel.

The diff looks functionally identical to our original patch.  Also ran for ten device removal cycles and everything looks ok.

Comment 8 errata-xmlrpc 2009-09-02 08:26:48 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html

