Bug 475696 - [LTC 5.4 FEAT] EEH infrastructure change for MSI-X interrupt support [201779]
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: ppc64 All
Priority: high Severity: high
Target Milestone: alpha
Target Release: 5.4
Assigned To: Ameet Paranjape
QA Contact: Red Hat Kernel QE team
Keywords: FutureFeature, OtherQA
Depends On:
Blocks: 445204 483701 483784 485920
Reported: 2008-12-09 21:22 EST by IBM Bug Proxy
Modified: 2009-09-02 04:16 EDT

Doc Type: Enhancement
Last Closed: 2009-09-02 04:16:02 EDT


Attachments
Support EEH recovery for devices using MSI-X (2.76 KB, text/plain)
2009-01-26 19:21 EST, IBM Bug Proxy
Don't disable MSI and MSI-X when EEH errors occurs (3.12 KB, text/plain)
2009-02-12 17:01 EST, IBM Bug Proxy


External Trackers
Tracker ID Priority Status Summary Last Updated
IBM Linux Technology Center 50534 None None None Never

Description IBM Bug Proxy 2008-12-09 21:22:02 EST
=Comment: #0=================================================
Emily J. Ratliff <ratliff@austin.ibm.com> - 
1. Feature Overview:
Feature Id: [201779]
a. Name of Feature: EEH infrastructure change for MSI-X interrupt support
b. Feature Description
Update the kernel EEH handling code so that system I/O error recovery works properly for
devices using MSI-X interrupts; MSI-X support is to be enabled on Power systems by 4Q/2009.

2. Feature Details:
Sponsor: PPC
Architectures:
ppc64

Arch Specificity: Purely Common Code
Affects Core Kernel: Yes
Delivery Mechanism: Backport
Category: Infrastructure
Request Type: Kernel - Enhancement from IBM
d. Upstream Acceptance: Accepted
Sponsor Priority: 1
f. Severity: High
IBM Confidential: no
Code Contribution: IBM code
g. Component Version Target: 2.6.28

3. Business Case
Enhance the RAS support required for Power platform IO.

4. Primary contact at Red Hat: 
John Jarvis
jjarvis@redhat.com

5. Primary contacts at Partner:
Project Management Contact:
Mike Wortman, wortman@us.ibm.com, 512-838-8582

Technical contact(s):
Daisy Chang, daisyc@us.ibm.com
Michael Mason, masonmik@us.ibm.com

IBM Manager:
Larry Kessler, lkessler@us.ibm.com
Comment 1 John Jarvis 2008-12-18 15:01:14 EST
IBM is signed up to test and provide feedback
Comment 2 IBM Bug Proxy 2009-01-26 19:21:05 EST
Created attachment 330049
Support EEH recovery for devices using MSI-X



This patch *should* allow devices using MSI-X to recover from an EEH error.  It restores the MSI-X registers during the recovery process.  I have successfully built and booted kernels with this patch on ppc64 and x86-64, but have not been able to test it yet because the general MSI-X support is not working properly.  I will test once we have a ppc64 kernel where MSI-X is working.  My understanding is that Michael Ellerman from IBM Ozlabs will be submitting patches via Bugzilla to fix MSI-X.
Comment 3 John Jarvis 2009-01-27 10:06:49 EST
This feature does not appear to be using the generic fix for eeh restore all registers implemented in RHEL 5.3 in https://bugzilla.redhat.com/show_bug.cgi?id=470580 .  Any reason for that?  This code was included to prevent having to implement this feature for each driver.
Comment 4 IBM Bug Proxy 2009-01-28 14:30:50 EST
(In reply to comment #8)
> This feature does not appear to be using the generic fix for eeh restore all
> registers implemented in RHEL 5.3 in
> https://bugzilla.redhat.com/show_bug.cgi?id=470580 .  Any reason for that?
> This code was included to prevent having to implement this feature for each
> driver.
>

I don't see how this code breaks the generic fix for eeh restore, but regardless of that, the patch may not be necessary.  Once I have a kernel that supports msix on power, I'll test EEH without my patch to see if it works.


Marking as TESTED so I can get this bug in the SUBMITTED state.  The patch hasn't actually been tested other than to make sure it builds and boots.  I'll do real testing once I have a kernel that supports msix on power.
Comment 5 IBM Bug Proxy 2009-02-12 17:01:01 EST
Created attachment 331769
Don't disable MSI and MSI-X when EEH errors occurs



Turns out the previous patch was unnecessary for MSI-X support.  However, we discovered that interrupt disable/enable was not done correctly for MSI-X.  In fact, it isn't necessary to disable/enable MSI and MSI-X interrupts during EEH recovery.  MSI and MSI-X interrupts are effectively disabled by the DMA Stopped state when an EEH error occurs.  This patch ensures only LSI interrupts are disabled/enabled.

This patch has been submitted upstream to the linuxppc-dev mailing list.  It has received favorable reviews.  It can be referenced here:  http://ozlabs.org/pipermail/linuxppc-dev/2009-February/068177.html
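
For readers without the attachment at hand, the behaviour described in this comment can be sketched as a small user-space model.  This is not the attached patch and not kernel code: struct model_pci_dev, its fields, and the two model_eeh_* functions are hypothetical names invented for illustration, and the real change lives in the powerpc EEH driver code.  The sketch only assumes what the comment states: MSI and MSI-X devices are skipped because the DMA Stopped state already quiesces them, and only LSI (level-signalled) interrupts are disabled on error and re-enabled on recovery.

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the kernel's per-device state. */
struct model_pci_dev {
    bool msi_enabled;      /* device is using MSI */
    bool msix_enabled;     /* device is using MSI-X */
    bool irq_has_action;   /* a handler is registered on the LSI line */
    bool irq_masked;       /* models the effect of disabling the IRQ line */
    bool eeh_irq_disabled; /* records that EEH handling masked the line */
};

/* Model of the step taken when an EEH error is detected. */
static void model_eeh_disable_irq(struct model_pci_dev *dev)
{
    /* MSI and MSI-X are effectively disabled by the DMA Stopped
     * state, so there is nothing to do for them here. */
    if (dev->msi_enabled || dev->msix_enabled)
        return;

    /* Nothing to mask if no handler is attached to the LSI line. */
    if (!dev->irq_has_action)
        return;

    dev->eeh_irq_disabled = true;
    dev->irq_masked = true;
}

/* Model of the step taken once the device has been recovered. */
static void model_eeh_enable_irq(struct model_pci_dev *dev)
{
    /* Only undo what the disable step actually did. */
    if (dev->eeh_irq_disabled) {
        dev->eeh_irq_disabled = false;
        dev->irq_masked = false;
    }
}
```

In this model an LSI device is masked on error and unmasked on recovery, while a device with msix_enabled set passes through both steps untouched.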
Comment 6 RHEL Product and Program Management 2009-02-16 10:20:27 EST
Updating PM score.
Comment 9 John Jarvis 2009-03-02 12:54:14 EST
This enhancement request was evaluated by the full Red Hat Enterprise Linux 
team for inclusion in a Red Hat Enterprise Linux minor release.   As a 
result of this evaluation, Red Hat has tentatively approved inclusion of 
this feature in the next Red Hat Enterprise Linux Update minor release.   
While it is a goal to include this enhancement in the next minor release 
of Red Hat Enterprise Linux, the enhancement is not yet committed for 
inclusion in the next minor release pending the next phase of actual 
code integration and successful Red Hat and partner testing.
Comment 10 IBM Bug Proxy 2009-03-04 12:01:00 EST
(In reply to comment #17)
> RHKML post:
>
> http://post-office.corp.redhat.com/archives/rhkernel-list/2009-February/msg00488.html
>

Please include the contents of this RHKML message.  I don't have access to RHKML.


(In reply to comment #18)
> This enhancement request was evaluated by the full Red Hat Enterprise Linux
> team for inclusion in a Red Hat Enterprise Linux minor release.   As a
> result of this evaluation, Red Hat has tentatively approved inclusion of
> this feature in the next Red Hat Enterprise Linux Update minor release.
> While it is a goal to include this enhancement in the next minor release
> of Red Hat Enterprise Linux, the enhancement is not yet committed for
> inclusion in the next minor release pending the next phase of actual
> code integration and successful Red Hat and partner testing.
>

Does this mean the patch *is* or *isn't* planned for inclusion in 5.4?
Comment 11 Don Zickus 2009-03-04 15:00:36 EST
in kernel-2.6.18-133.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team
has sent specific instructions indicating when to do so.  However feel free
to provide a comment indicating that this fix has been verified.
Comment 13 IBM Bug Proxy 2009-04-13 12:01:04 EDT
------- Comment From masonmik@us.ibm.com 2009-04-13 11:55 EDT-------
The patch for this bug is in the rhel 5.4 kotd, but the underlying msi-x support patches from bug 51274 are not.  The patch in this bug cannot be tested until the 51274 patches are included.
Comment 14 John Jarvis 2009-04-13 12:29:19 EDT
51274 refers to an IBM Bugzilla, what is the corresponding Red Hat Bugzilla?
Comment 15 IBM Bug Proxy 2009-04-13 12:40:55 EDT
------- Comment From masonmik@us.ibm.com 2009-04-13 12:39 EDT-------
(In reply to comment #25)
> 51274 refers to an IBM Bugzilla, what is the corresponding Red Hat Bugzilla?
>

RIT279700
Comment 16 John Jarvis 2009-04-13 12:54:36 EDT
That is an Issue Tracker number which maps to RH BZ https://bugzilla.redhat.com/show_bug.cgi?id=492580.  Features (of which this is one) should never be requested through Issue Tracker but should come directly through Bugzilla.  In the future please work with Emily Ratliff to get these included in the LTC's list of feature requests.  I rarely check Issue Tracker since it is not used for feature requests; I happened to see this one, but most likely I would not see future ones.

Please provide a list of all the RH Bugzillas that are required to implement this functionality on Power.
Comment 17 IBM Bug Proxy 2009-04-13 13:40:40 EDT
------- Comment From masonmik@us.ibm.com 2009-04-13 13:39 EDT-------
(In reply to comment #27)
> That is an Issue Tracker number which maps to RH BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=492580.  Features (of which this
> is one) should never be requested through Issue Tracker but should come
> directly through Bugzilla.  In the future please work with Emily Ratliff to
> get these included in the LTC's list of feature requests.  I rarely check
> Issue Tracker since it is not used for feature requests; I happened to see
> this one, but most likely I would not see future ones.
>
> Please provide a list of all the RH Bugzillas that are required to implement
> this functionality on Power.
>

Sorry, should have dug a little deeper to get the RH BZ number.

As for why RH492580 was requested via Issue Tracker, I don't know.  It's not my bug.  I just know that this bug, which makes changes to support MSI-X in EEH on powerpc, is dependent on bug RH492580.  But it's only dependent in that I cannot test this patch against a device that supports MSI-X because without RH492580 MSI-X itself isn't supported on powerpc.

As far as I know, only bugs RH492580 and RH475696 are required to support MSI-X on powerpc.
Comment 18 Chris Ward 2009-06-14 19:19:00 EDT
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~

RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should
be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner!

If you encounter any issues, please set the bug back to the ASSIGNED state and
describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
Comment 19 Chris Ward 2009-07-03 14:17:24 EDT
~~ Attention - RHEL 5.4 Beta Released! ~~

RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner!

If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity.

Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value.

Questions can be posted to this bug or your customer or partner representative.
Comment 20 IBM Bug Proxy 2009-07-04 11:20:36 EDT
------- Comment From masonmik@us.ibm.com 2009-07-04 11:15 EDT-------
I have verified that this patch is in Beta 1.  Closing.
Comment 22 errata-xmlrpc 2009-09-02 04:16:02 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-1243.html
