Emily J. Ratliff <ratliff.com>

1. Feature Overview:
   Feature Id: [201779]
   a. Name of Feature: EEH infrastructure change for MSI-X interrupt support
   b. Feature Description: Update the kernel EEH handling code in order to support the system IO error recovery adequately for devices running with MSI-X interrupts, which is going to be enabled on Power systems by 4Q/2009.

2. Feature Details:
   Sponsor: PPC
   Architectures: ppc64
   Arch Specificity: Purely Common Code
   Affects Core Kernel: Yes
   Delivery Mechanism: Backport
   Category: Infrastructure
   Request Type: Kernel - Enhancement from IBM
   d. Upstream Acceptance: Accepted
   Sponsor Priority: 1
   f. Severity: High
   IBM Confidential: no
   Code Contribution: IBM code
   g. Component Version Target: 2.6.28

3. Business Case:
   Enhance the RAS support required for Power platform IO.

4. Primary contact at Red Hat:
   John Jarvis, jjarvis

5. Primary contacts at Partner:
   Project Management Contact: Mike Wortman, wortman.com, 512-838-8582
   Technical contact(s): Daisy Chang, daisyc.com; Michael Mason, masonmik.com
   IBM Manager: Larry Kessler, lkessler.com
IBM is signed up to test and provide feedback
Created attachment 330049: Support EEH recovery for devices using MSI-X

This patch *should* allow devices using MSI-X to recover from an EEH error. It restores the MSI-X registers during the recovery process. I have successfully built and booted kernels with this patch on ppc64 and x86-64, but have not been able to test it yet because the general MSI-X support is not working properly. I will test once we have a ppc64 kernel where MSI-X is working. My understanding is that Michael Ellerman from IBM Ozlabs will be submitting patches via Bugzilla to fix MSI-X.
This feature does not appear to be using the generic fix for eeh restore all registers implemented in RHEL 5.3 in https://bugzilla.redhat.com/show_bug.cgi?id=470580 . Any reason for that? This code was included to prevent having to implement this feature for each driver.
(In reply to comment #8)
> This feature does not appear to be using the generic fix for eeh restore all
> registers implemented in RHEL 5.3 in
> https://bugzilla.redhat.com/show_bug.cgi?id=470580 . Any reason for that?
> This code was included to prevent having to implement this feature for each
> driver.

I don't see how this code breaks the generic fix for eeh restore, but regardless of that, the patch may not be necessary. Once I have a kernel that supports msix on power, I'll test EEH without my patch to see if it works.

Marking as TESTED so I can get this bug in the SUBMITTED state. The patch hasn't actually been tested other than to make sure it builds and boots. I'll do real testing once I have a kernel that supports msix on power.
Created attachment 331769: Don't disable MSI and MSI-X when EEH errors occurs

Turns out the previous patch was unnecessary for MSI-X support. However, we discovered that interrupt disable/enable was not done correctly for MSI-X. In fact, it isn't necessary to disable/enable MSI and MSI-X interrupts during EEH recovery. MSI and MSI-X interrupts are effectively disabled by the DMA Stopped state when an EEH error occurs. This patch ensures only LSI interrupts are disabled/enabled.

This patch has been submitted upstream to the linuxppc-dev mailing list. It has received favorable reviews. It can be referenced here: http://ozlabs.org/pipermail/linuxppc-dev/2009-February/068177.html.
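To illustrate the rule described above (only LSI interrupts are disabled and re-enabled around EEH recovery, while MSI and MSI-X are left alone), here is a minimal user-space sketch. It is not the submitted patch: `struct model_dev` and the `model_eeh_*` functions are hypothetical names invented for this illustration, and the boolean fields merely stand in for the real per-device and per-IRQ state kept by the kernel.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the few per-device fields this sketch needs.
 * It is NOT the kernel's struct pci_dev. */
struct model_dev {
    bool msi_enabled;      /* device is using MSI */
    bool msix_enabled;     /* device is using MSI-X */
    bool irq_in_use;       /* a handler is registered on the LSI line */
    bool irq_masked;       /* current state of the LSI line */
    bool eeh_irq_disabled; /* set only when EEH recovery masked the line */
};

/* Called when an EEH error is detected (assumed hook name). */
void model_eeh_disable_irq(struct model_dev *dev)
{
    /* MSI and MSI-X are effectively disabled by the DMA Stopped state,
     * so there is nothing to do for them here. */
    if (dev->msi_enabled || dev->msix_enabled)
        return;

    /* Nothing to mask if no handler is using the LSI line. */
    if (!dev->irq_in_use)
        return;

    /* Remember that recovery masked the line so it can be undone later. */
    dev->eeh_irq_disabled = true;
    dev->irq_masked = true;
}

/* Called once recovery has completed (assumed hook name). */
void model_eeh_enable_irq(struct model_dev *dev)
{
    /* Only unmask what recovery itself masked. Devices using MSI or MSI-X
     * never set this flag, so they are skipped automatically. */
    if (!dev->eeh_irq_disabled)
        return;

    dev->eeh_irq_disabled = false;
    dev->irq_masked = false;
}
```

In this model a device with `msix_enabled` set passes through both calls unchanged, while an LSI device with a registered handler is masked on error and unmasked after recovery. Tracking the "masked by recovery" flag separately keeps the enable path from unmasking a line that was never masked by EEH.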
Updating PM score.
RHKML post: http://post-office.corp.redhat.com/archives/rhkernel-list/2009-February/msg00488.html
GIT commit ID: http://git.kernel.org/?p=linux/kernel/git/sfr/linux-next.git;a=commit;h=8535ef05a6904429ce72671c3035dbf05e6d5edf
This enhancement request was evaluated by the full Red Hat Enterprise Linux team for inclusion in a Red Hat Enterprise Linux minor release. As a result of this evaluation, Red Hat has tentatively approved inclusion of this feature in the next Red Hat Enterprise Linux Update minor release. While it is a goal to include this enhancement in the next minor release of Red Hat Enterprise Linux, the enhancement is not yet committed for inclusion in the next minor release pending the next phase of actual code integration and successful Red Hat and partner testing.
(In reply to comment #17)
> RHKML post:
>
> http://post-office.corp.redhat.com/archives/rhkernel-list/2009-February/msg00488.html

Please include the contents of this RHKML message. I don't have access to RHKML.

(In reply to comment #18)
> This enhancement request was evaluated by the full Red Hat Enterprise Linux
> team for inclusion in a Red Hat Enterprise Linux minor release. As a
> result of this evaluation, Red Hat has tentatively approved inclusion of
> this feature in the next Red Hat Enterprise Linux Update minor release.
> While it is a goal to include this enhancement in the next minor release
> of Red Hat Enterprise Linux, the enhancement is not yet committed for
> inclusion in the next minor release pending the next phase of actual
> code integration and successful Red Hat and partner testing.

Does this mean the patch *is* or *isn't* planned for inclusion in 5.4?
in kernel-2.6.18-133.el5

You can download this test kernel from http://people.redhat.com/dzickus/el5

Please do NOT transition this bugzilla state to VERIFIED until our QE team has sent specific instructions indicating when to do so. However feel free to provide a comment indicating that this fix has been verified.
------- Comment From masonmik.com 2009-04-13 11:55 EDT-------
The patch for this bug is in the rhel 5.4 kotd, but the underlying msi-x support patches from bug 51274 are not. The patch in this bug cannot be tested until the 51274 patches are included.
51274 refers to an IBM Bugzilla, what is the corresponding Red Hat Bugzilla?
------- Comment From masonmik.com 2009-04-13 12:39 EDT-------
(In reply to comment #25)
> 51274 refers to an IBM Bugzilla, what is the corresponding Red Hat Bugzilla?

RIT279700
That is an Issue Tracker number, which maps to RH BZ https://bugzilla.redhat.com/show_bug.cgi?id=492580. Features (of which this is one) should never be requested through Issue Tracker but should come directly through Bugzilla. In the future, please work with Emily Ratliff to get these included in the LTC's list of feature requests. I rarely check Issue Tracker since it is not used for feature requests; I happened to see this one, but most likely I would not see future ones. Please provide a list of all the RH Bugzillas that are required to implement this functionality on Power.
------- Comment From masonmik.com 2009-04-13 13:39 EDT-------
(In reply to comment #27)
> That is an Issue Tracker number, which maps to RH BZ
> https://bugzilla.redhat.com/show_bug.cgi?id=492580. Features (of which this is
> one) should never be requested through Issue Tracker but should come directly
> through Bugzilla. In the future, please work with Emily Ratliff to get these
> included in the LTC's list of feature requests. I rarely check Issue Tracker
> since it is not used for feature requests; I happened to see this one, but
> most likely I would not see future ones.
>
> Please provide a list of all the RH Bugzillas that are required to implement
> this functionality on Power.

Sorry, I should have dug a little deeper to get the RH BZ number. As for why RH492580 was requested via Issue Tracker, I don't know. It's not my bug. I just know that this bug, which makes changes to support MSI-X in EEH on powerpc, is dependent on bug RH492580. But it's only dependent in that I cannot test this patch against a device that supports MSI-X, because without RH492580 MSI-X itself isn't supported on powerpc.

As far as I know, only bugs RH492580 and RH475696 are required to support MSI-X on powerpc.
~~ Attention Partners RHEL 5.4 Partner Alpha Released! ~~ RHEL 5.4 Partner Alpha has been released on partners.redhat.com. There should be a fix present that addresses this particular request. Please test and report back your results here, at your earliest convenience. Our Public Beta release is just around the corner! If you encounter any issues, please set the bug back to the ASSIGNED state and describe the issues you encountered. If you have verified the request functions as expected, please set your Partner ID in the Partner field above to indicate successful test results. Do not flip the bug status to VERIFIED. Further questions can be directed to your Red Hat Partner Manager. Thanks!
~~ Attention - RHEL 5.4 Beta Released! ~~ RHEL 5.4 Beta has been released! There should be a fix present in the Beta release that addresses this particular request. Please test and report back results here, at your earliest convenience. RHEL 5.4 General Availability release is just around the corner! If you encounter any issues while testing Beta, please describe the issues you have encountered and set the bug into NEED_INFO. If you encounter new issues, please clone this bug to open a new issue and request it be reviewed for inclusion in RHEL 5.4 or a later update, if it is not of urgent severity. Please do not flip the bug status to VERIFIED. Only post your verification results, and if available, update Verified field with the appropriate value. Questions can be posted to this bug or your customer or partner representative.
------- Comment From masonmik.com 2009-07-04 11:15 EDT-------
I have verified that this patch is in Beta 1. Closing.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHSA-2009-1243.html