Description of problem:
Version-Release number of selected component (if applicable): This is IT# 185563.
How reproducible: Systematically, when a large RHEL 5.1 system is partitioned into separate SSIs with ProPack/xpmem providing MPI support. A few of these are customer-reported bugs; the rest were found by internal SGI testing or are related to the Montecito processors and prerequisites.
Created attachment 311861
Tar of patchset
This contains the patchset plus the series file.
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Three of the patches in the patchset (comment #1) were marked for inclusion in the 5.3 release notes. These are:
  ia64-montecito-mca-support   [IA64] MCA recovery: Montecito support
  ia64-add-se-bit              [IA64] Add se bit to Processor State Parameter structure
  ia64-add-dp-bit              [IA64] Add dp bit to cache and bus check structs
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
The information in MCA records is filled in slightly differently on Montecito than on Madison/McKinley. Usually, the cache check and bus check target identifiers have the same address. On Montecito, the cache check and bus check target identifiers can differ if a corrected error (i.e., SBE or unconsumed poison data) was encountered and then an uncorrected error (i.e., DBE) was consumed. In that case, the cache check target identifier is the physical address of the DBE (which caused the MCA to surface), while the bus check target identifier is the physical address of the SBE. This patch correctly finds the target identifier that triggered the MCA.
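For illustration, the selection rule described in the note above can be sketched in C. This is a minimal sketch under stated assumptions, not the code from the patchset: `struct check_record` and `pick_target_identifier` are hypothetical names, and the record layout is a heavily simplified stand-in for the real SAL/PAL error-record structures. It only shows the idea that the cache check target identifier is preferred and the bus check target identifier is used as a fallback.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one cache check or bus check record.
 * The real error-record structures carry many more fields. */
struct check_record {
    bool     ti_valid;           /* target identifier field is valid */
    uint64_t target_identifier;  /* physical address reported by the check */
};

/*
 * Pick the address that most plausibly triggered the MCA.
 * Per the release note, on Montecito the cache check holds the address of
 * the consumed uncorrected error, so cache checks are consulted first and
 * the bus check is only a fallback.
 * Returns 0 when no valid target identifier is found.
 */
static uint64_t pick_target_identifier(const struct check_record *cache_checks,
                                       size_t n_cache,
                                       const struct check_record *bus_check)
{
    /* A NULL array is only acceptable when it is declared empty. */
    assert(cache_checks != NULL || n_cache == 0);

    for (size_t i = 0; i < n_cache; i++) {
        if (cache_checks[i].ti_valid && cache_checks[i].target_identifier != 0)
            return cache_checks[i].target_identifier;
    }

    if (bus_check != NULL && bus_check->ti_valid)
        return bus_check->target_identifier;

    return 0;
}
```

With one valid cache check at a hypothetical address 0x1000 and a valid bus check at 0x2000, the sketch returns 0x1000; if no cache check carries a valid target identifier, it falls back to 0x2000.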
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,10 +1 @@
-The information in MCA records is filled in slightly differently on
+* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different is some circumstances. The kernel has been updated to find the correct target identifier.
-Montecito than on Madison/McKinley. Usually, the cache check and bus
-check target identifiers have the same address. On Montecito the
-cache check and bus check target identifiers can be different if
-a corrected error (ie SBE or unconsumed poison data) was encountered and
-then an uncorrected error (ie DBE) was consumed. In that case, the
-cache check target identifier is the physical address of the DBE (that
-caused the MCA to surface) while the bus check target identifier is the
-physical address of the SBE. This patch correctly finds the target
-identifier that triggered the MCA.
Release note updated. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1 +1 @@
-* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different is some circumstances. The kernel has been updated to find the correct target identifier.
+* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different in some circumstances. The kernel has been updated to find the correct target identifier.
Sorry for the delayed response. The above looks good. The patches have been checked.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHSA-2009-0225.html