Bug 455308

Summary: Altix Partitioned System
Product: Red Hat Enterprise Linux 5 Reporter: George Beshers <gbeshers>
Component: kernel Assignee: George Beshers <gbeshers>
Status: CLOSED ERRATA QA Contact: Martin Jenner <mjenner>
Severity: high Docs Contact:
Priority: high    
Version: 5.3 CC: jh, martinez, mgahagan, prarit, tao, tee
Target Milestone: rc Keywords: Tracking
Target Release: ---   
Hardware: ia64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different in some circumstances. The kernel has been updated to find the correct target identifier.
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-01-20 19:53:08 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 460590    
Bug Blocks: 454962    
Attachments:
Description Flags
Tar of patchset none

Description George Beshers 2008-07-14 19:18:32 UTC
Description of problem:


Version-Release number of selected component (if applicable):


This is IT# 185563

How reproducible:

Systematically when a large RHEL5.1 system is
partitioned into separate SSIs with ProPack/xpmem
providing MPI support.

A few of these are customer reported bugs, the
rest have been found by internal SGI testing or
are related to the montecito processors and
prerequisites.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 George Beshers 2008-07-15 18:26:08 UTC
Created attachment 311861
Tar of patchset


This contains the patchset plus the series file.

Comment 2 RHEL Program Management 2008-07-23 20:23:27 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 4 Don Zickus 2008-09-11 19:43:37 UTC
in kernel-2.6.18-111.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5

Comment 7 Ryan Lerch 2008-11-04 02:25:44 UTC
Three of the patches in the patchset (comment #1) were marked for inclusion in the 5.3 release notes:

These are:

ia64-montecito-mca-support	[IA64] MCA recovery: Montecito support
ia64-add-se-bit			[IA64] Add se bit to Processor State Parameter structure
ia64-add-dp-bit			[IA64] Add dp bit to cache and bus check structs

Comment 8 Ryan Lerch 2008-11-04 02:25:44 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
The information in MCA records is filled in slightly differently on
Montecito than on Madison/McKinley.  Usually, the cache check and bus
check target identifiers have the same address.   On Montecito the
cache check and bus check target identifiers can be different if
a corrected error (ie SBE or unconsumed poison data) was encountered and
then an uncorrected error (ie DBE) was consumed.  In that case, the
cache check target identifier is the physical address of the DBE (that
caused the MCA to surface) while the bus check target identifier is the
physical address of the SBE.  This patch correctly finds the target
identifier that triggered the MCA.
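
The selection described above can be sketched roughly as follows. This is a minimal illustration, not the code from the patchset: the struct layouts and the names check_entry, mca_record and pick_target_identifier are hypothetical, and the rule "prefer a valid cache check target identifier, otherwise fall back to the bus check" is inferred from the text above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for one check entry in an MCA record. */
struct check_entry {
    int      tid_valid;  /* nonzero if target_id holds a valid address */
    uint64_t target_id;  /* physical address reported by this check */
};

/* Hypothetical, simplified view of the parts of an MCA record used here. */
struct mca_record {
    const struct check_entry *cache_checks;   /* array of cache check entries */
    size_t                    n_cache_checks; /* number of entries in the array */
    const struct check_entry *bus_check;      /* may be NULL */
};

/*
 * Return the address that most likely triggered the MCA, or 0 if none.
 * On Montecito the cache check may hold the DBE address while the bus
 * check still holds the address of an earlier SBE, so cache checks are
 * consulted first (an assumption drawn from the description above).
 */
uint64_t pick_target_identifier(const struct mca_record *rec)
{
    size_t i;

    assert(rec != NULL);

    /* First choice: any cache check with a valid, nonzero target. */
    for (i = 0; i < rec->n_cache_checks; i++) {
        const struct check_entry *c = &rec->cache_checks[i];
        if (c->tid_valid && c->target_id != 0)
            return c->target_id;
    }

    /* Fallback: the bus check target, if one is present and valid. */
    if (rec->bus_check != NULL && rec->bus_check->tid_valid)
        return rec->bus_check->target_id;

    return 0;
}
```

With a record whose cache check reports the DBE address and whose bus check reports the SBE address, this sketch returns the DBE address; when no cache check carries a valid target, it returns the bus check address, which matches the Madison/McKinley case where both identifiers agree.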

Comment 9 Ryan Lerch 2008-11-04 02:26:19 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,10 +1 @@
-The information in MCA records is filled in slightly differently on
+* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different is some circumstances. The kernel has been updated to find the correct target identifier.
-Montecito than on Madison/McKinley.  Usually, the cache check and bus
-check target identifiers have the same address.   On Montecito the
-cache check and bus check target identifiers can be different if
-a corrected error (ie SBE or unconsumed poison data) was encountered and
-then an uncorrected error (ie DBE) was consumed.  In that case, the
-cache check target identifier is the physical address of the DBE (that
-caused the MCA to surface) while the bus check target identifier is the
-physical address of the SBE.  This patch correctly finds the target
-identifier that triggered the MCA.

Comment 10 Ryan Lerch 2008-11-04 02:26:56 UTC
Release note updated. If any revisions are required, please set the 
"requires_release_notes"  flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1 +1 @@
-* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different is some circumstances. The kernel has been updated to find the correct target identifier.
+* The Dual-Core Intel Itanium 2 processor filled out machine check architecture (MCA) records differently to previous Intel Itanium processors. The cache check and bus check target identifiers can now be different in some circumstances. The kernel has been updated to find the correct target identifier.

Comment 12 George Beshers 2008-11-21 13:48:30 UTC
Sorry for the delayed response.

The above looks good.

The patches have been checked.

Comment 14 errata-xmlrpc 2009-01-20 19:53:08 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2009-0225.html