Red Hat Bugzilla – Bug 464177
[LTC 6.0 FEAT] 200985:New package request rtas-diag for Platform Error Analysis Tools
Last modified: 2009-12-04 15:10:39 EST
Emily J. Ratliff <email@example.com> - 2008-09-16 18:24 EDT
1. Feature Overview:
Feature Id: 
a. Name of Feature: Platform Error Analysis Tools
b. Feature Description
New package request for tools that analyze platform error data surfaced by Power platforms.
2. Feature Details:
Arch Specificity: Purely Arch Specific Code
Delivery Mechanism: Direct from community
Category: Power Serviceability
Request Type: Package - New
d. Upstream Acceptance: Accepted
Sponsor Priority: 2
f. Severity: Medium
IBM Confidential: no
Code Contribution: IBM code
g. Component Version Target: rtas-diag 2.3
3. Business Case
These tools decode diagnostic events delivered by Power platforms, perform any necessary recovery
actions (such as offlining failing CPUs or performing shutdowns in response to thermal conditions),
and send notifications if parts need to be replaced. These tools have existed for several
years, but have not yet been open sourced.
4. Primary contact at Red Hat:
5. Primary contacts at Partner:
Project Management Contact:
Mike Wortman, firstname.lastname@example.org, 512-838-8582
Michael Strosaker, email@example.com
Larry Kessler, firstname.lastname@example.org
This package has been open sourced as ppc64-diag, and is available at:
Can you please confirm that tools to do such monitoring don't already exist in a non-arch-specific way?
Product Management has reviewed and declined this request. You may appeal this
decision by reopening this request.
------- Comment From email@example.com 2009-12-04 15:01 EDT-------
Indeed, there is no mechanism for performing these actions in a non-arch-specific way. The underlying hardware errors are intimately tied to the architecture, and the error codes and diagnostic routines are very Power-specific.
We would appreciate reconsideration of this request. ppc64-diag is needed by all Power customers: without it, ALL RTAS events (which include all communication from the HMC/FSP) are unavailable to the OS, and operations such as LPAR shutdown will fail.