+++ This bug was initially created as a clone of Bug #250611 +++

+++ This bug was initially created as a clone of Bug #249667 +++

Description of problem:
When the Axon PCIe root complexes used in the IBM QS21 systems report PCI errors (e.g. poisoned TLP, CRC error, etc.), they assert an interrupt that has to be caught by Linux. The "driver" will dump out some registers, then panic. It is an extra file in arch/powerpc/platforms/cell and does not impact other platforms.

Without the patches to support this error reporting, these systems will hang on boot in the face of PCI errors.

IBM System Integration Test (SIT) has defined this defect as an SIT exit gate. QS21 GA will be delayed by every day the fix is not available in RHEL 5.1.

Version-Release number of selected component (if applicable):
2.6.18-8.EL

How reproducible:
100% given appropriate test hardware.

Steps to Reproduce:
1. To be provided by IBM

Actual results:
Hang/no boot response.

Expected results:
Correct error reporting & resultant panic if fatal.

Additional info:
Hardware for testing is being delivered to Westford (?) as soon as IBM resolves final firmware issues.

-- Additional comment from breeves on 2007-07-26 06:59 EST --
Created an attachment (id=160005)
proposed patch from IBM

-- Additional comment from breeves on 2007-07-26 07:01 EST --
Created an attachment (id=160006)
proposed patch from IBM [2/3]

-- Additional comment from breeves on 2007-07-26 07:02 EST --
Created an attachment (id=160007)
proposed patch from IBM [3/3]

-- Additional comment from tao on 2007-07-26 12:05 EST --
------- Additional Comments From smoser.com (prefers email at ssmoser.com) 2007-07-26 12:02 EDT -------
(In reply to comment #27)
> Sorry, I accidentally picked the wrong rpm. Now it works for PCIe. Still have to
> verify for PCI-X though (on a different machine).

Have you been able to do that?
This event sent from IssueTracker by Glen Johnson
issue 126663

-- Additional comment from tao on 2007-07-26 12:41 EST --
----- Additional Comments From Jens.Osterkamp.com (prefers email at jens.com) 2007-07-26 12:37 EDT -------
Yes, it works for PCI-X also.

-- Additional comment from pm-rhel on 2007-07-26 13:07 EST --
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.

-- Additional comment from smoser on 2007-07-26 14:19 EST --
posted:
http://post-office.corp.redhat.com/archives/rhkernel-list/2007-July/thread.html#00836

-- Additional comment from tao on 2007-07-26 17:26 EST --
----- Additional Comments From bherren.com (prefers email at benh.com) 2007-07-26 17:21 EDT -------
Wait, this bugzilla entry is still missing a patch that's already upstream but not backported yet. I'll attach it today.

-- Additional comment from jturner on 2007-07-27 11:37 EST --
Patches (at least the ones posted to this point) are POWER specific. QE withholding ack based on:

1) need the missing patch referred to in comment 11
2) need testing results from patches applied to current Red Hat code
3) need IBM commitment on testing

-- Additional comment from tao on 2007-07-27 21:20 EST --
------- Additional Comments From smoser.com (prefers email at ssmoser.com) 2007-07-27 21:17 EDT -------
(In reply to comment #34)
> Wait, this bugzilla entry is still missing a patch that's already upstream but
> not backported yet. I'll attach it today.
>

Just a reminder, we're still waiting on this.
Internal Status set to 'Waiting on Support'
Status set to: Waiting on Tech

-- Additional comment from tao on 2007-07-27 21:30 EST --
----- Additional Comments From bherren.com (prefers email at benh.com) 2007-07-27 21:28 EDT -------
Sorry for the confusion, the fix I'm talking about is the one that was submitted in a separate entry on bug #36932 (mpic protected sources). The comment on the latter is a bit misleading, as that patch doesn't only apply to the DDR errors but also to the PCI-X/PCIe ones, afaik.

-- Additional comment from smoser on 2007-07-30 09:10 EST --
(In reply to comment #12)
> Patches (at least the ones posted to this point) are POWER specific. QE
> withholding ack based on:
>
> 1) need the missing patch referred to in comment 11

This was a misunderstanding, probably my fault. As Ben mentioned above, he opened RH bug 249910 (LTC bug 36932) to address the additional issue. There are no further changes needed for this bug.

> 2) need testing results from patches applied to current Red Hat code

Red Hat comment 5 above mentions Jens Osterkamp's test. He tested and verified for both PCI-X and PCIe. The kernel he verified with was built using brew (http://brewweb.devel.redhat.com/brew/taskinfo?taskID=887483). It contains the patches as submitted to rhkernel-list applied to 2.6.18-36.EL (just for the record, it also includes patches for RH bugs 242937 and 247658).

> 3) need IBM commitment on testing

Unless I'm mistaken, IBM has agreed to testing for all Cell platforms.

Does that address all your concerns?

-- Additional comment from breeves on 2007-07-30 09:20 EST --
Thanks Scott - all fine from my side

-- Additional comment from jturner on 2007-07-30 09:39 EST --
QE ack for the exception, then.

-- Additional comment from robbiew.com on 2007-08-02 10:58 EST --
The soon-to-be released QS21 Cell/B.E.
BladeServer from IBM is supposed to support F7, so IBM would really appreciate it if a kernel update with this patch could be made available to F7 users.

-- Additional comment from robbiew.com on 2007-08-02 11:06 EST --
Created an attachment (id=160528)
simple patch to panic when SERR or PERR occurs on PCI-X

-- Additional comment from robbiew.com on 2007-08-02 11:07 EST --
Created an attachment (id=160529)
simple patch to panic when an error occurs on PCIe
The QS21 is also supported on Fedora Core 6, so IBM would like this patch included in the next kernel update, if possible. Do we need to provide a backport?
Just realized that IBM can resolve this itself, as we provide a kernel with the Cell SDK that is supported on FC6. Closing.