Bug 451164 - Firmware error with MT25204 Infiniband HCAs
Summary: Firmware error with MT25204 Infiniband HCAs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: openib
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Doug Ledford
QA Contact:
URL:
Whiteboard:
Depends On: 251934
Blocks: RHEL4u8_relnotes 488813 509904
Reported: 2008-06-13 05:39 UTC by Gurhan Ozen
Modified: 2013-11-04 01:35 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Hardware testing for the Mellanox MT25204 has revealed that an internal error occurs under certain high-load conditions. When the ib_mthca driver reports a catastrophic error on this hardware, it is usually related to an insufficient completion queue depth relative to the number of outstanding work requests generated by the user application. Although the driver will reset the hardware and recover from such an event, all existing connections at the time of the error will be lost. This generally results in a segmentation fault in the user application. Further, if opensm is running at the time the error occurs, then you need to manually restart it in order to resume proper operation.
Clone Of:
Clones: 488813 509904
Environment:
Last Closed: 2009-05-18 20:35:27 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2009:1022
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: openib bug fix and enhancement update
Last Updated: 2009-05-18 14:44:10 UTC

Comment 1 Don Domingo 2008-06-13 12:19:18 UTC
added to RHEL4.7 release notes under "Known Issues":

<quote>
Hardware testing for the Mellanox MT25204 has revealed that an internal error
occurs under certain high-load conditions. When the ib_mthca driver reports a
catastrophic error on this hardware, it is usually related to an insufficient
completion queue depth relative to the number of outstanding work requests
generated by the user application.

Although the driver will reset the hardware and recover from such an event, all
existing connections at the time of the error will be lost. This generally
results in a segmentation fault in the user application. Further, if opensm is
running at the time the error occurs, then you need to manually restart it in
order to resume proper operation.
</quote>

Please advise if any further revisions are required. Thanks!

Comment 3 RHEL Program Management 2008-09-05 17:25:13 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 7 Don Domingo 2008-10-05 23:56:34 UTC
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Hardware testing for the Mellanox MT25204 has revealed that an internal error
occurs under certain high-load conditions. When the ib_mthca driver reports a
catastrophic error on this hardware, it is usually related to an insufficient
completion queue depth relative to the number of outstanding work requests
generated by the user application.

Although the driver will reset the hardware and recover from such an event, all
existing connections at the time of the error will be lost. This generally
results in a segmentation fault in the user application. Further, if opensm is
running at the time the error occurs, then you need to manually restart it in
order to resume proper operation.

Comment 8 Peter Martuccelli 2008-10-07 13:37:56 UTC
Dev ACK for release note.

Comment 15 errata-xmlrpc 2009-05-18 20:35:27 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1022.html

