Bug 451164 - Firmware error with MT25204 Infiniband HCAs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: openib
Version: 4.7
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Doug Ledford
Depends On: 251934
Blocks: RHEL4u8_relnotes 488813 509904
Reported: 2008-06-13 01:39 EDT by Gurhan Ozen
Modified: 2013-11-03 20:35 EST
CC List: 6 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Hardware testing for the Mellanox MT25204 has revealed that an internal error occurs under certain high-load conditions. When the ib_mthca driver reports a catastrophic error on this hardware, it is usually related to an insufficient completion queue depth relative to the number of outstanding work requests generated by the user application. Although the driver will reset the hardware and recover from such an event, all existing connections at the time of the error will be lost. This generally results in a segmentation fault in the user application. Further, if opensm is running at the time the error occurs, then you need to manually restart it in order to resume proper operation.
Story Points: ---
Clone Of:
: 488813 509904
Environment:
Last Closed: 2009-05-18 16:35:27 EDT


Attachments: None
Comment 1 Don Domingo 2008-06-13 08:19:18 EDT
Added to the RHEL 4.7 release notes under "Known Issues":

<quote>
Hardware testing for the Mellanox MT25204 has revealed that an internal error
occurs under certain high-load conditions. When the ib_mthca driver reports a
catastrophic error on this hardware, it is usually related to an insufficient
completion queue depth relative to the number of outstanding work requests
generated by the user application.

Although the driver will reset the hardware and recover from such an event, all
existing connections at the time of the error will be lost. This generally
results in a segmentation fault in the user application. Further, if opensm is
running at the time the error occurs, then you need to manually restart it in
order to resume proper operation.
</quote>

Please advise if any further revisions are required. Thanks!
Comment 3 RHEL Product and Program Management 2008-09-05 13:25:13 EDT
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.
Comment 7 Don Domingo 2008-10-05 19:56:34 EDT
Release note added. If any revisions are required, please set the 
"requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly.
All revisions will be proofread by the Engineering Content Services team.

New Contents:
Hardware testing for the Mellanox MT25204 has revealed that an internal error
occurs under certain high-load conditions. When the ib_mthca driver reports a
catastrophic error on this hardware, it is usually related to an insufficient
completion queue depth relative to the number of outstanding work requests
generated by the user application.

Although the driver will reset the hardware and recover from such an event, all
existing connections at the time of the error will be lost. This generally
results in a segmentation fault in the user application. Further, if opensm is
running at the time the error occurs, then you need to manually restart it in
order to resume proper operation.
Comment 8 Peter Martuccelli 2008-10-07 09:37:56 EDT
Dev ACK for release note.
Comment 15 errata-xmlrpc 2009-05-18 16:35:27 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2009-1022.html
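
The release note above attributes the catastrophic error to a completion queue that is too shallow for the number of outstanding work requests. As a rough illustration only, the sketch below checks a planned completion queue depth against the work-request limits of the queue pairs that share it. It assumes the worst case in which every posted send and receive work request produces a completion; that sizing rule is a common rule of thumb rather than anything stated in this report, and the names QueuePairLimits, min_cq_depth and cq_is_deep_enough are hypothetical helpers, not part of ib_mthca or any verbs library.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class QueuePairLimits:
    """Hypothetical record of the work-request limits requested for one queue pair."""
    max_send_wr: int  # maximum outstanding send work requests
    max_recv_wr: int  # maximum outstanding receive work requests


def min_cq_depth(qps: Iterable[QueuePairLimits]) -> int:
    """Return the depth needed to hold one completion per outstanding work request.

    Assumption: every work request is signaled, so each one can produce a
    completion queue entry before the application polls the queue.
    """
    return sum(qp.max_send_wr + qp.max_recv_wr for qp in qps)


def cq_is_deep_enough(cq_depth: int, qps: Iterable[QueuePairLimits]) -> bool:
    """Check a planned completion queue depth against the worst-case demand."""
    return cq_depth >= min_cq_depth(qps)


if __name__ == "__main__":
    # Two queue pairs sharing one completion queue (illustrative numbers).
    shared = [QueuePairLimits(128, 128), QueuePairLimits(64, 256)]
    print("minimum depth:", min_cq_depth(shared))
    print("512 entries enough?", cq_is_deep_enough(512, shared))
```

An application that polls completions promptly can often run with less headroom than this worst case, so treat the result as a conservative upper bound rather than a requirement.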