added to RHEL4.7 release notes under "Known Issues": <quote> Hardware testing for the Mellanox MT25204 has revealed that an internal error occurs under certain high-load conditions. When the ib_mthca driver reports a catastrophic error on this hardware, it is usually related to an insufficient completion queue depth relative to the number of outstanding work requests generated by the user application. Although the driver will reset the hardware and recover from such an event, all existing connections at the time of the error will be lost. This generally results in a segmentation fault in the user application. Further, if opensm is running at the time the error occurs, then you need to manually restart it in order to resume proper operation. </quote> please advise if any further revisions are required. thanks!
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Release note added. If any revisions are required, please set the "requires_release_notes" flag to "?" and edit the "Release Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team. New Contents: Hardware testing for the Mellanox MT25204 has revealed that an internal error occurs under certain high-load conditions. When the ib_mthca driver reports a catastrophic error on this hardware, it is usually related to an insufficient completion queue depth relative to the number of outstanding work requests generated by the user application. Although the driver will reset the hardware and recover from such an event, all existing connections at the time of the error will be lost. This generally results in a segmentation fault in the user application. Further, if opensm is running at the time the error occurs, then you need to manually restart it in order to resume proper operation.
Dev ACK for release note.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on therefore solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2009-1022.html