Bug 660680 - iw_cxgb3 advertises incorrect max cq depth causing stalls on large MPI clusters
Summary: iw_cxgb3 advertises incorrect max cq depth causing stalls on large MPI clusters
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Doug Ledford
QA Contact: Network QE
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2010-12-07 15:27 UTC by Steve Wise
Modified: 2011-05-19 12:37 UTC

Fixed In Version: kernel-2.6.32-112.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 628223
Environment:
Last Closed: 2011-05-19 12:37:41 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Product Errata | RHSA-2011:0542 | 0 | normal | SHIPPED_LIVE | Important: Red Hat Enterprise Linux 6.1 kernel security, bug fix and enhancement update | 2011-05-19 11:58:07 UTC

Description Steve Wise 2010-12-07 15:27:53 UTC
+++ This bug was initially created as a clone of Bug #628223 +++

Description of problem:

iw_cxgb3 advertises a max cq depth of 256K entries, but the T3 HW only supports a max depth of 64K entries.  This causes MPI applications to stall when running in large cluster configurations (like 256-NP, 32-node clusters).

The fix is a one-liner that drops the max advertised cq depth to 65536.  The fix has been posted to linux-rdma.org and I'll attach a patch to this bug.
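
To illustrate the mismatch, here is a minimal sketch in C. It is not the actual patch and does not use the driver's real identifiers: the constant names and both helper functions are hypothetical, and only the two numeric limits (256K advertised, 64K supported) come from this report. It assumes a consumer that sizes its CQ by trusting the advertised maximum.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names; only the values come from this bug report. */
#define OLD_ADVERTISED_MAX_CQE (256u * 1024u) /* 256K entries, what the driver advertised */
#define T3_HW_MAX_CQE          65536u         /* 64K entries, what T3 hardware supports */

/*
 * Hypothetical model of a consumer sizing a completion queue:
 * it asks for what it wants, capped at the advertised device maximum.
 */
static uint32_t consumer_cq_depth(uint32_t wanted, uint32_t advertised_max)
{
    assert(advertised_max > 0);
    return wanted < advertised_max ? wanted : advertised_max;
}

/* Returns 1 if a CQ of this depth fits within the T3 hardware limit. */
static int cq_depth_fits_hw(uint32_t depth)
{
    return depth <= T3_HW_MAX_CQE;
}
```

Under these assumptions, a consumer wanting 100000 entries gets 100000 when the advertised maximum is 256K, which exceeds what the hardware supports. With the advertised maximum lowered to 65536, the same request is capped at 65536 and fits.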

This is urgent as it breaks large cluster operation.


--- Additional comment from swise on 2010-08-28 13:50:18 EDT ---

Created attachment 441721
Advertise the correct max cq depth for T3 devices.

This was submitted to linux-rdma.org today.

--- Additional comment from pm-rhel on 2010-12-07 05:26:33 EST ---

This request was evaluated by Red Hat Product Management for inclusion in Red Hat Enterprise Linux 5.6, and Red Hat does not plan to fix this issue in the currently developed update.

Contact your manager or support representative in case you need to escalate this bug.

--- Additional comment from swise on 2010-12-07 10:11:03 EST ---

But it's a one-line fix!  Low risk.  High yield.

--- Additional comment from swise on 2010-12-07 10:18:54 EST ---

Sites like the University of Wisconsin and Purdue have large (128+ core) MPI clusters and will see this problem if you don't ship this fix.

Comment 1 Steve Wise 2010-12-07 15:28:30 UTC
I cloned 628223 to track this issue for rhel6.

Comment 3 RHEL Program Management 2011-01-07 04:32:42 UTC
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Because the affected component is not scheduled to be updated
in the current release, Red Hat is unfortunately unable to
address this request at this time. Red Hat invites you to
ask your support representative to propose this request, if
appropriate and relevant, in the next release of Red Hat
Enterprise Linux. If you would like it considered as an
exception in the current release, please ask your support
representative.

Comment 4 Suzanne Logcher 2011-01-07 16:16:57 UTC
This request was erroneously denied for the current release of Red Hat
Enterprise Linux.  The error has been fixed and this request has been
re-proposed for the current release.

Comment 5 Doug Ledford 2011-01-14 18:12:41 UTC
Fixed for 6.1

Comment 6 RHEL Program Management 2011-01-17 23:21:48 UTC
This request was evaluated by Red Hat Product Management for inclusion
in a Red Hat Enterprise Linux maintenance release. Product Management has 
requested further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed 
products. This request is not yet committed for inclusion in an Update release.

Comment 7 Hushan Jia 2011-01-21 04:02:17 UTC
@Steve,
Could you confirm test commitment that when 6.1 beta is out you will test this bug and post test feedback?

Thanks.

Comment 8 Steve Wise 2011-01-21 04:13:51 UTC
(In reply to comment #7)
> @Steve,
> Could you confirm test commitment that when 6.1 beta is out you will test this
> bug and post test feedback?
> 
> Thanks.

definitely.

Point me at the kernel images when they're ready, and I'll verify the fix.

thanks.

Comment 9 Aristeu Rozanski 2011-02-03 16:34:22 UTC
Patch(es) available on kernel-2.6.32-112.el6

Comment 11 Steve Wise 2011-02-03 16:40:55 UTC
Where can I pull this kernel?  

Thanks!

Comment 13 Chris Ward 2011-04-06 11:03:30 UTC
~~ Partners and Customers ~~

This bug was included in RHEL 6.1 Beta. Please confirm the status of this request as soon as possible.

If you're having problems accessing 6.1 bits, are delayed in your test execution or find in testing that the request was not addressed adequately, please let us know.

Thanks!

Comment 14 errata-xmlrpc 2011-05-19 12:37:41 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2011-0542.html

