1. Feature Name: Support for new Chelsio 10G Ethernet Controller and OFED driver

2. Description:
   a. Architectures:
      32-bit x86
      64-bit Intel EM64T/AMD64
      64-bit Itanium2
      64-bit PPC
   b. Dependencies:
      cxgb3 nic driver from kernel.org
      OFED-1.2.5, specifically the pieces below:
        ofed-1.2.5 kernel core
        ofed-1.2.5 cxgb3 and iw_cxgb3 kernel drivers
        ofed-1.2.5 libibverbs
        ofed-1.2.5 librdmacm
        ofed-1.2.5 libcxgb3
        ofed-1.2.5 mvapich2-0.9.8-15
        ofed-1.2.5 mpi-selector
   c. External links:
      OFED bits can be found at http://www.openfabrics.org/builds/ofed-1.2.5/release/OFED-1.2.5.tgz
      Note: The OFED kernel bits are available in the kernel.org linux tree.
   d. Priority (H,M,L): High
   e. Target Release: RHEL4.7 and RHEL5.2
   f. Target Release Date:
   g. Drivers or hardware dependency: None
   h. Target Kernel:
      2.6.9-x (RHEL4.7)
      2.6.18-x (RHEL5.2)
   i. Is code accepted upstream in Linus' tree? Yes
   j. Who will backport it to 2.6.9 and/or 2.6.18 kernel?
      cxgb3 will be shipped in 5.1 and 4.6. We can provide update patches to Red Hat. The status of OFED-1.2.5 in 5.1 and 4.6 is unknown to us; we will adjust.

3. Business Justification: Red Hat
   a. Why is this feature needed?
      Support for the Chelsio NIC and OFED over it is required by several server OEMs, some National Labs, and large end users.
   b. What hardware does this enable? new Chelsio 10G Ethernet Controller
   c. Forecast, volume or high end platform? 20kU in ’08, 300kU in ’09
   d. Any configuration info? No
   e. Are there other dependencies (drivers)? No

4. Status:
   a. Hardware to Red Hat? Yes, Andy Gospodarek
   b. Back-ported code/patch to Red Hat? Yes, on both 4.6 and 5.1
   c. Other status? Previous revision of software accepted as part of 5.1

5. Chelsio technical contact, email, phone, chat
   Kianoosh Naghshineh Kianoosh, 408-962-3621
   Divy Le Ray divy, 408-962-3682
   Scott Bardone sbardone, 408-962-3639
   Steve Wise swise, 512-343-9196 x 101
(In reply to comment #0)
> b. Dependencies:
>
> cxgb3 nic driver from kernel.org
>
> OFED-1.2.5, specifically the pieces below:

OFED 1.2.5 is *supposed* to be the same as OFED 1.2 except for the Connect-X support. I'll need to know if the iw_cxgb3 driver is not the same between OFED 1.2 and OFED 1.2.5.

> ofed-1.2.5 kernel core
>
> ofed-1.2.5 cxgb3 and iw_cxgb3 kernel drivers
>
> ofed-1.2.5 libibverbs
>
> ofed-1.2.5 librdmacm
>
> ofed-1.2.5 libcxgb3

The OFED 1.2 version of all these pieces is already in the planned 4.6 update.

> ofed-1.2.5 mvapich2-0.9.8-15
>
> ofed-1.2.5 mpi-selector

These two are not in. The mpi-selector bit won't ever go in. The mvapich2 bit needs some more work before it's really usable by a distribution. It might make 4.7. However, OpenMPI is already in and uses the OFED driver stack for communications.

In order for the Chelsio RNIC devices to work properly, though, there is some additional work that needs to be done. This bug needs to be cloned against the kudzu and initscripts components. They will both need updating to recognize the Chelsio RNICs as network adapters so that initialization of the RNIC ethernet interface happens properly at boot time. Currently, kudzu does not properly list the cxgb3 driver in the /etc/sysconfig/hwconf file. Also, the /sbin/kmodule program that's part of the initscripts package doesn't properly recognize the cxgb3 hardware as a network interface, and as a result rc.sysinit does not properly load the driver module.
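For anyone who wants to check a box for the kudzu symptom described above, here is a minimal sketch in Python. It assumes that hwconf is a series of records introduced by a line containing only "-", each made of "key: value" lines including "class" and "driver"; that layout, the sample text, and the function names are assumptions for illustration, not something taken from the kudzu sources.

```python
def parse_hwconf(text):
    """Split hwconf-style text into a list of {key: value} records.

    Assumption: each record starts with a line containing only "-".
    """
    records, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if line == "-":
            current = {}
            records.append(current)
        elif ":" in line and current is not None:
            key, _, value = line.partition(":")
            current[key.strip()] = value.strip()
    return records


def network_drivers(text):
    """Return the drivers that are listed with class NETWORK."""
    return {
        rec["driver"]
        for rec in parse_hwconf(text)
        if rec.get("class") == "NETWORK" and "driver" in rec
    }


def misclassified(text, driver):
    """Return records that name `driver` but are not class NETWORK."""
    return [
        rec
        for rec in parse_hwconf(text)
        if rec.get("driver") == driver and rec.get("class") != "NETWORK"
    ]


# Hypothetical sample, made up to show the failure mode being described.
SAMPLE_HWCONF = """\
-
class: NETWORK
bus: PCI
driver: e1000
desc: "Example onboard NIC"
-
class: OTHER
bus: PCI
driver: cxgb3
desc: "Example 10G adapter"
"""
```

On a real system the text would come from reading /etc/sysconfig/hwconf. If cxgb3 is missing from `network_drivers(...)`, or shows up in `misclassified(..., "cxgb3")`, that would be consistent with the adapter not being treated as a network interface at boot.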
(In reply to comment #2)
> OFED 1.2.5 is *supposed* to be the same as OFED 1.2 except for the Connect-X
> support. I'll need to know if the iw_cxgb3 driver is not the same between
> OFED 1.2 and OFED 1.2.5.

Functionally, the *cxgb3 drivers are the same. However, all the 1.2.5 ofed drivers were re-based onto the 2.6.22 kernel base plus any additional fixes that were in ofed-1.2 but not in the 2.6.22 kernel drivers. So the code is a little different.

All of the Chelsio bug fixes have gone into both the ofed-1.2 branch and the ofed-1.2.5 (aka ofed-1.2.c) branch of the ofed-1.2 git tree, so if you can't or don't want to go to 1.2.5, then we'll have to stick with ofed-1.2. But you need to pull the latest ofed-1.2 git repo to get all the Chelsio fixes since 1.2 GA.

> > ofed-1.2.5 mvapich2-0.9.8-15
> >
> > ofed-1.2.5 mpi-selector
>
> These two are not in. The mpi-selector bit won't ever go in. The mvapich2
> bit needs some more work before it's really usable by a distribution. It
> might make 4.7. However, OpenMPI is already in and uses the OFED driver
> stack for communications.

OpenMPI doesn't work over iWARP yet. It doesn't use the rdma-cm, so it doesn't support iWARP. The only MPI that works over OFED iWARP verbs and the rdma-cm at this point is mvapich2-0.9.8. If you don't want to pull it in, then customers can get it directly from OSU...
All: I think what we want to do is use ofed-1.2.5 as the place to pull _both_ cxgb3 and iw_cxgb3 code. This will keep all three components in sync: cxgb3, iw_cxgb3 and libcxgb3. This should also enable easy back-porting to rhel5.2, since ofed has a backport system. Basically, we (chelsio/ogc) keep the ofed-1.2.5 drivers up to date by pulling in all cxgb3 and iw_cxgb3 bug patches that get accepted upstream. Using ofed-1.2.5 gives us a single point of focus. Does this sound reasonable?
Steve, It certainly seems reasonable to do this if you plan to keep the OFED tree in sync with Jeff's and Linus' trees. I'm a little bit reluctant to do this since I don't want to get into a situation where the OFED tree is newer than trees hosted on kernel.org because changes to drivers there aren't pushed in a timely manner. Doug, What are your thoughts?
For tracking purposes: IBM has posted a similar feature request: #254027. Divy
By the way, the ofed-1.2.5 git repo maintains all changes to the pertinent ofed kernel drivers post 2.6.22 as patch files in the kernel_patches/fixes directory. So there is a patch file for each patch added to ofed on top of its 2.6.22 base. This should make it clear exactly what is added to cxgb3 and iw_cxgb3. And you can compare these to make sure there are no patches added to ofed that aren't upstream. In addition, the ofed git tree keeps a set of backport files in kernel_addons/backport/<kernel_version/distro>, and the configure scripts for the tree apply the needed patches based on which kernel/distro the tree is built against. Dunno if you guys are familiar with this or not. So this is FYI. Steve.
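To make the "compare these" step concrete, here is a rough Python sketch that lists the fix patches in a checked-out ofed kernel tree that mention a given driver. Only the kernel_patches/fixes directory name comes from the comment above; the `.patch` file extension, the function name, and the plain substring match are assumptions for illustration.

```python
from pathlib import Path


def patches_mentioning(tree_root, needle, subdir="kernel_patches/fixes"):
    """Return sorted names of patch files under `subdir` whose text contains `needle`.

    Assumption: the patches are plain-text files ending in ".patch".
    """
    patch_dir = Path(tree_root) / subdir
    hits = []
    for path in sorted(patch_dir.glob("*.patch")):
        # errors="replace" keeps odd bytes in a patch from aborting the scan
        if needle in path.read_text(errors="replace"):
            hits.append(path.name)
    return hits
```

Calling `patches_mentioning("/path/to/ofed_kernel", "cxgb3")` (the path is a placeholder) gives a list that can be checked by hand against what has been accepted upstream. A substring match will also catch iw_cxgb3 patches, which is probably what is wanted here.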
Hi, I entered BZ 251025 on 8/6/2007 to get support for the Chelsio adapter in RHEL 5.2. Can someone from Red Hat take a look at it and determine if it's a duplicate of this issue? Thanks, Rick Bieber
Yes, it looks like it to me.
*** Bug 251025 has been marked as a duplicate of this bug. ***
All, The code Chelsio recommends for rhel5u2 is available in the latest ofed-1.2.5 development build. The distribution tarball is at: http://www.openfabrics.org/builds/connectx/OFED-1.2.5-20070924-0551.tgz I don't know how you want to pull in this code, but if you pull the above tarball, untar it on a rhel5.0 system and build/install it, then you'll get the recommended code for the chelsio device. There are src rpms for the ofed kernel and the ofed user stuff, so you might just need those. But make sure you configure the kernel tree to get all the patches applied to the various kernel modules. (see comment #7 on how ofed kernel trees work). For chelsio's rdma functionality to work, you will need at least these libs: libibverbs, librdmacm, and libcxgb3. You will also need all the core IB kernel modules and these two chelsio modules: iw_cxgb3 and cxgb3. The only MPI that works currently on iWARP/OFED devices is mvapich2. We recommend you pull the mvapich2 code from the ofed distro as well and ship it. This will enable MPI over Chelsio's RDMA NIC. It ships in the ofed tarball as a src rpm. I'm new to the RedHat processes, so you all might want some other mechanism for delivery of all this code. Please let me know. All of the code in the ofed distro can be pulled from various git trees if that is a preferred method. Thanks, Steve (swise)
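As a quick sanity check after installing, a small Python sketch like the one below can report which of the two Chelsio modules named above are not loaded. It relies on the module name being the first whitespace-separated field of each line in /proc/modules; the function names and the sample text are made up for illustration, and the core IB modules are left out because they are not named individually here.

```python
# The two Chelsio kernel modules named in the comment above.
REQUIRED_MODULES = ("cxgb3", "iw_cxgb3")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required modules that are not loaded, in the order given."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Hypothetical sample text: only the NIC driver is loaded.
SAMPLE_PROC_MODULES = """\
cxgb3 150000 0 - Live 0xf8a00000
ext3 120000 2 - Live 0xf8800000
"""
```

On a live system the text would come from reading /proc/modules; an empty result from `missing_modules(...)` means both modules are present.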
Thanks, Steve. I don't know how Doug likes to do things, but I specifically like to pull from upstream git trees. Pulling small patches generally makes the backport easier and less error prone because you can take only the needed changes and leave in old infrastructure that has changed upstream since the release of the base kernel. I will talk with Doug to see how we want to handle this, since right now I mostly focus on netdrivers and he generally does the OFED backports. We will either need to co-ordinate or figure out who is going to be on the hook for the entire thing.
Hi, I was just looking over the bugs and I noticed #262241 which is closed because OFED 1.2 is already in RHEL 5.1. So is this BZ for support for the Chelsio driver only? If so can/should we remove "and OFED driver" from the name?
All, The cxgb3 driver, iw_cxgb3 driver, and the libcxgb3 library have been updated with a series of bug fixes. The cxgb3 driver patches have been submitted and accepted for upstream. See: http://lkml.org/lkml/2007/11/16/224 and http://lkml.org/lkml/2007/11/23/180 I have also submitted the associated (required) RDMA/iw_cxgb3 change for Roland to merge. The submission is here: http://www.spinics.net/lists/netdev/msg48240.html I am requesting RH pull all of these fixes and the libcxgb3 change into rh5u2 as part of this feature. I will be back-porting all this and including it into ofed-1.2.5 and ofed-1.3, and I'll update this feature when that effort is complete (this week). Thanks, Steve.
Thanks for the info, Steve. We should be able to accommodate that request.
Andy, I will push another series of patches later this week. These patches will extend the initialization for T3C, the latest rev of our chip, to explicitly set up the internal memory parity error detection. It would be worth getting these bits into RHEL5u2. Cheers, Divy
Thanks for the update, Divy. I'll be happy to wait for those bits, but they will need to be in before the second week of December or I won't be able to take them.
Divy and Andy,

I saw your patches and it looks like they are in netdev-2.6#upstream? Does Andy have what he needs for RHEL 5.2?

Bill

http://marc.info/?l=linux-netdev&m=119687862413438&w=2

List: linux-netdev
Subject: [PATCH 0/2] cxgb3 - driver update
From: Divy Le Ray <divy () chelsio ! com>
Date: 2007-12-05 18:14:53

Jeff,

I'm submitting a patch series for inclusion in 2.6.25. The patches are built against netdev#upstream.

Here is a brief description:
- Update GPIO pinning and MAC support for T3C adapters
- Enable parity error detection.

Cheers,
Divy
Hi Bill, This series is actually made of 3 patches - I posted a follow-up 3/2 patch on 12/6. Jeff Garzik attempted to apply them to #upstream-fixes instead of the intended #upstream branch, and that failed for patches 2/2 and 3/2. We're now waiting for Jeff to apply them to #upstream. Divy
The T3C changes are now in the Jeff Garzik netdev-2.6 #upstream tree.

commit 1109beac2ef7374ebf216db7a446be77ff77a84e
Author: Divy Le Ray <divy>
Date: Mon Dec 17 18:47:41 2007 -0800

    cxgb3 - Fix EEH, missing softirq blocking

    set_pci_drvdata() stores a pointer to the adapter, not the net device.
    Add missing softirq blocking in t3_mgmt_tx.

    Signed-off-by: Divy Le Ray <divy>
    Signed-off-by: Jeff Garzik <jeff>

commit 610d007c6af1d58e0ba364f7296490a5d544e241
Author: Divy Le Ray <divy>
Date: Mon Dec 17 18:47:31 2007 -0800

    cxgb3 - parity initialization for T3C adapters.

    Add parity initialization for T3C adapters.

    Signed-off-by: Divy Le Ray <divy>
    Signed-off-by: Jeff Garzik <jeff>

commit 3e27775f1d6d45f9d327af1ca827104249e7c601
Merge: 9c8e861... 3fd7131...
Author: Jeff Garzik <jeff>
Date: Fri Dec 14 17:13:24 2007 -0500

    Merge branch 'upstream-fixes' into upstream

commit 75758e8aa4b7d5c651261ce653dd8d0b716e1eda
Author: Divy Le Ray <divy>
Date: Wed Dec 5 10:15:01 2007 -0800

    cxgb3 - T3C support update

    Update GPIO mapping for T3C.
    Update xgmac for T3C support.
    Fix typo in mtu table.

    Signed-off-by: Divy Le Ray <divy>
    Signed-off-by: Jeff Garzik <jeff>
Divy, for some reason those changes aren't showing up in my local copy (and I've pulled every which way imaginable). Hopefully they will be there soon...
spoke too soon -- got 'em now....
Andy, QA here has started testing your latest rpms. I haven't looked at the code yet. When do you think you will have the latest patches in? Thanks, Divy
I should have the patches in a few minutes and build in 3-4 hours.
Created attachment 290054 [details] cxgb3-rhel5-test5.patch patch I plan to integrate into my test builds
Created attachment 290063 [details] cxgb3-rhel5-test6.patch drop the old one in favor of this one -- forgot 2 small bits that are needed
Andy, Our testing looks good so far.
Hi Andy, will your patch be integrated in RHEL5.2? This is our goal and expectation. I'm getting confused by the activity on #253195. Cheers, Divy
You can download this test kernel from http://people.redhat.com/dzickus/el5
Andy, Doug,

Steve Wise posted 2 patches that have been committed:

[patch 1]
http://git.kernel.org/?p=linux/kernel/git/jgarzik/netdev-2.6.git;a=commit;h=4eb61e0231be536d8116457b67b3e447bbd510dc

cxgb3: Handle ARP completions that mark neighbors stale.

When ARP completes due to a request rather than a reply, the neighbor is marked NUD_STALE instead of reachable (see arp_process()). The handler for the resulting netevent also needs to check for NUD_STALE. Failure to use the ARP entry can cause RDMA connection failures.

[patch 2]
http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=8704e9a8790cc9e394198663c1c9150c899fb9a2

The cxgb3 HW and driver don't support loopback RDMA connections. So fail any connection attempt where the destination address is local.

Patch 1 is a must-have. Without it, MPI clusters can have startup problems. Patch 2 avoids a crash when loopback connections are attempted. We would like to see these patches included in RHEL5.2. Please let us know of any concerns.

Cheers,
Divy
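For readers who don't want to open the commits, the logic of the two fixes can be modeled in a few lines of Python. This is only an illustration of the checks described above, not the driver code: the NUD_* values are the ones defined in the kernel's neighbour header, while the function names and the `accept_stale` switch are invented here to show the before/after behaviour.

```python
import ipaddress

# Neighbor state bits as defined in the Linux kernel's neighbour header.
NUD_REACHABLE = 0x02
NUD_STALE = 0x04
NUD_FAILED = 0x20


def neighbor_usable(nud_state, accept_stale=True):
    """Model of patch 1: treat NUD_STALE entries as usable too.

    accept_stale=False models the old behaviour, where an ARP entry
    resolved via an incoming request (and so marked stale) was ignored.
    """
    mask = NUD_REACHABLE | (NUD_STALE if accept_stale else 0)
    return bool(nud_state & mask)


def is_loopback_attempt(dest, local_addrs):
    """Model of patch 2: a destination that is one of our own addresses."""
    dest_ip = ipaddress.ip_address(dest)
    local = {ipaddress.ip_address(addr) for addr in local_addrs}
    return dest_ip.is_loopback or dest_ip in local
```

With `accept_stale=False`, a neighbor in NUD_STALE is rejected, which corresponds to the connection failures patch 1 addresses. A connection setup path would reject the attempt up front whenever `is_loopback_attempt(...)` is true, rather than letting it reach the hardware.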
Divy, I can try to get this in, but it might be a problem. If we cannot get it into 5.2 we can get it in an errata kernel shortly after.
All: The mvapich2 library from OFED-1.3 is required for MPI over iWARP at this point. Open MPI is not yet enabled over iWARP in general. The openib btl doesn't use the rdma-cm, which is a requirement for iWARP, and the uDAPL transport also hasn't been tweaked to work correctly over iWARP. This was listed as a dependency in the original opening text for this feature. Is it possible for you to ship mvapich2-1.0.2 from OFED-1.3? Thanks, Steve.
My test kernels have been updated to include a patch for this bugzilla. http://people.redhat.com/agospoda/#rhel5 Please test them and report back your results.
Andy, Our QA team has successfully tested your test kernel. Cheers, Divy
in kernel-2.6.18-85.el5 You can download this test kernel from http://people.redhat.com/dzickus/el5
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot3--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot4--available now on partners.redhat.com. Please test and confirm that your issue is fixed. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
Greetings Red Hat Partner, A fix for this issue should be included in the latest packages contained in RHEL5.2-Snapshot6--available now on partners.redhat.com. We are nearing GA for 5.2 so please test and confirm that your issue is fixed ASAP. After you (Red Hat Partner) have verified that this issue has been addressed, please perform the following: 1) Change the *status* of this bug to VERIFIED. 2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified) If this issue is not fixed, please add a comment describing the most recent symptoms of the problem you are having and change the status of the bug to ASSIGNED. If you are receiving this message in Issue Tracker, please reply with a message to Issue Tracker about your results and I will update bugzilla for you. If you need assistance accessing ftp://partners.redhat.com, please contact your Partner Manager. Thank you
All the requested bits are in. Thanks a lot for the work! Cheers, Divy
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you. http://rhn.redhat.com/errata/RHBA-2008-0314.html