Bug 253449 - FEAT: Support for new Chelsio 10G Ethernet Controller and OFED driver
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.2
Hardware: All Linux
Priority: high  Severity: high
Target Milestone: ---
Target Release: ---
Assigned To: Andy Gospodarek
QA Contact: Martin Jenner
Keywords: FutureFeature, HardwareEnablement, OtherQA, Patch
Duplicates: 251025
Depends On: 253023
Blocks: 232927 249264 296431 372911 420521 422431 422441
Reported: 2007-08-19 15:31 EDT by Larry Troan
Modified: 2016-04-18 05:46 EDT

See Also:
Fixed In Version: RHBA-2008-0314
Doc Type: Enhancement
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2008-05-21 10:52:56 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:


Attachments
cxgb3-rhel5-test5.patch (133.67 KB, patch)
2007-12-19 15:17 EST, Andy Gospodarek
cxgb3-rhel5-test6.patch (133.67 KB, patch)
2007-12-19 15:36 EST, Andy Gospodarek

Description Larry Troan 2007-08-19 15:31:00 EDT
1. Feature Name:

    Support for new Chelsio 10G Ethernet Controller and OFED driver

2. Description:

   a. Architectures:

      32-bit x86

      64-bit Intel EM64T/AMD64

      64-bit Itanium2

      64-bit PPC

        :

   b. Dependencies:

       cxgb3 nic driver from kernel.org

       OFED-1.2.5, specifically the pieces below:

      ofed-1.2.5 kernel core

      ofed-1.2.5 cxgb3 and iw_cxgb3 kernel drivers

      ofed-1.2.5 libibverbs

      ofed-1.2.5 librdmacm

      ofed-1.2.5 libcxgb3

      ofed-1.2.5 mvapich2-0.9.8-15

      ofed-1.2.5 mpi-selector

   c. External links:

       OFED bits can be found at 

       http://www.openfabrics.org/builds/ofed-1.2.5/release/OFED-1.2.5.tgz

       Note: The OFED kernel bits are available in the kernel.org linux tree. 

   d. Priority (H,M,L):

       High

   e. Target Release:

       RHEL4.7 and RHEL5.2

   f. Target Release Date:

      

   g. Drivers or hardware dependency:

      None

   h. Target Kernel:

        2.6.9-x (RHEL4.7)

        2.6.18-x (RHEL5.2)

   i. Is code accepted upstream in Linus' tree?

      Yes

   j. Who will backport it to 2.6.9 and/or 2.6.18 kernel ?

      cxgb3 will be shipped in 5.1 and 4.6. We can provide update patches to Red Hat.

      The status of OFED-1.2.5 in 5.1 and 4.6 is unknown to us; we will adjust.

 

3. Business Justification:

      Red Hat

a.  Why is this feature needed?

Support for the Chelsio NIC, and for OFED over it, is required by several server
OEMs, some National Labs, and large end users

b.  What hardware does this enable?

new Chelsio 10G Ethernet Controller

c. Forecast, volume or high end platform?

20kU in '08, 300kU in '09

d.  Any configuration info?

No

e.  Are there other dependencies (drivers).

No

  

4. Status:

   a. Hardware to Red Hat?

      Yes, Andy Gospodarek

   b. Back-ported code/patch to Red Hat?

      Yes, on both 4.6 and 5.1

   c. Other status?

                Previous revision of software accepted as part of 5.1

 

5. Chelsio technical contact, email, phone, chat

      Kianoosh Naghshineh Kianoosh@chelsio.com, 408-962-3621

      Divy Le Ray divy@chelsio.com, 408-962-3682

      Scott Bardone sbardone@chelsio.com, 408-962-3639

     Steve Wise swise@chelsio.com, 512-343-9196 x 101
Comment 2 Doug Ledford 2007-08-20 12:53:15 EDT
(In reply to comment #0)
>    b. Dependencies:
> 
>        cxgb3 nic driver from kernel.org
> 
>        OFED-1.2.5, specifically the pieces below:

OFED 1.2.5 is *supposed* to be the same as OFED 1.2 except for the Connect-X
support.  I'll need to know if the iw_cxgb3 driver is not the same between OFED
1.2 and OFED 1.2.5.

>       ofed-1.2.5 kernel core
> 
>       ofed-1.2.5 cxgb3 and iw_cxgb3 kernel drivers
> 
>       ofed-1.2.5 libibverbs
> 
>       ofed-1.2.5 librdmacm
> 
>       ofed-1.2.5 libcxgb3

The OFED 1.2 version of all these pieces is already in the planned 4.6 update.

>       ofed-1.2.5 mvapich2-0.9.8-15
> 
>       ofed-1.2.5 mpi-selector

These two are not in.  The mpi-selector bit won't ever go in.  The mvapich2 bit
needs some more work before it's really usable by a distribution.  It might make
4.7.  However, OpenMPI is already in and uses the OFED driver stack for
communications.

In order for the Chelsio RNIC devices to work properly though, there is some
additional work that needs to be done.  This bug needs to be cloned against the kudzu and
initscripts components.  They will both need updating to recognize the Chelsio
RNICs as network adapters so that initialization of the RNIC ethernet interface
happens at boot time properly.  Currently, kudzu does not properly list the
cxgb3 driver in the /etc/sysconfig/hwconf file.  Also, the /sbin/kmodule program
that's part of the initscripts package doesn't properly recognize the cxgb3
hardware as a network interface and as a result rc.sysinit does not properly
load the driver module.
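
One way to check the kudzu symptom described above is to look at which drivers hwconf lists for network-class devices. The following is only a rough sketch: it assumes the usual hwconf layout of records separated by a lone "-" line with "key: value" fields, and the sample text and the function name drivers_in_hwconf are made up for illustration, not taken from a real system.

```python
def drivers_in_hwconf(text, device_class="NETWORK"):
    """Return the driver names listed for one device class in hwconf-style text."""
    drivers = []
    record = {}
    # A lone "-" starts each record; the trailing sentinel flushes the last one.
    for line in text.splitlines() + ["-"]:
        line = line.strip()
        if line == "-":
            if record.get("class") == device_class and "driver" in record:
                drivers.append(record["driver"])
            record = {}
        elif ": " in line:
            key, value = line.split(": ", 1)
            record[key] = value
    return drivers


# Hypothetical sample: the adapter is present but not classed as a network device.
SAMPLE = """\
-
class: NETWORK
bus: PCI
device: eth0
driver: e1000
-
class: OTHER
bus: PCI
driver: cxgb3
"""

print("cxgb3" in drivers_in_hwconf(SAMPLE))
```

On a real machine the text would come from reading /etc/sysconfig/hwconf; if cxgb3 is absent from the network-class drivers, boot-time initialization of the interface is likely affected in the way described above.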
Comment 3 Steve Wise 2007-08-20 15:11:44 EDT
(In reply to comment #2)

> OFED 1.2.5 is *supposed* to be the same as OFED 1.2 except for the Connect-X
> support.  I'll need to know if the iw_cxgb3 driver is not the same between
> OFED 1.2 and OFED 1.2.5.


Functionally, the *cxgb3 drivers are the same.  However, all the 1.2.5 ofed
drivers were re-based onto 2.6.22 kernel base + any additional fixes that were
in ofed-1.2 but not in the 2.6.22 kernel drivers.  So the code is a little
different.  All of the Chelsio bug fixes have gone into both the ofed-1.2 branch
and the ofed-1.2.5 (aka ofed-1.2.c) branch of the ofed-1.2 git tree, so if you
can't or don't want to go to 1.2.5, then we'll have to stick with ofed-1.2.  But
you need to pull the latest ofed-1.2 git repos to get all the chelsio fixes
since 1.2 GA.

> 
> >       ofed-1.2.5 mvapich2-0.9.8-15
> > 
> >       ofed-1.2.5 mpi-selector
> 
> These two are not in.  The mpi-selector bit won't ever go in.  The mvapich2 bit
> needs some more work before it's really usable by a distribution.  It might make
> 4.7.  However, OpenMPI is already in and uses the OFED driver stack for
> communications.

OpenMPI doesn't work over iWARP yet.  It doesn't use the rdma-cm so it doesn't
support iWARP.  The only MPI that works over OFED iWARP verbs and the rdma-cm at
this point is mvapich2-0.9.8.  If you don't want to pull it in, then customers
can get it directly from OSU...

Comment 4 Steve Wise 2007-08-21 16:59:50 EDT
All:

I think what we want to do is use ofed-1.2.5 as the place to pull _both_ cxgb3
and iw_cxgb3 code.  This will keep everything in sync with all three components:
cxgb3, iw_cxgb3 and libcxgb3.  This also should enable easy back-porting to
rhel5.2 since ofed has a backport system.  

Basically, we (chelsio/ogc) keep the ofed-1.2.5 drivers up to date by pulling in
all cxgb3 and iw_cxgb3 bug patches that get accepted upstream.   Using
ofed-1.2.5 gives us a single point of focus.

Does this sound reasonable?  

Comment 5 Andy Gospodarek 2007-08-27 17:00:22 EDT
Steve,

It certainly seems reasonable to do this if you plan to keep the OFED tree in
sync with Jeff's and Linus' trees.  I'm a little bit reluctant to do this since
I don't want to get into a situation where the OFED tree is newer than trees
hosted on kernel.org because changes to drivers there aren't pushed in a timely
manner.

Doug,

What are your thoughts?

Comment 6 Divy Le Ray 2007-09-04 16:23:56 EDT
For tracking purposes:
IBM has posted a similar feature request: #254027.

Divy
Comment 7 Steve Wise 2007-09-04 16:25:18 EDT
By the way, the ofed-1.2.5 git repo maintains all changes to the pertinent ofed
kernel drivers post 2.6.22 as patch files in the kernel_patches/fixes directory.
So there are patch files for each patch added to ofed on top of its 2.6.22 base.

This should make it clear exactly what is added to cxgb3 and iw_cxgb3.  And you
can compare these to make sure there are no patches added to ofed that aren't
upstream.

In addition, the ofed git tree keeps a set of backport files in
kernel_addons/backport/<kernel_version/distro>.

And the configure scripts for the tree apply the needed patches based on which
kernel/distro the tree is built against.
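
To make the layout concrete, here is a rough sketch of how the pieces for one build target could be picked out of a list of tree paths. Only the two directory names come from the description above; the file names, the target labels and the function plan_for_target are hypothetical, and the real configure scripts may well work differently.

```python
def plan_for_target(paths, target):
    """Split tree paths into upstream fix patches and backport files for one target."""
    fixes_prefix = "kernel_patches/fixes/"
    backport_prefix = "kernel_addons/backport/" + target + "/"
    # Fix patches apply to every target; sorting keeps a stable application order.
    fixes = sorted(p for p in paths if p.startswith(fixes_prefix))
    # Backport files are only relevant for the kernel/distro being built against.
    backports = sorted(p for p in paths if p.startswith(backport_prefix))
    return fixes, backports


# Hypothetical file names and target labels, for illustration only.
TREE = [
    "kernel_patches/fixes/0002-iw_cxgb3-example-fix.patch",
    "kernel_patches/fixes/0001-cxgb3-example-fix.patch",
    "kernel_addons/backport/target_a/include/linux/example.h",
    "kernel_addons/backport/target_b/include/linux/example.h",
]

fixes, backports = plan_for_target(TREE, "target_a")
for path in fixes + backports:
    print(path)
```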

Dunno if you guys are familiar with this or not.  So this is FYI.

Steve.
Comment 9 rick bieber 2007-09-19 18:17:36 EDT
Hi, I entered BZ 251025 on 8/6/2007 to get support for the Chelsio adapter in
RHEL 5.2.  Can someone from Red Hat take a look at it and determine if it's a
duplicate of this issue?

Thanks, Rick Bieber
Comment 10 Andy Gospodarek 2007-09-20 09:57:00 EDT
Yes, it looks like it to me.

Comment 11 Andy Gospodarek 2007-09-20 09:58:03 EDT
*** Bug 251025 has been marked as a duplicate of this bug. ***
Comment 12 Steve Wise 2007-09-28 12:03:34 EDT
All,

The code Chelsio recommends for rhel5u2 is available in the latest ofed-1.2.5
development build.  The distribution tarball is at:

http://www.openfabrics.org/builds/connectx/OFED-1.2.5-20070924-0551.tgz

I don't know how you want to pull in this code, but if you pull the above
tarball, untar it on a rhel5.0 system and build/install it, then you'll get the
recommended code for the chelsio device.  There are src rpms for the ofed kernel
and the ofed user stuff, so you might just need those.  But make sure you
configure the kernel tree to get all the patches applied to the various kernel
modules.  (see comment #7 on how ofed kernel trees work).

For chelsio's rdma functionality to work, you will need at least these libs:
libibverbs, librdmacm, and libcxgb3.

You will also need all the core IB kernel modules and these two chelsio modules:
iw_cxgb3 and cxgb3.
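
A quick way to confirm that the two Chelsio modules are loaded is to scan /proc/modules, where the first field of each line is the module name. This is a minimal sketch: the sample line is made up, the helper names are hypothetical, and the core IB modules are not checked because they are not enumerated here.

```python
REQUIRED_MODULES = ("cxgb3", "iw_cxgb3")


def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text."""
    return {line.split()[0] for line in proc_modules_text.splitlines() if line.strip()}


def missing_modules(proc_modules_text, required=REQUIRED_MODULES):
    """Return the required module names that are not currently loaded."""
    loaded = loaded_modules(proc_modules_text)
    return [name for name in required if name not in loaded]


# Hypothetical sample: only the NIC driver is loaded, not the RDMA driver.
SAMPLE = "cxgb3 155180 0 - Live 0xf8b2a000\n"

print(missing_modules(SAMPLE))
```

On a live system the text would come from open("/proc/modules").read().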

The only MPI that works currently on iWARP/OFED devices is mvapich2.  We
recommend you pull the mvapich2 code from the ofed distro as well and ship it. 
This will enable MPI over Chelsio's RDMA NIC.  It ships in the ofed tarball as a
src rpm.

I'm new to the RedHat processes, so you all might want some other mechanism for
delivery of all this code.  Please let me know.  All of the code in the ofed
distro can be pulled from various git trees if that is a preferred method.  

Thanks,

Steve (swise@chelsio.com)
Comment 13 Andy Gospodarek 2007-09-28 14:23:47 EDT
Thanks, Steve.  I don't know how Doug likes to do things, but I specifically
like to pull from upstream git trees.  Pulling small patches generally makes the
backport easier and less error prone because you can take only the needed
changes and leave in old infrastructure that has changed upstream since the
release of the base kernel.  

I will talk with Doug to see how we want to handle this since right now I mostly
focus on netdrivers and he generally does the OFED backports.  We will either
need to co-ordinate or figure out who is going to be on the hook for the entire
thing.
Comment 15 rick bieber 2007-11-16 11:13:59 EST
Hi, I was just looking over the bugs and I noticed #262241 which is closed 
because OFED 1.2 is already in RHEL 5.1.  So is this BZ for support for the 
Chelsio driver only?  If so can/should we remove "and OFED driver" from the 
name?
Comment 16 Steve Wise 2007-11-26 15:44:18 EST
All,

The cxgb3 driver, iw_cxgb3 driver, and the libcxgb3 library have been updated
with a series of bug fixes.  The cxgb3 driver patches have been submitted and
accepted for upstream.  See:

http://lkml.org/lkml/2007/11/16/224

and

http://lkml.org/lkml/2007/11/23/180

I have also submitted the associated (required) RDMA/iw_cxgb3 change for Roland
to merge.  The submission is here:

http://www.spinics.net/lists/netdev/msg48240.html

I am requesting RH pull all of these fixes and the libcxgb3 change into rh5u2 as
part of this feature.  I will be back-porting all this and including it into
ofed-1.2.5 and ofed-1.3, and I'll update this feature when that effort is
complete (this week).

Thanks,

Steve.




Comment 17 Andy Gospodarek 2007-11-26 16:27:27 EST
Thanks for the info, Steve.  We should be able to accommodate that request.
Comment 18 Divy Le Ray 2007-11-27 16:20:35 EST
Andy, 

I will push another series of patches later this week. These patches will 
extend the initialization for T3C, the latest rev of our chip - explicitly set
the internal memory parity error detection. It would be worth getting these 
bits in RHEL5u2.

Cheers,
Divy 
Comment 19 Andy Gospodarek 2007-11-27 18:13:56 EST
Thanks for the update, Divy.  I'll be happy to wait for those bits, but they will need to be in before the second week of December or I won't be able to take them.
Comment 20 Bill Hayes 2007-12-11 18:35:52 EST
Divy and Andy,

I saw your patches and it looks like they are in netdev-2.6#upstream?  Does Andy
have what he needs for RHEL 5.2?

Bill

http://marc.info/?l=linux-netdev&m=119687862413438&w=2

List:       linux-netdev
Subject:    [PATCH 0/2] cxgb3 - driver update
From:       Divy Le Ray <divy () chelsio ! com>
Date:       2007-12-05 18:14:53

Jeff,

I'm submitting a patch series for inclusion in 2.6.25.
The patches are built against netdev#upstream.

Here is a brief description:
- Update GPIO pinning and MAC support for T3C adapters
- Enable parity error detection.

Cheers,
Divy
Comment 21 Divy Le Ray 2007-12-11 18:56:32 EST
Hi Bill, 

This series is made of 3 patches actually - I posted follow-up 3/2 patch on 
12/6.
Jeff Garzik attempted to apply them to #upstream-fixes instead of the intended
#upstream branch, and that failed for patch 2/2 and 3/2.  We're now waiting for
Jeff to apply them to #upstream.

Divy

 
Comment 22 Bill Hayes 2007-12-18 13:18:26 EST
The T3C changes are now in the Jeff Garzik netdev-2.6 #upstream tree.

commit 1109beac2ef7374ebf216db7a446be77ff77a84e
Author: Divy Le Ray <divy@chelsio.com>
Date:   Mon Dec 17 18:47:41 2007 -0800

    cxgb3 - Fix EEH, missing softirq blocking
    
    set_pci_drvdata() stores a pointer to the adapter,
    not the net device.
    Add missing softirq blocking in t3_mgmt_tx.
    
    Signed-off-by: Divy Le Ray <divy@chelsio.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

commit 610d007c6af1d58e0ba364f7296490a5d544e241
Author: Divy Le Ray <divy@chelsio.com>
Date:   Mon Dec 17 18:47:31 2007 -0800

    cxgb3 - parity initialization for T3C adapters.
    
    Add parity initialization for T3C adapters.
    
    Signed-off-by: Divy Le Ray <divy@chelsio.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>

commit 3e27775f1d6d45f9d327af1ca827104249e7c601
Merge: 9c8e861... 3fd7131...
Author: Jeff Garzik <jeff@garzik.org>
Date:   Fri Dec 14 17:13:24 2007 -0500

    Merge branch 'upstream-fixes' into upstream

commit 75758e8aa4b7d5c651261ce653dd8d0b716e1eda
Author: Divy Le Ray <divy@chelsio.com>
Date:   Wed Dec 5 10:15:01 2007 -0800

    cxgb3 - T3C support update
    
    Update GPIO mapping for T3C.
    Update xgmac for T3C support.
    Fix typo in mtu table.
    
    Signed-off-by: Divy Le Ray <divy@chelsio.com>
    Signed-off-by: Jeff Garzik <jeff@garzik.org>
Comment 26 Andy Gospodarek 2007-12-19 14:15:17 EST
Divy, for some reason those changes aren't showing up in my local copy (and I've
pulled every which way imaginable).  Hopefully they will be there soon...
Comment 27 Andy Gospodarek 2007-12-19 14:18:34 EST
spoke too soon -- got 'em now....
Comment 28 Divy Le Ray 2007-12-19 14:27:19 EST
Andy, 

QA here has started testing your latest rpms. I haven't looked at the code yet.
When do you think you will have the latest patches in?

Thanks,
Divy
Comment 29 Andy Gospodarek 2007-12-19 14:51:55 EST
I should have the patches in a few minutes and build in 3-4 hours.
Comment 30 Andy Gospodarek 2007-12-19 15:17:58 EST
Created attachment 290054
cxgb3-rhel5-test5.patch

patch I plan to integrate into my test builds
Comment 31 Andy Gospodarek 2007-12-19 15:36:19 EST
Created attachment 290063
cxgb3-rhel5-test6.patch

drop the old one in favor of this one -- forgot 2 small bits that are needed
Comment 34 Divy Le Ray 2008-01-09 20:27:52 EST
Andy, 

Our testing looks good so far.
Comment 35 Divy Le Ray 2008-01-16 19:36:14 EST
Hi Andy, 

Will your patch be integrated in RHEL5.2? This is our goal and expectation.
I'm getting confused by the activity on #253195.

Cheers,
Divy
Comment 36 Don Zickus 2008-01-24 16:15:46 EST
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 38 Divy Le Ray 2008-02-14 14:16:19 EST
Andy, Doug,  

Steve Wise posted 2 patches that have been committed:
[patch 1]
http://git.kernel.org/?p=linux/kernel/git/jgarzik/netdev-2.6.git;a=commit;h=4eb61e0231be536d8116457b67b3e447bbd510dc
cxgb3: Handle ARP completions that mark neighbors stale.

When ARP completes due to a request rather than a reply the neighbor is
marked NUD_STALE instead of reachable (see arp_process()).  The handler
for the resulting netevent needs to check also for NUD_STALE.

Failure to use the arp entry can cause RDMA connection failures.
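
The state check described for patch 1 comes down to a bitmask test. The sketch below is not the driver code; it only illustrates the logic, using the NUD_* bit values from the kernel's linux/neighbour.h, with a hypothetical function name and an accept_stale switch added to contrast the old and new behavior.

```python
# Neighbour state bits as defined in the kernel's linux/neighbour.h.
NUD_REACHABLE = 0x02
NUD_STALE = 0x04
NUD_FAILED = 0x20


def neighbour_usable(nud_state, accept_stale=True):
    """Return True if the handler should use the ARP entry for this state."""
    mask = NUD_REACHABLE
    if accept_stale:
        # ARP completed via a request rather than a reply leaves the entry STALE.
        mask |= NUD_STALE
    return bool(nud_state & mask)


for state in (NUD_REACHABLE, NUD_STALE, NUD_FAILED):
    print(hex(state), neighbour_usable(state, accept_stale=False), neighbour_usable(state))
```

With accept_stale=False a stale entry is ignored, which corresponds to the failure mode above; with the default it is accepted.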

[patch 2]
http://git.kernel.org/?p=linux/kernel/git/roland/infiniband.git;a=commitdiff;h=8704e9a8790cc9e394198663c1c9150c899fb9a2
The cxgb3 HW and driver don't support loopback RDMA connections.  So
fail any connection attempt where the destination address is local.
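
The rule in patch 2 can be pictured as a simple address check before connecting. This is a user-space sketch of the idea only, not the iw_cxgb3 implementation; the function names and example addresses are hypothetical, and the list of local addresses is assumed to be supplied by the caller.

```python
import ipaddress


def is_local_destination(dest, local_addrs):
    """Return True if dest is a loopback address or one of this host's addresses."""
    dest_ip = ipaddress.ip_address(dest)
    local = {ipaddress.ip_address(addr) for addr in local_addrs}
    return dest_ip.is_loopback or dest_ip in local


def connect_allowed(dest, local_addrs):
    """Refuse loopback RDMA connections; allow everything else."""
    return not is_local_destination(dest, local_addrs)


# Hypothetical addresses for illustration.
LOCAL = ["192.168.1.10"]
for dest in ("192.168.1.10", "127.0.0.1", "192.168.1.20"):
    print(dest, connect_allowed(dest, LOCAL))
```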

Patch 1 is a must have. Without it, MPI clusters can have startup problems.
Patch 2 avoids a crash when loopback connections are attempted.

We would like to see these patches included in RHEL5.2. Please let us know of 
any concern. 

Cheers,
Divy
Comment 39 Andy Gospodarek 2008-02-14 14:28:48 EST
Divy, I can try to get this in, but it might be a problem.  If we cannot get it
into 5.2 we can get it in an errata kernel shortly after.
Comment 40 Steve Wise 2008-02-27 13:52:01 EST
All:

The mvapich2 library from OFED-1.3 is required for MPI over iWARP at this point.
 Open MPI is not yet enabled over iWARP in general.  The openib btl doesn't use
the rdma-cm, which is a requirement for iWARP, and the uDAPL transport also
hasn't been tweaked to work correctly over iWARP.

This was listed as a dependency in the original opening text for this feature.

Is it possible for you to ship mvapich2-1.0.2 from OFED-1.3?

Thanks,

Steve.
Comment 42 Andy Gospodarek 2008-02-28 22:18:16 EST
My test kernels have been updated to include a patch for this bugzilla.

http://people.redhat.com/agospoda/#rhel5

Please test them and report back your results.
Comment 46 Divy Le Ray 2008-03-11 00:35:39 EDT
Andy,

Our QA team has successfully tested your test kernel.

Cheers,
Divy
Comment 47 Don Zickus 2008-03-12 15:40:45 EDT
in kernel-$NEW_VER
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 48 Don Zickus 2008-03-12 15:59:18 EDT
in kernel-2.6.18-85.el5
You can download this test kernel from http://people.redhat.com/dzickus/el5
Comment 50 John Poelstra 2008-04-02 17:35:35 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot3--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 51 John Poelstra 2008-04-09 18:42:29 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot4--available now on partners.redhat.com.  

Please test and confirm that your issue is fixed.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 52 John Poelstra 2008-04-23 13:40:24 EDT
Greetings Red Hat Partner,

A fix for this issue should be included in the latest packages contained in
RHEL5.2-Snapshot6--available now on partners.redhat.com.  

We are nearing GA for 5.2 so please test and confirm that your issue is fixed ASAP.

After you (Red Hat Partner) have verified that this issue has been addressed,
please perform the following:
1) Change the *status* of this bug to VERIFIED.
2) Add *keyword* of PartnerVerified (leaving the existing keywords unmodified)

If this issue is not fixed, please add a comment describing the most recent
symptoms of the problem you are having and change the status of the bug to ASSIGNED.

If you are receiving this message in Issue Tracker, please reply with a message
to Issue Tracker about your results and I will update bugzilla for you.  If you
need assistance accessing ftp://partners.redhat.com, please contact your Partner
Manager.

Thank you
Comment 53 Divy Le Ray 2008-05-01 01:48:10 EDT
All the requested bits are in. Thanks a lot for the work!

Cheers,
Divy
Comment 55 errata-xmlrpc 2008-05-21 10:52:56 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0314.html
