Bug 534158

Summary: Updates for mlx4 drivers
Product: Red Hat Enterprise Linux 5
Reporter: Yevgeny Petrilin <yevgenyp>
Component: kernel
Assignee: Doug Ledford <dledford>
Status: CLOSED ERRATA
QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: high
Priority: low
Version: 5.5
CC: cward, dledford, erezsh, goetz.waschk, linuxram, peterm, stephan.wiesand, tziporet, yevgenyp
Target Milestone: rc
Target Release: ---
Hardware: All
OS: Linux
Whiteboard: Updates for mlx4 drivers
Doc Type: Bug Fix
Last Closed: 2010-03-30 07:41:34 UTC
Attachments (description, flags):
mlx4_patches (flags: none)
SRIOV fixes for mlx4_core and mlx4_en drivers (flags: none)
1/5 patch: Retry when pci resource allocation fails. (flags: none)
patch 2/5: resource alignment management (flags: none)
patch 3/5: sriov resource alignment fix (flags: none)
patch 4/5: track sriov resources through an IORESOURCE_SRIOV flag (flags: none)
patch 5/5: fixes a pci resource allocation bug (flags: none)
a corrected version of 4/5 patch. It fixes an issue seen during rmmod/insmod of the driver (flags: none)

Description Yevgeny Petrilin 2009-11-10 18:44:15 UTC
Hello, I am attaching a tarball that contains patches for the mlx4 drivers (mlx4_core and mlx4_en), created against kernel 2.6.18-172.
The main changes:
- Additional ethtool support (self-diagnostics test)
- Bug fixes
- Performance improvements
- Interface name included in driver prints
- A separate file for ethtool functionality
- SRIOV support

Comment 1 Yevgeny Petrilin 2009-11-10 19:02:09 UTC
Created attachment 368458
mlx4_patches

Comment 2 Tziporet Koren 2009-12-08 20:51:57 UTC
Any update on this?
Thanks
Tziporet

Comment 3 Doug Ledford 2009-12-08 20:59:12 UTC
The bug is in POST state, which implies that the code has been submitted internally for review and inclusion in the next kernel release.

Comment 4 Don Zickus 2009-12-15 20:19:35 UTC
Included in kernel-2.6.18-181.el5.
You can download this test kernel from http://people.redhat.com/dzickus/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 5 RHEL Program Management 2009-12-15 23:12:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 6 Chris Ward 2009-12-16 11:04:15 UTC
@Mellanox

We would like to confirm that you can provide commitment to test 
for the resolution of this request during the RHEL 5.5 Beta 
Test Phase before we approve it for acceptance into the release.

It appears that this request covers areas we are unable to 
sufficiently test in-house.

RHEL 5.5 Beta Test Phase is expected to begin around February
2010.

In order to avoid any unnecessary delays, please post a 
confirmation as soon as possible, including the contact 
information for testing engineers.

Any additional information about alternative testing variations we 
could use to reproduce this issue in-house would be appreciated.

Comment 7 Yevgeny Petrilin 2009-12-16 11:22:49 UTC
We most definitely plan to test all the new code in-house.
We already do this for the patches we sent you, using the updated RHEL 5.5 kernels.
We have some fixes that we plan to send within the coming week.
We will post the fixes to this bug.

The engineers working on this in Mellanox are:
Yevgeny Petrilin (yevgenyp.il)
and Erez Shitrit (erezsh.il)

We will provide additional contacts if needed.

Comment 10 Yevgeny Petrilin 2010-02-07 08:24:29 UTC
Created attachment 389347
SRIOV fixes for mlx4_core and mlx4_en drivers

Comment 11 Yevgeny Petrilin 2010-02-07 08:25:04 UTC
Hello,
I have attached a patch, created on top of the 2.6.18-182 kernel, with fixes to our SRIOV code.
This code is being tested and verified inside Mellanox, and we will send further fixes if more issues are found.
The attached fixes were also sent to the upstream kernel and are currently being reviewed by Roland Dreier:
http://marc.info/?l=linux-netdev&m=126529887700754&w=2

Comment 12 Chris Ward 2010-02-11 10:11:26 UTC
~~ Attention Customers and Partners - RHEL 5.5 Beta is now available on RHN ~~

RHEL 5.5 Beta has been released! There should be a fix present in this 
release that addresses your request. Please test and report back results 
here, by March 3rd 2010 (2010-03-03) or sooner.

Upon successful verification of this request, post your results and update 
the Verified field in Bugzilla with the appropriate value.

If you encounter any issues while testing, please describe them and set 
this bug into NEED_INFO. If you encounter new defects or have additional 
patch(es) to request for inclusion, please clone this bug per each request
and escalate through your support representative.

Comment 13 Yevgeny Petrilin 2010-02-21 14:11:44 UTC
The patches from Comment 11 are not part of this Beta kernel, so at the moment the driver malfunctions.
Can these fixes still be included?
Thanks,
Yevgeny

Comment 14 Yevgeny Petrilin 2010-02-24 17:44:34 UTC
I just checked the 2.6.18-190 kernel: the mlx4 driver is buggy!
The attached fixes need to be accepted to fix it.

Comment 16 Doug Ledford 2010-02-25 15:26:42 UTC
Putting back into ON_QA, as the follow-on patch in comment #11 is being tracked in bug #567730.

Comment 19 errata-xmlrpc 2010-03-30 07:41:34 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 20 Ram Pai 2010-04-06 08:54:03 UTC
Created attachment 404655
1/5 patch: Retry when pci resource allocation fails.

Comment 21 Ram Pai 2010-04-06 08:54:55 UTC
Created attachment 404656
patch 2/5: resource alignment management

Comment 22 Ram Pai 2010-04-06 08:56:12 UTC
Created attachment 404657
patch 3/5: sriov resource alignment fix

Comment 23 Ram Pai 2010-04-06 08:57:13 UTC
Created attachment 404658
patch 4/5: track sriov resources through an IORESOURCE_SRIOV flag

Comment 24 Ram Pai 2010-04-06 08:58:50 UTC
Created attachment 404659
patch 5/5: fixes a pci resource allocation bug

Comment 25 Ram Pai 2010-04-06 09:01:40 UTC
The above 5 patches enable SRIOV for Mellanox, Intel 1G, and Intel 10G. I have touch-tested the code. More testing is needed. Meanwhile, any comments or feedback are appreciated.

Comment 26 Ram Pai 2010-04-13 00:12:59 UTC
Created attachment 406119
A corrected version of the 4/5 patch. It fixes an issue seen during rmmod/insmod of the driver.