Bug 557109 - [5.4] VLAN performance issue with 10gbE Mellanox NICs
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: kernel
Version: 5.4
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Target Release: ---
Assignee: Doug Ledford
QA Contact: Red Hat Kernel QE team
URL:
Whiteboard:
Depends On:
Blocks: 573098
Reported: 2010-01-20 13:09 UTC by Flavio Leitner
Modified: 2023-09-14 01:20 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 573098 (view as bug list)
Environment:
Last Closed: 2010-03-30 07:14:09 UTC
Target Upstream Version:
Embargoed:


Attachments
0001-Propagate-selected-feature-bits-to-VLAN-devices.patch (4.06 KB, patch)
2010-01-20 13:10 UTC, Flavio Leitner
0002-vlan-Add-ethtool-support.patch (2.00 KB, patch)
2010-01-20 13:11 UTC, Flavio Leitner
0003-mlx4_en-Added-vlan_features-support.patch (1.19 KB, patch)
2010-01-20 13:12 UTC, Flavio Leitner
screenshot of the performance results (35.11 KB, application/x-bzip2)
2010-01-20 13:24 UTC, Flavio Leitner


Links
System ID: Red Hat Product Errata RHSA-2010:0178
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: Important: Red Hat Enterprise Linux 5.5 kernel security and bug fix update
Last Updated: 2010-03-29 12:18:21 UTC

Description Flavio Leitner 2010-01-20 13:09:17 UTC
Description of problem:
While testing Mellanox 10GbE NICs, the customer discovered a major issue
with VLANs. The TCP segmentation offload provided by the parent NIC driver
is not available to the VLAN driver included in RHEL, and the line speed
drops to less than 1GB.

The request is that the VLAN device perform as well as the native device.
Mellanox has stated that this is achievable as long as all available
features are propagated. TSO and CSUM have been propagated (by backporting
3 patches), and while they see an improvement, they do not yet see the same
performance. Scatter-gather I/O has not been backported yet.
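
One way to check which offloads actually reach the VLAN device is to compare
`ethtool -k` output for the parent and the VLAN interface. The sketch below is
only an illustration: the interface names (eth2, eth2.100) and the sample
output are made up rather than taken from the customer's systems, and the
parser assumes the plain "name: on|off" line layout that ethtool prints.

```python
def parse_offloads(ethtool_output):
    """Parse the text printed by `ethtool -k <dev>` into {feature: bool}."""
    features = {}
    for line in ethtool_output.splitlines():
        name, sep, value = line.partition(":")
        tokens = value.split()
        # Skip the header line and anything that is not "name: on|off ...".
        if not sep or not tokens or tokens[0] not in ("on", "off"):
            continue
        features[name.strip()] = tokens[0] == "on"
    return features


def missing_on_vlan(parent, vlan):
    """Return features enabled on the parent but not on the VLAN device."""
    return sorted(
        name for name, enabled in parent.items()
        if enabled and not vlan.get(name, False)
    )


# Illustrative sample text only (hypothetical devices eth2 and eth2.100).
PARENT_SAMPLE = """Offload parameters for eth2:
rx-checksumming: on
tx-checksumming: on
scatter-gather: on
tcp segmentation offload: on
"""

VLAN_SAMPLE = """Offload parameters for eth2.100:
rx-checksumming: on
tx-checksumming: on
scatter-gather: off
tcp segmentation offload: off
"""

if __name__ == "__main__":
    parent = parse_offloads(PARENT_SAMPLE)
    vlan = parse_offloads(VLAN_SAMPLE)
    for name in missing_on_vlan(parent, vlan):
        print("not propagated:", name)
```

On a real system the two sample strings would be replaced by the captured
output of `ethtool -k` for each interface.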

These are the upstream commits backported so far:
commit 75b8846acd11ad3fc736d4df3413fe946bbf367c
Author: Patrick McHardy <kaber>
Date:   Tue Jul 8 03:22:42 2008 -0700

    vlan: Add ethtool support

    Add ethtool support for querying the device for offload settings.

commit 5fb13570543f4ae022996c9d7c0c099c8abf22dd
Author: Patrick McHardy <kaber>
Date:   Tue May 20 14:54:50 2008 -0700

    [VLAN]: Propagate selected feature bits to VLAN devices

    Propagate feature bits from the NETDEV_FEAT_CHANGE notifier. For now
    only TSO is propagated for devices that announce their ability to
    support TSO in combination with VLAN accel by setting the
    NETIF_F_VLAN_TSO flag.

commit 289c79a4bd350e8a25065102563ad1a183d1b402
Author: Patrick McHardy <kaber>
Date:   Fri May 23 00:22:04 2008 -0700

    vlan: Use bitmask of feature flags instead of seperate feature bits

    Herbert Xu points out that the use of seperate feature bits for
    features to be propagated to VLAN devices is going to get messy
    real soon. Replace the VLAN feature bits by a bitmask of feature
    flags to be propagated and restore the old GSO_SHIFT/MASK values.


The last one breaks kABI, so it needs more work before it can be included.
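
The bitmask approach described in the last commit message can be modeled
roughly as follows. This is a user-space sketch of the idea only, not the
kernel code: the flag values, the `Dev` class and the `transfer_features`
helper are hypothetical and exist purely for illustration.

```python
# Hypothetical flag values; the kernel's NETIF_F_* constants differ.
F_SG = 1 << 0           # scatter-gather I/O
F_IP_CSUM = 1 << 1      # checksum offload
F_TSO = 1 << 2          # TCP segmentation offload
F_HW_VLAN_TX = 1 << 3   # VLAN tag insertion in hardware


class Dev:
    """Toy stand-in for a network device with feature bitmasks."""

    def __init__(self, features=0, vlan_features=0):
        self.features = features
        # Mask of features the driver says are safe to use on VLAN devices.
        self.vlan_features = vlan_features


def transfer_features(real_dev, vlan_dev):
    """Copy the propagatable subset of real_dev's features to vlan_dev.

    Returns True if the VLAN device's feature set changed.
    """
    old = vlan_dev.features
    # Clear every bit that is under the parent's control ...
    vlan_dev.features &= ~real_dev.vlan_features
    # ... then set those that are currently enabled on the parent.
    vlan_dev.features |= real_dev.features & real_dev.vlan_features
    return vlan_dev.features != old


if __name__ == "__main__":
    real = Dev(features=F_SG | F_IP_CSUM | F_TSO | F_HW_VLAN_TX,
               vlan_features=F_SG | F_IP_CSUM | F_TSO)
    vlan = Dev()
    transfer_features(real, vlan)
    print("VLAN device has TSO:", bool(vlan.features & F_TSO))
```

With a single mask, adding another propagated feature (for example
scatter-gather) only requires the driver to set one more bit in
`vlan_features`, instead of defining a new dedicated flag per feature.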

Additional info:
They also tested another vendor's NIC that is TOE capable; it is *not*
susceptible to the same performance issues. Perhaps the TOE driver performs
some of the functionality that the VLAN driver is meant to provide.
Unfortunately, the Mellanox NICs do not offer TOE and are the only KR NICs
available in blade form factor.

Comment 1 Flavio Leitner 2010-01-20 13:10:41 UTC
Created attachment 385670 [details]
0001-Propagate-selected-feature-bits-to-VLAN-devices.patch

Comment 2 Flavio Leitner 2010-01-20 13:11:10 UTC
Created attachment 385671 [details]
0002-vlan-Add-ethtool-support.patch

Comment 3 Flavio Leitner 2010-01-20 13:12:48 UTC
Created attachment 385672 [details]
0003-mlx4_en-Added-vlan_features-support.patch

Comment 4 Flavio Leitner 2010-01-20 13:24:18 UTC
Created attachment 385673 [details]
screenshot of the performance results

Comment 5 Flavio Leitner 2010-01-20 13:27:33 UTC
Have you tested upstream kernel? Does it have the desired performance?

Comment 6 James M. Leddy 2010-01-20 14:17:24 UTC
(In reply to comment #5)
> Have you tested upstream kernel? Does it have the desired performance?    

Yevgeny from Mellanox has tried and said that .32 performance was good.

Comment 12 RHEL Program Management 2010-02-23 18:45:18 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 15 Chris Ward 2010-02-25 14:49:08 UTC
@Mellanox, 

We need to confirm that if we accept this updated patch set into the release, you will be able to provide us with a quick turnaround on testing so we know whether it properly addresses the issues as reported.

Comment 20 Jarod Wilson 2010-03-03 15:44:09 UTC
in kernel-2.6.18-191.el5
You can download this test kernel from http://people.redhat.com/jwilson/el5

Please update the appropriate value in the Verified field
(cf_verified) to indicate this fix has been successfully
verified. Include a comment with verification details.

Comment 27 errata-xmlrpc 2010-03-30 07:14:09 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHSA-2010-0178.html

Comment 29 Red Hat Bugzilla 2023-09-14 01:20:00 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days.

