Bug 772806 - Disable LRO for all NICs that have LRO enabled
Summary: Disable LRO for all NICs that have LRO enabled
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ovirt-node
Version: 6.3
Hardware: All
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: rc
Target Release: ---
Assignee: Mike Burns
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks: 772809 773675
Reported: 2012-01-10 01:47 UTC by Mike Burns
Modified: 2016-04-26 13:55 UTC (History)
CC List: 29 users

Fixed In Version: ovirt-node-2.2.1-1.el6
Doc Type: Bug Fix
Doc Text:
Clone Of: 772317
Environment:
Last Closed: 2012-07-19 14:17:25 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHBA-2012:0741
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: ovirt-node bug fix and enhancement update
Last Updated: 2012-07-19 18:10:46 UTC

Description Mike Burns 2012-01-10 01:47:10 UTC
ovirt-node should disable LRO for all nics that have it enabled by default.  This workaround will be removed later when 772317 is fixed.



+++ This bug was initially created as a clone of Bug #772317 +++

There are significant performance issues reported for NICs that use LRO.  We need to disable LRO for all nics that have it enabled.

--- Additional comment from mburns on 2012-01-09 13:58:08 EST ---

Original mail comments:

So, running RHEV 3 beta for a customer this week and we've been seeing 
horrible performance on the RHEV-H hosts running the bnx2x driver. It 
turns out this is a problem with LRO. So we have a fix and it works 
(ethtool -K eth$ lro off).

However how do we make this change persistent across reboots ? We want 
to verify that the "normal" method of putting the appropriate option 
(options bnx2x disable_tpa=1) in modprobe.conf is supported. (There is 
no /etc/modprobe.conf and / is ro ... ).
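
The runtime fix quoted in this mail (ethtool -K ... lro off) can be applied to every interface instead of one at a time. The following is a minimal sketch under stated assumptions, not the ovirt-node patch: the function names are hypothetical, it assumes `ethtool -k` reports the feature on a `large-receive-offload:` line, and it has to run as root. The change is lost on reboot.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical, not taken from ovirt-node.

# Succeeds when the `ethtool -k` output on stdin reports LRO as enabled.
# Assumes the feature line reads "large-receive-offload: on".
lro_is_on() {
    grep -q '^[[:space:]]*large-receive-offload:[[:space:]]*on'
}

# Turn LRO off on every interface that currently has it on.
# Entries ethtool cannot query (for example bonding_masters) are skipped.
disable_lro_everywhere() {
    for path in /sys/class/net/*; do
        dev=${path##*/}
        if ethtool -k "$dev" 2>/dev/null | lro_is_on; then
            ethtool -K "$dev" lro off
        fi
    done
}
```

Calling `disable_lro_everywhere` from a boot-time script is one way to reapply it after each reboot; the module-option route asked about in the mail avoids the per-boot step.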

--- Additional comment from mburns on 2012-01-09 13:59:56 EST ---

Applying the workaround is doable by placing commands in /etc/rc.local and persisting.  

This issue originally came up in 5.6/5.7 but was supposed to be fixed in the kernel.  Can I get some help from the kernel team with debugging/triaging this problem?

--- Additional comment from nhorman on 2012-01-09 15:56:50 EST ---

Mike, can you tell me:

1) What the environment looks like?  Specifically what kind of network interfaces are in play here?  Specific affected drivers, vlans in use, bridges in use vs. sriov or other offload technologies?

2) The specific nature of the failure.  Are frames getting dropped, and if so, where?  Specific netstat, ethtool, and /proc/net/dev|snmp stats are useful here.

3) History.  You said this came up in 5.6/5.7. Is the problem fixed there, or does it persist there the same way it does in RHEL6?

--- Additional comment from mburns on 2012-01-09 16:08:40 EST ---

Paul, can you provide the information for 1 and 2 above?

Neil,

In 5.6/5.7, we explicitly disabled LRO on all nics where it was enabled by default.  The rhev-hypervisor bug (bug 696374) mentioned bug 696374.  I don't know if this particular environment has vlans or not though.  In the 5.7/5.8 branches, we still have that workaround in place, but it was never ported forward to the RHEL 6 stream.

--- Additional comment from plundin on 2012-01-09 16:48:36 EST ---

In response to the above:

1. A single RHEV-M instance managing a cluster of 6 HP nodes running RHEV-H, all using the bnx2x driver (as is normal with HP kit). No tagging, STP or SRIOV in use. Interfaces were however mode 1 bonded (active/failover) pairs.

2. It appeared to mimic a bug I found online when debugging the issue (duplicate responses/acks), but truthfully we were under the gun and did not save the tcpdump output. No errors or collisions were shown on the interfaces, and everything else was at defaults (i.e., nothing fancy here). 

Upon making the above LRO change, network speeds increased significantly. The specific test use case was kickstarting VMs over the network. A base RHEL install took over 4 hours (as the only VM running on the hypervisor) before disabling LRO. Once LRO was disabled in the hypervisor, the install took less than 5 minutes. (Not scientific, but it pointed us where we needed to go.)

--- Additional comment from nhorman on 2012-01-09 16:54:39 EST ---

Thank you, Mike. Could you also provide some details as to what exactly needed to be fixed in RHEL5, so we can compare to RHEL6?  IIRC the only thing that had to be done in RHEL5 was the disabling of lro automatically when a device was added to a bridge.  That functionality should already be in RHEL6. If you are using some offload technology like sriov or some other pci virtual function technology, manual lro disabling (or some other per-device-driver automatic disabling) is still going to be required.

--- Additional comment from mburns on 2012-01-09 17:05:19 EST ---

The fix in RHEL5 was to simply disable LRO in all instances on all nics that supported it.  It was a hack and workaround, but was sufficient for our use.  

There should be no sr-iov or anything like that in this situation.  

My recollection of the issue was the same.  We needed to disable lro when adding the nic to a bridge.  Based on what Risar is saying, this wasn't happening for them.  The nic was added to a bridge, but they were still seeing problems until they explicitly disabled LRO on that interface.

--- Additional comment from nhorman on 2012-01-09 17:05:50 EST ---

Paul, thank you.  So it sounds like no vlans are in use, which is good.  That confirms that this has no relation to the vlan lro bug I fixed in RHEL5.  That said, if you're using bonding, then I think that's where the problem lies.  I don't see any way that the bonding driver can disable slave lro at the moment, or for that matter, tell its slaves to do so.  Can we test this theory?  Does the problem go away if you stop using the bond? If you attach a single interface to your bridge, does lro get disabled, and does your performance increase?

Mike, I can take this bug over if you like.

--- Additional comment from agospoda on 2012-01-09 17:15:58 EST ---

I suspect Neil is correct on this one.  The bonding driver does not have a set_flags ethtool op and this would be required to pass down the need to disable LRO on all slave devices.
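
Since the bond does not pass the request down, one way to test the theory by hand is to disable LRO on each slave directly. This is a hedged sketch, not part of any fix in this bug: the function names and the bond name `bond0` are hypothetical. It relies on the bonding driver's sysfs file `/sys/class/net/<bond>/bonding/slaves`, which lists the slave interfaces separated by spaces.

```shell
#!/bin/sh
# Sketch only: function names and "bond0" are illustrative assumptions.

# Print the slave interfaces of a bond, one per line.
# $1: bond name; $2: sysfs root (defaults to /sys, overridable for testing).
bond_slaves() {
    root=${2:-/sys}
    tr ' ' '\n' < "$root/class/net/$1/bonding/slaves" | sed '/^$/d'
}

# Disable LRO on every slave of the given bond (requires root and ethtool).
disable_lro_on_bond_slaves() {
    bond_slaves "$1" | while read -r dev; do
        ethtool -K "$dev" lro off
    done
}
```

For example, `disable_lro_on_bond_slaves bond0` would run `ethtool -K <slave> lro off` once per slave of `bond0`.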

--- Additional comment from plundin on 2012-01-09 17:19:23 EST ---

Neil, I can ask the customer if they are willing to test this (The problem was encountered during a RH Consulting engagement which ended last week) but it may be a few days until they get a chance to do so.

Comment 1 Mike Burns 2012-01-10 01:51:25 UTC
Patch is posted upstream:

http://gerrit.ovirt.org/927

Comment 3 Mike Burns 2012-01-12 17:41:39 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
A serious performance problem occurs when running using a bond and a bridge on top of NICs that use LRO.  LRO should get disabled automatically when the NIC is added to a bridge but this doesn't work right when there is a bond in between.  This patch disables LRO on all nics.

Comment 5 Tim Hildred 2012-01-25 02:22:13 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1 +1,3 @@
-A serious performance problem occurs when running using a bond and a bridge on top of NICs that use LRO.  LRO should get disabled automatically when the NIC is added to a bridge but this doesn't work right when there is a bond in between.  This patch disables LRO on all nics.
+Previously, using network interfaces in a bond and a bridge prevented LRO from being disabled on LRO-enabled network interface cards, causing serious network performance issues. 
+
+Now, LRO is disabled on all hypervisor network interface cards, preventing any LRO related network performance issues from occurring.

Comment 6 cshao 2012-02-24 09:19:37 UTC
Test version: 
rhev-hypervisor6-6.3-20120215.0.el6

# cat mlx4_en.conf 
options mlx4_en num_lro=0

# cat enic.conf 
options enic lro_disable=1

# cat s2io.conf 
options s2io lro=0

# cat bnx2x.conf 
options bnx2x disable_tpa=1

The bug is fixed, so change bug status to VERIFIED.
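
The four files checked above could be generated along these lines. This is a sketch rather than the actual ovirt-node change: the driver/option pairs are copied from the verification output, while the function names and the target directory (assumed to be /etc/modprobe.d on a real system) are illustrative. Persisting the files on the read-only hypervisor image is not shown.

```shell
#!/bin/sh
# Sketch only: function names and the target directory are assumptions.

# Map a driver name to the module option that turns LRO off.
# Pairs are taken from the verification output in this comment.
lro_option_for() {
    case "$1" in
        mlx4_en) echo "num_lro=0" ;;
        enic)    echo "lro_disable=1" ;;
        s2io)    echo "lro=0" ;;
        bnx2x)   echo "disable_tpa=1" ;;
        *)       return 1 ;;
    esac
}

# Write one <driver>.conf file per known driver into the directory $1.
write_lro_confs() {
    dir=$1
    for drv in mlx4_en enic s2io bnx2x; do
        printf 'options %s %s\n' "$drv" "$(lro_option_for "$drv")" > "$dir/$drv.conf"
    done
}
```

Module options are read when the module is loaded, so a driver that is already loaded keeps its current setting until it is reloaded or the host reboots.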

Comment 7 cshao 2012-02-27 08:27:19 UTC
Per Mike's confirmation in bug 773675, comments #11 and #12, I only checked the configuration files for this bug on the 6.3 build. 

Hi Mike,
Is this sufficient to verify the bug on our side, or does z-stream need to verify it as well?

Comment 9 Stephen Gordon 2012-03-27 20:57:52 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,3 @@
 Previously, using network interfaces in a bond and a bridge prevented LRO from being disabled on LRO-enabled network interface cards, causing serious network performance issues. 
 
-Now, LRO is disabled on all hypervisor network interface cards, preventing any LRO related network performance issues from occurring.
+Now, LRO is disabled on all Hypervisor network interface cards, avoiding LRO related network performance issues.

Comment 10 Stephen Gordon 2012-05-28 16:27:34 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,3 +1,3 @@
-Previously, using network interfaces in a bond and a bridge prevented LRO from being disabled on LRO-enabled network interface cards, causing serious network performance issues. 
+Previously, using network interfaces in both a bond and a bridge prevented LRO from being disabled on LRO-enabled network interface cards, causing serious network performance issues. 
 
 Now, LRO is disabled on all Hypervisor network interface cards, avoiding LRO related network performance issues.

Comment 11 Stephen Gordon 2012-05-28 16:29:00 UTC
Removing the technical note flag given that the next on my list was Bug # 772809 which appears to revert this change...

Comment 12 Stephen Gordon 2012-06-15 13:58:15 UTC
Deleted Technical Notes Contents.

Old Contents:
Previously, using network interfaces in both a bond and a bridge prevented LRO from being disabled on LRO-enabled network interface cards, causing serious network performance issues. 

Now, LRO is disabled on all Hypervisor network interface cards, avoiding LRO related network performance issues.

Comment 14 errata-xmlrpc 2012-07-19 14:17:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2012-0741.html

