Bug 1280025

Summary: Low performance on bond with QLogic 10GbE
Product: Red Hat Enterprise Virtualization Manager
Component: rhev-hypervisor
Status: CLOSED ERRATA
Severity: medium
Priority: high
Version: 3.5.4
Target Milestone: ovirt-3.5.8
Target Release: 3.5.8
Keywords: TestOnly, ZStream
Hardware: All
OS: Linux
Doc Type: Bug Fix
Type: Bug
oVirt Team: Node
Reporter: Amador Pahim <asegundo>
Assignee: Fabian Deutsch <fdeutsch>
QA Contact: Huijuan Zhao <huzhao>
CC: asegundo, bgraveno, cwu, fdeutsch, gklein, huzhao, jarod, lsurette, mgoldboi, pstehlik, tlitovsk, ycui, yeylon, ykaul, zhchen
Bug Depends On: 1287993
Last Closed: 2016-02-29 14:17:48 UTC
Attachments: sosreport and ethtool output

Description Amador Pahim 2015-11-10 18:33:12 UTC
Description of problem:

After upgrading the node to "RHEV Hypervisor - 6.7 - 20150828.0.el6ev" or "RHEV Hypervisor - 6.7 - 20151015.1.el6ev", we started to face a network performance issue on a bond of two 10Gbps CNA interfaces (QLE8262 - QLogic ISP8214 10GbE Controller). The throughput is around 400 Kbps. This host runs around 65 virtual machines.

Version-Release number of selected component (if applicable):

RHEV Hypervisor - 6.7 - 20150828.0.el6ev
RHEV Hypervisor - 6.7 - 20151015.1.el6ev

Additional info:

The problem is related to the firmware of the CNA board and the use of Large Receive Offload (LRO). The workaround is to turn off LRO support on each host network interface:

# ethtool -K <interface> lro off (to disable)
# ethtool -k <interface> (to check)
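
Note that ethtool -K does not survive a reboot. One possible way to persist the workaround (a sketch, assuming the RHEL 6 initscripts on the host honor "-K"-style ETHTOOL_OPTS; eth0 is a placeholder for each bond slave):

# echo 'ETHTOOL_OPTS="-K eth0 lro off"' >> /etc/sysconfig/network-scripts/ifcfg-eth0
# persist /etc/sysconfig/network-scripts/ifcfg-eth0 (RHEV-H only; keeps the ifcfg change across reboots)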

While the issue is occurring, we can see the messages below in dmesg:

Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: unexpected GSO type: 0x0, gso_size 60234, hdr_len 1514
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 1a 4a 19 00 1a 4a 19 37 02 00 40 c7 bf  ....J...J.7..@..
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 0d bb 86 dd 60 00 00 00 05 b4 06 40 28 01 00 b6  ....`......@(...
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: unexpected GSO type: 0x0, gso_size 60234, hdr_len 1514
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 1a 4a 19 00 1a 4a 19 37 02 00 40 c7 bf  ....J...J.7..@..
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 0d bb 86 dd 60 00 00 00 05 b4 06 40 28 01 00 b6  ....`......@(...
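
A quick way to check whether a host is currently hitting this (the message text is taken from the log above):

# dmesg | grep -c 'unexpected GSO type' (a growing non-zero count indicates the issue)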

I noticed that in the previous versions, LRO is disabled by default.
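
For reference, the current LRO state of every bond slave can be listed with a small loop (eth4 and eth5 are placeholder slave names):

# for nic in eth4 eth5; do echo "== $nic =="; ethtool -k "$nic" | grep large-receive-offload; done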

Comment 1 Fabian Deutsch 2015-11-11 07:46:31 UTC
I see that you already identified the issue to be related to the firmware.

Is there an updated firmware which is fixing this issue?

Or is your expectation that LRO should be disabled again by default?

Comment 5 Fabian Deutsch 2015-11-11 09:25:51 UTC
To verify that we are chasing the right bug, Amador:
From the description it is not clear where exactly you see the performance degradation.
Are you seeing it on the bonded and bridged device which provides networks to VMs? If so, it looks like bug 1266366.

Can you confirm this?

Comment 6 Amador Pahim 2015-11-11 11:07:13 UTC
(In reply to Fabian Deutsch from comment #5)
> To verify that we are chasing the right bug, Amador:
> From the description it is not clear where exactly you see the performance
> degradation.
> Are you seeing it on the bonded and bridged device which provides networks
> to VMs? If so, it looks like bug 1266366.
> 
> Can you confirm this?

Yes, bond + bridge providing network to VMs.
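
For the record, the bridge/bond topology can be double-checked on the host like this (bond0 is an example bond name):

# brctl show (lists each bridge and its enslaved interfaces)
# cat /proc/net/bonding/bond0 (shows the bonding mode and the slave NICs)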

Thank you.

Comment 13 Chaofeng Wu 2015-11-13 07:47:04 UTC
Created attachment 1093562 [details]
sosreport and ethtool output

Comment 14 Chaofeng Wu 2015-11-13 08:19:22 UTC
Hi Amador,

Here is one thing we need to confirm with the customer:
1. What is the _*exact previous RHEV-H version*_ the customer was using? Confirming this can help us reproduce the issue exactly.

Thanks

Comment 15 Amador Pahim 2015-11-13 15:39:07 UTC
(In reply to Chaofeng Wu from comment #14)
> Hi Amador,
> 
> Here is one thing we need to confirm with the customer:
> 1. What is the _*exact previous RHEV-H version*_ the customer was using?
> Confirming this can help us reproduce the issue exactly.
> 
> Thanks

RHEV Hypervisor - 6.5 - 20140930.1.el6ev

Comment 24 Fabian Deutsch 2015-11-17 07:05:41 UTC
According to kernel dev and QE, this is a duplicate of bug 1259008. Once that bug is fixed in RHEL, we'll also get the update in RHEV-H.

Comment 30 Huijuan Zhao 2016-02-16 11:04:21 UTC
According to Comment 24, we will verify the kernel version and run a sanity test in RHEV-H.

Test version:
RHEV-H 6.7-20160215.2.el6ev
ovirt-node-3.2.3-31.el6.noarch
kernel-2.6.32-573.18.1.el6.x86_64

Sanity test:
1. TUI installation of RHEV-H 6.7-20160215.2 was successful
2. Upgrade from RHEV-H 6.7-20160128.0 to RHEV-H 6.7-20160215.2 with a bond network configured was successful

So the kernel version is correct and the sanity tests pass in RHEV-H 6.7-20160215.2.el6ev; I will change the status to VERIFIED.
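
For anyone re-verifying, a minimal check on the upgraded host that the fixed kernel is the one running (expected version taken from the test version list above):

# uname -r (should report 2.6.32-573.18.1.el6.x86_64)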

Comment 32 errata-xmlrpc 2016-02-29 14:17:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0312.html