Bug 1280025 - Low performance on bond with QLogic 10GbE
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: rhev-hypervisor
Version: 3.5.4
Hardware: All  OS: Linux
Priority: high  Severity: medium
Target Milestone: ovirt-3.5.8
Target Release: 3.5.8
Assigned To: Fabian Deutsch
QA Contact: Huijuan Zhao
Keywords: TestOnly, ZStream
Depends On: 1287993
Blocks:
Reported: 2015-11-10 13:33 EST by Amador Pahim
Modified: 2016-08-29 08:13 EDT (History)
15 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-29 09:17:48 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
sosreport and ethtool output (7.68 MB, application/x-gzip)
2015-11-13 02:47 EST, Chaofeng Wu

Description Amador Pahim 2015-11-10 13:33:12 EST
Description of problem:

After upgrading the node to "RHEV Hypervisor - 6.7 - 20150828.0.el6ev" or "RHEV Hypervisor - 6.7 - 20151015.1.el6ev", we started to see a network performance issue on a bond of two 10Gbps CNA interfaces (QLE8262 - QLogic ISP8214 10GbE Controller). Throughput drops to around 400 Kbps. This host runs around 65 virtual machines.

Version-Release number of selected component (if applicable):

RHEV Hypervisor - 6.7 - 20150828.0.el6ev
RHEV Hypervisor - 6.7 - 20151015.1.el6ev

Additional info:

The problem is related to the firmware of the CNA board and the use of Large Receive Offload (LRO). The workaround is to turn off LRO on each host network interface:

# ethtool -K <interface> lro off (to disable)
# ethtool -k <interface> (to check)
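The workaround above can be applied to every slave of the bond in one pass. A minimal sketch, assuming a bond named bond0 and hypothetical slave names eth4/eth5 (on a real host the slave list comes from /sys/class/net/bond0/bonding/slaves); the loop only prints the ethtool commands, so remove the "echo" to actually apply them:

```shell
#!/bin/sh
# Hypothetical slave names; on a real host use:
#   SLAVES=$(cat /sys/class/net/bond0/bonding/slaves)
SLAVES="eth4 eth5"

for nic in $SLAVES; do
    # Print the commands (drop "echo" to run them for real):
    echo "ethtool -K $nic lro off"   # disable LRO on the slave
    echo "ethtool -k $nic"           # verify the offload settings afterwards
done
```

LRO is disabled on the slaves rather than on the bond device itself, since the offload is performed by the physical NIC firmware.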

During the issue, we can see the messages below in dmesg:

Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: unexpected GSO type: 0x0, gso_size 60234, hdr_len 1514
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 1a 4a 19 00 1a 4a 19 37 02 00 40 c7 bf  ....J...J.7..@..
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 0d bb 86 dd 60 00 00 00 05 b4 06 40 28 01 00 b6  ....`......@(...
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: unexpected GSO type: 0x0, gso_size 60234, hdr_len 1514
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 00 00 00 1a 4a 19 00 1a 4a 19 37 02 00 40 c7 bf  ....J...J.7..@..
Oct 30 17:14:21 ptihost-rhev-proj01 kernel: tun: 0d bb 86 dd 60 00 00 00 05 b4 06 40 28 01 00 b6  ....`......@(...

I noticed that in previous versions, LRO was disabled by default.
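Note that a setting applied with ethtool -K does not survive a reboot. A hedged sketch of persisting the workaround via the slave's initscripts ifcfg file, assuming a hypothetical slave eth4 and an initscripts version that accepts ethtool option strings beginning with "-" in ETHTOOL_OPTS:

```shell
# /etc/sysconfig/network-scripts/ifcfg-eth4  (eth4 is a hypothetical slave name)
DEVICE=eth4
ONBOOT=yes
MASTER=bond0
SLAVE=yes
# initscripts passes this to ethtool when the interface comes up:
ETHTOOL_OPTS="-K eth4 lro off"
```

On RHEV-H the change would additionally need to be persisted (e.g. with the node's persist command), since the root filesystem is stateless across reboots.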
Comment 1 Fabian Deutsch 2015-11-11 02:46:31 EST
I see that you already identified the issue to be related to the firmware.

Is there an updated firmware that fixes this issue?

Or is your expectation that LRO should be disabled again by default?
Comment 5 Fabian Deutsch 2015-11-11 04:25:51 EST
To verify that we are chasing the right bug, Amador:
From the description it is not clear where you exactly see the performance degradation.
Are you seeing it on the bonded and bridged device which provides networks to VMs? If so then it looks like bug 1266366.

Can you confirm this?
Comment 6 Amador Pahim 2015-11-11 06:07:13 EST
(In reply to Fabian Deutsch from comment #5)
> To verify that we are chasing the right bug, Amador:
> From the description it is not clear where you exactly see the performance
> degradation.
> Are you seeing it on the bonded and bridged device which provides networks
> to VMs? If so then it looks like bug 1266366.
> 
> Can you confirm this?

Yes, bond + bridge providing network to VMs.

Thank you.
Comment 13 Chaofeng Wu 2015-11-13 02:47 EST
Created attachment 1093562 [details]
sosreport and ethtool output
Comment 14 Chaofeng Wu 2015-11-13 03:19:22 EST
Hi Amador,

There is one thing we need to confirm with the customer:
1. What *exact previous RHEV-H version* was the customer using before? Confirming this will help us reproduce the issue exactly.

Thanks
Comment 15 Amador Pahim 2015-11-13 10:39:07 EST
(In reply to Chaofeng Wu from comment #14)
> Hi Amador,
> 
> There is one thing we need to confirm with the customer:
> 1. What *exact previous RHEV-H version* was the customer using before?
> Confirming this will help us reproduce the issue exactly.
> 
> Thanks

RHEV Hypervisor - 6.5 - 20140930.1.el6ev
Comment 24 Fabian Deutsch 2015-11-17 02:05:41 EST
According to kernel dev and QE, this is a duplicate of bug 1259008. Once that bug is fixed in RHEL, we'll also get the update in RHEV-H.
Comment 30 Huijuan Zhao 2016-02-16 06:04:21 EST
According to Comment 24, we will verify the kernel version and run a sanity test on RHEV-H.

Test version:
RHEV-H 6.7-20160215.2.el6ev
ovirt-node-3.2.3-31.el6.noarch
kernel-2.6.32-573.18.1.el6.x86_64

Sanity test:
1. TUI install of RHEV-H 6.7-20160215.2 was successful.
2. Upgrade from RHEV-H 6.7-20160128.0 to RHEV-H 6.7-20160215.2 with a bond network configured was successful.

So the kernel version is correct and the sanity tests pass on RHEV-H 6.7-20160215.2.el6ev. I will change the status to VERIFIED.
Comment 32 errata-xmlrpc 2016-02-29 09:17:48 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-0312.html
