Description of problem:
We have experienced a regression of TCP speed when downloading data to our instances. After some investigation I found that switching the controller back to kernel 114 from 118 helped. I am not sure whether it is related, but all of our HW machines have these NICs: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20). The controller runs as a libvirt VM on the same HW as is used for the compute nodes, with virtio interfaces bridged to the real HW ones. Both interfaces in each node are in the same native VLAN:

+--------+          [Switch]
| Node 1 |              |
|--------|              |
| eth0 o----------------+   public iface
| eth1 o================+   private iface, VLANs trunk, native VLAN same as eth0
+--------+

Version-Release number of selected component (if applicable):
dracut-kernel.noarch                   004-303.el6                      @anaconda-RedHatEnterpriseLinux-201301301459.x86_64/6.4
kernel.x86_64                          2.6.32-358.111.1.openstack.el6
kernel.x86_64                          2.6.32-358.114.1.openstack.el6
kernel.x86_64                          2.6.32-358.118.1.openstack.el6
kernel-debuginfo.x86_64                2.6.32-358.118.1.openstack.el6
kernel-debuginfo-common-x86_64.x86_64  2.6.32-358.118.1.openstack.el6
kernel-devel.x86_64                    2.6.32-358.114.1.openstack.el6
kernel-devel.x86_64                    2.6.32-358.118.1.openstack.el6
kernel-firmware.noarch                 2.6.32-358.118.1.openstack.el6
kernel-headers.x86_64                  2.6.32-358.118.1.openstack.el6
openstack-quantum.noarch               2013.1.3-1.el6ost                @puddle
openstack-quantum-openvswitch.noarch   2013.1.3-1.el6ost                @puddle
openvswitch.x86_64                     1.9.0-2.el6ost                   @puddle
python-quantum.noarch                  2013.1.3-1.el6ost                @puddle
python-quantumclient.noarch            2:2.2.1-2.el6ost                 @puddle

How reproducible:
1/1

Steps to Reproduce:
1. Boot kernel ...114, measure the speed with iperf from the controller to an instance, using the floating IP.
2. Boot kernel ...118, measure again.

Actual results:
The speed with 118 is like I remember from my school years: less than half a megabit.

Expected results:
Way more than a Mbit.

Additional info:
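For reference, a minimal sketch of the measurement in the steps above (the floating IP is a placeholder, not taken from this report):

  # on the instance (reachable via its floating IP), start an iperf server
  iperf -s

  # on the controller, run the client against the floating IP for 30 seconds
  iperf -c <floating-ip> -t 30

The same pair of commands is then repeated after rebooting into the other kernel so the two throughput numbers can be compared.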
Created attachment 795598 [details] kernel-114.pcapng.bz2
Switching back to the older kernel really does help. I checked the logs and, IIRC, the problems started on the 5th of September, which correlates. Setting regression. I will check whether our other deployment suffers from this and whether I can get some more info.

I can see these in messages:

Aug 22 09:29:55 controller kernel: Linux version 2.6.32-358.114.1.openstack.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Jul 3 02:11:25 EDT 2013
Sep  4 17:09:27 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
Sep  5 10:06:45 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
Sep  6 12:29:42 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
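For anyone checking the same correlation, the kernel boot lines above can be pulled out of syslog (assuming the default /var/log/messages location) with:

  # list every kernel boot recorded in the log, with timestamp and version
  grep "Linux version" /var/log/messages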
Can you confirm that the OVS ports are all untagged? I have been assuming so far that this is the case, but it's not entirely clear from the information in this BZ.
(In reply to Thomas Graf from comment #10)
> Can you confirm that the OVS ports are all untagged? I have been assuming
> so far that this is the case, but it's not entirely clear from the
> information in this BZ.

I don't quite understand the question. In the attachment controller_status you can see that many ports are tagged in br-int:

+ ovs-vsctl show
5a53e753-ecea-45e3-8d0e-8cb9db8710bb
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "tap1ae1918a-71"
            tag: 3
            Interface "tap1ae1918a-71"
        Port "tapabe9258e-8d"
            tag: 3
            Interface "tapabe9258e-8d"
        Port "tapa3d1159a-dc"
            tag: 5
            Interface "tapa3d1159a-dc"
        Port "tap959fdd9b-c2"
            tag: 1
            Interface "tap959fdd9b-c2"
...
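A quick way to confirm which OVS ports carry a VLAN tag, using standard ovs-vsctl commands (bridge and port names as in the output above), is:

  # show the whole switch layout, including the per-port "tag:" lines
  ovs-vsctl show

  # or dump the Port table and look only at the name and tag fields
  ovs-vsctl list Port | grep -E '^(name|tag)'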
The question was basically whether this is a DUP of BZ997632, which I think is the case. Would you agree with closing this as a DUP of BZ997632?
(In reply to Thomas Graf from comment #12)
> The question was basically whether this is a DUP of BZ997632, which I think
> is the case. Would you agree with closing this as a DUP of BZ997632?

Hi Thomas,

I think you're right --- this is a dup of Bug 997632. Today I identified that VXLAN has the issue, so I'm in the process of identifying at which build we started having this problem. Stay tuned.

Thanks!
Jean
Was a GRE tunnel or VXLAN part of the data path?

Thanks!
Jean
(In reply to Jean-Tsung Hsiao from comment #14)
> Was a GRE tunnel or VXLAN part of the data path?
>
> Thanks!
>
> Jean

No. As I described above in the bug description, my bare-metal machines have eth1 ifaces interconnected using trunk ports trunking several VLANs. No GRE nor VXLAN; the VLANs are terminated on the bare metal.
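For completeness, whether GRE or VXLAN is in the data path can also be read from the OVS and plugin configuration; the config path below is the usual one for the Grizzly-era openstack-quantum-openvswitch package and may differ on other setups:

  # tunnel ports, if any, show up as interfaces of type gre or vxlan
  ovs-vsctl show | grep -i -E 'type: (gre|vxlan)'

  # the tenant network type configured for the OVS plugin (vlan vs. gre)
  grep tenant_network_type /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini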
This is a duplicate of BZ997632, thus I'm marking it as such in order to keep all the information together.

*** This bug has been marked as a duplicate of bug 997632 ***
fixed in kernel-2.6.32-358.123.3.openstack.el6
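A minimal verification sketch once the fixed build is available (package NVR taken from the comment above; having the repository configured is assumed):

  # install the fixed kernel, reboot into it, and confirm it is running
  yum install kernel-2.6.32-358.123.3.openstack.el6
  uname -r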
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1520.html
Hi,

I would like to learn how to reproduce this issue.

Thanks in advance!
Jean