Description of problem:
We have experienced a regression of TCP speed when downloading data to our instances. After some investigation I found that switching the controller back to kernel 114 from 118 helped. I am not sure whether it is related, but all of our HW machines have these NICs: Broadcom Corporation NetXtreme II BCM5716 Gigabit Ethernet (rev 20). The controller runs as a libvirt VM on the same HW as is used for the compute nodes, with virtio interfaces bridged to the real HW ones. Both interfaces in each node are in the same native VLAN:

+--------+          [Switch]
| Node 1 |              |
|--------|              |
| eth0 o----------------+   public iface
| eth1 o================+   private iface, VLANs trunk, native VLAN same as eth0
+--------+

Version-Release number of selected component (if applicable):
dracut-kernel.noarch                   004-303.el6                      @anaconda-RedHatEnterpriseLinux-201301301459.x86_64/6.4
kernel.x86_64                          2.6.32-358.111.1.openstack.el6
kernel.x86_64                          2.6.32-358.114.1.openstack.el6
kernel.x86_64                          2.6.32-358.118.1.openstack.el6
kernel-debuginfo.x86_64                2.6.32-358.118.1.openstack.el6
kernel-debuginfo-common-x86_64.x86_64  2.6.32-358.118.1.openstack.el6
kernel-devel.x86_64                    2.6.32-358.114.1.openstack.el6
kernel-devel.x86_64                    2.6.32-358.118.1.openstack.el6
kernel-firmware.noarch                 2.6.32-358.118.1.openstack.el6
kernel-headers.x86_64                  2.6.32-358.118.1.openstack.el6
openstack-quantum.noarch               2013.1.3-1.el6ost                @puddle
openstack-quantum-openvswitch.noarch   2013.1.3-1.el6ost                @puddle
openvswitch.x86_64                     1.9.0-2.el6ost                   @puddle
python-quantum.noarch                  2013.1.3-1.el6ost                @puddle
python-quantumclient.noarch            2:2.2.1-2.el6ost                 @puddle

How reproducible:
1/1

Steps to Reproduce:
1. Boot kernel ...114, measure the speed with iperf from the controller to an instance, using the floating IP.
2. Boot kernel ...118, measure again.

Actual results:
The speed with 118 is like I remember from my school years: less than half a megabit.

Expected results:
Way more than a Mbit.

Additional info:
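For reference, a minimal sketch of the measurement in the steps above (the floating IP is a placeholder, not taken from this report):

  # on the instance (reachable via its floating IP), start an iperf server
  iperf -s

  # on the controller, run the client against the floating IP for 30 seconds
  iperf -c <floating-ip> -t 30

The same pair of commands is then repeated after rebooting into the other kernel so the two throughput numbers can be compared.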
Created attachment 795598 [details] kernel-114.pcapng.bz2
Switching back to the older kernel really does help. I checked the logs and, IIRC, the problems started on the 5th of September, which correlates. Setting regression. I will check whether our other deployment suffers from this and whether I can get some more info.

I can see these in messages:

Aug 22 09:29:55 controller kernel: Linux version 2.6.32-358.114.1.openstack.el6.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Jul 3 02:11:25 EDT 2013
Sep  4 17:09:27 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
Sep  5 10:06:45 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
Sep  6 12:29:42 controller kernel: Linux version 2.6.32-358.118.1.openstack.el6.x86_64 (mockbuild.bos.redhat.com) (gcc version 4.4.7 20120313 (Red Hat 4.4.7-3) (GCC) ) #1 SMP Wed Aug 14 13:18:08 EDT 2013
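For anyone checking the same correlation, the kernel boot lines above can be pulled out of syslog (assuming the default /var/log/messages location) with:

  # list every kernel boot recorded in the log, with timestamp and version
  grep "Linux version" /var/log/messages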
Can you confirm that the OVS ports are all untagged? I have been assuming so far that this is the case, but it's not entirely clear from the information in this BZ.
(In reply to Thomas Graf from comment #10)
> Can you confirm that the OVS ports are all untagged? I have been assuming
> so far that this is the case, but it's not entirely clear from the
> information in this BZ.

I don't quite understand the question. In the attachment controller_status you can see that many ports are tagged in br-int:

+ ovs-vsctl show
5a53e753-ecea-45e3-8d0e-8cb9db8710bb
    Bridge br-int
        Port br-int
            Interface br-int
                type: internal
        Port "tap1ae1918a-71"
            tag: 3
            Interface "tap1ae1918a-71"
        Port "tapabe9258e-8d"
            tag: 3
            Interface "tapabe9258e-8d"
        Port "tapa3d1159a-dc"
            tag: 5
            Interface "tapa3d1159a-dc"
        Port "tap959fdd9b-c2"
            tag: 1
            Interface "tap959fdd9b-c2"
...
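A quick way to confirm which OVS ports carry a VLAN tag, using standard ovs-vsctl commands (bridge and port names as in the output above), is:

  # show the whole switch layout, including the per-port "tag:" lines
  ovs-vsctl show

  # or dump the Port table and look only at the name and tag fields
  ovs-vsctl list Port | grep -E '^(name|tag)'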
The question was basically whether this is a DUP of BZ997632, which I think is the case. Would you agree with closing this as a DUP of BZ997632?
(In reply to Thomas Graf from comment #12)
> The question was basically whether this is a DUP of BZ997632, which I think
> is the case. Would you agree with closing this as a DUP of BZ997632?

Hi Thomas,

I think you're right --- this is a dup of Bug 997632. Today I identified that VXLAN has the issue, so I'm in the process of identifying at which build we started having this problem. Stay tuned.

Thanks!
Jean
Was a GRE tunnel or VXLAN part of the data path?

Thanks!
Jean
(In reply to Jean-Tsung Hsiao from comment #14)
> Was a GRE tunnel or VXLAN part of the data path?
>
> Thanks!
>
> Jean

No. As I described above in the bug description, my bare-metal machines have eth1 ifaces interconnected using trunk ports trunking several VLANs. No GRE nor VXLAN; the VLANs are terminated on the bare metal.
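For completeness, whether GRE or VXLAN is in the data path can also be read from the OVS and plugin configuration; the config path below is the usual one for the Grizzly-era openstack-quantum-openvswitch package and may differ on other setups:

  # tunnel ports, if any, show up as interfaces of type gre or vxlan
  ovs-vsctl show | grep -i -E 'type: (gre|vxlan)'

  # the tenant network type configured for the OVS plugin (vlan vs. gre)
  grep tenant_network_type /etc/quantum/plugins/openvswitch/ovs_quantum_plugin.ini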
This is a duplicate of BZ997632, thus I'm marking it as such in order to keep all the information together.

*** This bug has been marked as a duplicate of bug 997632 ***
fixed in kernel-2.6.32-358.123.3.openstack.el6
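A minimal verification sketch once the fixed build is available (package NVR taken from the comment above; having the repository configured is assumed):

  # install the fixed kernel, reboot into it, and confirm it is running
  yum install kernel-2.6.32-358.123.3.openstack.el6
  uname -r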
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2013-1520.html
Hi,

I would like to learn how to reproduce this issue.

Thanks in advance!
Jean