Bug 1359951

Summary: Poor L3 network performance
Product: Red Hat OpenStack
Reporter: Jeremy Eder <jeder>
Component: openstack-neutron
Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED DUPLICATE
QA Contact: Toni Freger <tfreger>
Severity: high
Docs Contact:
Priority: high
Version: 8.0 (Liberty)
CC: amuller, chrisw, ihrachys, jtaleric, nyechiel, srevivo
Target Milestone: ---
Keywords: ZStream
Target Release: ---
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-12-21 14:14:04 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jeremy Eder 2016-07-25 23:41:16 UTC
We have an OSP8 install that's showing some really poor network performance when trying to download things from the internet.

This environment is in a cluster loaned to us by the CNCF (www.cncf.io), and we're trying to run scale-out tests of OpenShift-on-OpenStack (we haven't gotten to the OpenShift part yet).

The issue: a VM running on OpenStack that tries to download, for example, a CentOS cloud image gets only around 30kbps.

We tried several other download locations; all ran at around the same speed.

The same download running on one of the bare metal controllers runs at decent speed (70+mbit...).

Transfers between VMs run at decent speed as well. It's only when we try to pull from an external source (like Docker Hub or similar) that we see the issue.

openstack-neutron-7.1.1-3.el7ost
ovs-ml2 driver
VXLAN tunnels
bonds (LACP) w/ VLANs for the overcloud deployment

Comment 2 Joe Talerico 2016-07-26 10:06:36 UTC
Within the active qrouter namespace:
[root@overcloud-controller-1 heat-admin]# ip netns exec qrouter-f125cf47-0e27-448a-9943-02df272fbae9 ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
85: ha-4e072f10-80: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:7f:00:43 brd ff:ff:ff:ff:ff:ff
    inet 169.254.192.2/18 brd 169.254.255.255 scope global ha-4e072f10-80
       valid_lft forever preferred_lft forever
    inet 169.254.0.1/24 scope global ha-4e072f10-80
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe7f:43/64 scope link 
       valid_lft forever preferred_lft forever
86: qr-095135b6-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:39:15:09 brd ff:ff:ff:ff:ff:ff
    inet 192.0.0.1/24 scope global qr-095135b6-ba
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe39:1509/64 scope link 
       valid_lft forever preferred_lft forever
88: qg-d08ed9a2-24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN 
    link/ether fa:16:3e:5f:f8:81 brd ff:ff:ff:ff:ff:ff
    inet 10.2.4.23/21 scope global qg-d08ed9a2-24
       valid_lft forever preferred_lft forever
    inet 10.2.4.24/32 scope global qg-d08ed9a2-24
       valid_lft forever preferred_lft forever
    inet6 fe80::f816:3eff:fe5f:f881/64 scope link 
       valid_lft forever preferred_lft forever
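
The per-interface MTUs in the output above can be pulled out mechanically. The following is a minimal sketch, assuming `ip a` output in the format shown here; the helper name `interface_mtus` and the trimmed sample are illustrative and not part of any Neutron tooling.

```python
import re

# Matches the header line of each interface in `ip a` output, e.g.
# "88: qg-d08ed9a2-24: <BROADCAST,...> mtu 1450 qdisc noqueue state UNKNOWN".
# An optional "@peer" suffix (as on veth pairs) is skipped.
IFACE_RE = re.compile(
    r"^\d+:\s+([^:@\s]+)(?:@[^:\s]+)?:\s+<[^>]*>\s+mtu\s+(\d+)"
)


def interface_mtus(ip_a_output):
    """Return {interface_name: mtu} parsed from `ip a` text (hypothetical helper)."""
    mtus = {}
    for line in ip_a_output.splitlines():
        match = IFACE_RE.match(line)
        if match:
            mtus[match.group(1)] = int(match.group(2))
    return mtus


# Interface header lines copied from the qrouter namespace output above.
SAMPLE = """\
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN
85: ha-4e072f10-80: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN
86: qr-095135b6-ba: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
88: qg-d08ed9a2-24: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
"""

if __name__ == "__main__":
    for name, mtu in interface_mtus(SAMPLE).items():
        print(name, mtu)
```

On a live controller the same function could be fed the text captured from `ip netns exec <namespace> ip a`.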


The guest veths:
48: qvo9bfe2904-d9@qvb9bfe2904-d9: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master ovs-system state UP qlen 1000
    link/ether 82:87:08:78:6a:b6 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::8087:8ff:fe78:6ab6/64 scope link 
       valid_lft forever preferred_lft forever
49: qvb9bfe2904-d9@qvo9bfe2904-d9: <BROADCAST,MULTICAST,PROMISC,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9bfe2904-d9 state UP qlen 1000
    link/ether 12:9e:b7:5c:54:fe brd ff:ff:ff:ff:ff:ff
    inet6 fe80::109e:b7ff:fe5c:54fe/64 scope link 
       valid_lft forever preferred_lft forever
50: tap9bfe2904-d9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master qbr9bfe2904-d9 state UNKNOWN qlen 500
    link/ether fe:16:3e:4c:5d:c7 brd ff:ff:ff:ff:ff:ff
    inet6 fe80::fc16:3eff:fe4c:5dc7/64 scope link 
       valid_lft forever preferred_lft forever

The guest MTU: 
root@rook: ~ # ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host 
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc pfifo_fast state UP qlen 1000
    link/ether fa:16:3e:4c:5d:c7 brd ff:ff:ff:ff:ff:ff
    inet 192.0.0.5/24 brd 192.0.0.255 scope global dynamic eth0
       valid_lft 70762sec preferred_lft 70762sec
    inet6 fe80::f816:3eff:fe4c:5dc7/64 scope link 
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN 
    link/ether 02:42:b3:aa:6a:2d brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 scope global docker0
       valid_lft forever preferred_lft forever

Neutron network:
stack@ospd: ~ $ neutron net-show rook-test
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| id                        | 484c33ea-e3a6-412d-81ca-552578b06069 |
| mtu                       | 1450                                 |
| name                      | rook-test                            |
| port_security_enabled     | True                                 |
| provider:network_type     | vxlan                                |
| provider:physical_network |                                      |
| provider:segmentation_id  | 74                                   |
| qos_policy_id             |                                      |
| router:external           | False                                |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 72d58739-c4b1-4ae7-b656-69f28eabb2bb |
| tenant_id                 | b5539cf98ca34a3c8071dd3da384351c     |
+---------------------------+--------------------------------------+
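
The 1450 value for this VXLAN network is consistent with the usual VXLAN-over-IPv4 encapsulation overhead being subtracted from a 1500-byte underlay MTU. A short sketch of that arithmetic follows; the 1500-byte underlay and the IPv4 (no options) outer header are assumptions, since the report only shows the resulting 1450.

```python
# Per-packet overhead added by VXLAN encapsulation over an IPv4 underlay.
INNER_ETHERNET = 14  # encapsulated Ethernet header, no VLAN tag
VXLAN_HEADER = 8
OUTER_UDP = 8
OUTER_IPV4 = 20      # assumes no IP options

VXLAN_OVERHEAD = INNER_ETHERNET + VXLAN_HEADER + OUTER_UDP + OUTER_IPV4


def tenant_mtu(underlay_mtu):
    """Largest tenant-side MTU that still fits in one underlay packet."""
    return underlay_mtu - VXLAN_OVERHEAD


if __name__ == "__main__":
    print(VXLAN_OVERHEAD)    # 50
    print(tenant_mtu(1500))  # 1450, matching the mtu field above
```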

External network:
stack@ospd: ~ $ neutron net-show public
+---------------------------+--------------------------------------+
| Field                     | Value                                |
+---------------------------+--------------------------------------+
| admin_state_up            | True                                 |
| id                        | 158ff937-8559-4171-8bfa-6a88250d2a58 |
| mtu                       | 1500                                 |
| name                      | public                               |
| port_security_enabled     | True                                 |
| provider:network_type     | vlan                                 |
| provider:physical_network | datacentre                           |
| provider:segmentation_id  | 7                                    |
| qos_policy_id             |                                      |
| router:external           | True                                 |
| shared                    | False                                |
| status                    | ACTIVE                               |
| subnets                   | 53f70ea1-17e9-4103-8f9b-cf542d336993 |
| tenant_id                 | b5539cf98ca34a3c8071dd3da384351c     |
+---------------------------+--------------------------------------+

Comment 3 Ihar Hrachyshka 2016-12-21 14:14:04 UTC
As far as I can see, neutron-server calculated MTU = 1500 for the external network, but the router's qg (external gateway) interface still ended up with 1450 instead of 1500. That may be the problem: other hosts on the external network assume that the router can process 1500-byte frames, and it probably just drops them.
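
The mismatch described here can be expressed as a simple check. This is only a sketch: the `qr-`/`qg-` prefix convention is taken from the interface names in this report, the values are copied from the outputs in comment 2, and `find_mtu_mismatches` is a hypothetical helper, not a Neutron API.

```python
def find_mtu_mismatches(router_ifaces, internal_mtu, external_mtu):
    """Compare router port MTUs against the MTU of the network each port sits on.

    router_ifaces maps interface name to MTU as seen inside the qrouter
    namespace. Ports named qr-* are assumed to be on the tenant network,
    ports named qg-* on the external network.
    """
    expected_by_prefix = {"qr-": internal_mtu, "qg-": external_mtu}
    mismatches = []
    for name, mtu in sorted(router_ifaces.items()):
        for prefix, expected in expected_by_prefix.items():
            if name.startswith(prefix) and mtu != expected:
                mismatches.append((name, mtu, expected))
    return mismatches


if __name__ == "__main__":
    # Values from the `ip a` and `neutron net-show` outputs in comment 2.
    router_ifaces = {"qr-095135b6-ba": 1450, "qg-d08ed9a2-24": 1450}
    for name, actual, expected in find_mtu_mismatches(router_ifaces, 1450, 1500):
        print(f"{name}: interface mtu {actual}, network mtu {expected}")
```

With the values from this report, only the qg interface is flagged, which matches the analysis in this comment.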

I think it may be a duplicate of bug 1348116. Please check whether a later (7.2.0+) build fixes the issue for you.

I am closing the bug as a duplicate. Feel free to reopen if you still hit the issue.

Also, if you do, please attach the neutron logs and all config files (/etc/neutron/...).

*** This bug has been marked as a duplicate of bug 1348116 ***