Bug 1385787

Summary: Slow traffic between instances 1Gbps
Product: Red Hat OpenStack
Reporter: Robin Cernin <rcernin>
Component: openstack-neutron
Assignee: Ihar Hrachyshka <ihrachys>
Status: CLOSED NOTABUG
QA Contact: Toni Freger <tfreger>
Severity: high
Priority: high
Version: 7.0 (Kilo)
CC: akaris, amuller, ccharron, chrisw, ihrachys, nyechiel, pablo.iranzo, rcernin, skinjo, srevivo
Target Milestone: async
Target Release: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Last Closed: 2017-05-22 14:37:01 UTC
Type: Bug

Description Robin Cernin 2016-10-17 16:56:50 UTC
Description of problem:

We see slow (~1 Gbps) traffic between instances on one compute node within the same tenant network in the production environment; in the test environment we can reach 6-8 Gbps (see below).

It affects all flavors; the problem does not appear to be related to flavor QoS.

Version-Release number of selected component (if applicable):

openstack-neutron-openvswitch-2015.1.2-11.el7ost.noarch
openvswitch-2.4.0-2.el7_2.x86_64
python-openvswitch-2.4.0-2.el7_2.noarch

Kernel:

Linux d4-ucos-nova4.host.intranet 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

Production environment:
* Hardware: Cisco UCS (all compute nodes) + 10Gbps network
* Platform software: RHEL7 and OSP7
* Instances: CentOS7


Testing environment:
* Hardware: Commodity hardware + 1Gbps network
* Platform software: Ubuntu 16.04 and OpenStack Mitaka
* Instances: CentOS7
* Kernel 4.4.0 and OVS 2.5.0

How reproducible:

test-instance-1:
# iperf -s
------------------------------------------------------------
Server listening on TCP port 5001
TCP window size: 85.3 KByte (default)
------------------------------------------------------------
[  4] local 10.0.0.18 port 5001 connected with 10.0.0.19 port 33732
[ ID] Interval       Transfer     Bandwidth
[  4]  0.0-10.0 sec  1.40 GBytes  1.20 Gbits/sec
[  5] local 10.0.0.18 port 5001 connected with 10.0.0.19 port 33733
[  5]  0.0-10.0 sec  1.35 GBytes  1.16 Gbits/sec

test-instance-2:
# iperf -c 10.0.0.18
------------------------------------------------------------
Client connecting to 10.0.0.18, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.19 port 33732 connected with 10.0.0.18 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.40 GBytes  1.20 Gbits/sec
[root@teste1 administrator]# iperf -c 10.0.0.18
------------------------------------------------------------
Client connecting to 10.0.0.18, TCP port 5001
TCP window size: 45.0 KByte (default)
------------------------------------------------------------
[  3] local 10.0.0.19 port 33733 connected with 10.0.0.18 port 5001
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.35 GBytes  1.16 Gbits/sec


Steps to Reproduce:
1. Boot two instances on the same compute node in the same tenant network (production environment).
2. On the first instance, run: iperf -s
3. On the second instance, run: iperf -c <IP of the first instance>

Actual results:

Throughput between the instances tops out at roughly 1.2 Gbit/s (see the iperf runs above), despite the 10Gbps network in production.

Expected results:

Throughput comparable to what the test environment reaches, i.e. 6-8 Gbit/s.

Additional info:

Comment 8 Assaf Muller 2016-12-19 13:22:52 UTC
Ihar, can you please look at the SOS report and see if there's any MTU related configuration issues that would explain low throughput between VMs on different compute nodes?
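
A rough sketch of what to look at on the compute node (or in the corresponding sosreport); the interface names and paths below are only examples:

# MTU of the physical NIC, the bridges and the instance tap devices
ip link show | grep -E 'mtu [0-9]+'
# Kilo-era MTU option, if it was set at all
grep -r network_device_mtu /etc/neutron /etc/nova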

Comment 9 Andreas Karis 2016-12-19 13:31:14 UTC
Hi,

This issue has come up on many occasions with many customers. 

Yes, increasing the MTU is a workaround that works by lowering the number of PPS across the VXLAN tunnels. However, increasing the MTU only works as long as we stay within the cluster; once we are out in the wild, we don't control the MTU any more. The real problem here is that sending packets through the VXLAN tunnels of OVS creates a high software interrupt load, and only on one CPU. This is because hardware offloading technologies such as GRO don't work for traffic inside the tunnel. Sending outside of the tunnel is fine, because the hardware can help the kernel process a higher number of packets.

I find it very difficult to explain to customers that increasing the MTU should be the solution when in reality it only lowers the PPS, and thus is only a workaround for the underlying problem. I think the recommendation to customers should be to buy certified / tested NICs with VXLAN offloading.
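
For what it's worth, a quick way to see both symptoms on an affected compute node; eth0 is an example NIC name and the exact feature names vary by driver:

# Can the NIC/driver segment VXLAN (UDP tunnel) traffic in hardware?
# "off [fixed]" means the hardware cannot help here.
ethtool -k eth0 | grep udp_tnl
# Watch the NET_RX/NET_TX softirq counters pile up on a single CPU while iperf runs
watch -n1 'grep -E "NET_RX|NET_TX" /proc/softirqs'
# Per-CPU softirq load (needs the sysstat package)
mpstat -P ALL 1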

Comment 10 Andreas Karis 2016-12-19 13:35:01 UTC
https://access.redhat.com/solutions/2778731

Comment 11 Assaf Muller 2016-12-20 21:51:05 UTC
(In reply to Andreas Karis from comment #9)
> Hi,
> 
> This issue has come up on many occasions with many customers. 
> 
> Yes, increasing the MTU is a workaround that works by lowering the number of
> PPS across the VXLAN tunnels. However, increasing the MTU only works if we
> stay within the cluster. Once we are out in the wild, we don't control the
> MTU any more. The real problem here is that sending packets through the
> VXLAN tunnels of OVS creates high software interrupts and this only on one
> CPU. This is due to the fact that hardware offloading technologies such as
> GRO won't work for traffic within the tunnel Sending outside of the tunnel
> is fine, because the hardware can help the kernel to process a higher number
> of packets.
> 
> I find it very difficult to explain to customers that the increase of MTU
> should be the solution when in reality it's only a way to lower the PPS, and
> thus is only a workaround, for an underlying problem. I think the
> recommendation to customers should be to buy certified / tested NICs with
> VXLAN offloading.

I don't consider jumbo frames and VXLAN offloading a workaround. They're required to obtain line-rate speeds and are the recommendation of our own performance team.

Comment 12 Andreas Karis 2016-12-20 21:54:12 UTC
I didn't say that VXLAN offloading was a workaround. But using  jumbo frames effectively *is* a workaround.

The problem is that once the packets go through the VXLAN tunnel, they are switched in software. We cannot switch as many packets per second in software as we would like, which is why throughput drops significantly. So we enable jumbo frames to lower the total number of PPS, which in my opinion is a workaround: we can't achieve our goal (switching a high number of packets), so we lower the number of packets instead.
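
Back-of-the-envelope arithmetic (ignoring encapsulation overhead) just to illustrate the effect; the 1.2 Gbit/s figure is from the iperf runs above:

# Packets per second needed to carry ~1.2 Gbit/s at ~1500-byte vs ~9000-byte frames
awk 'BEGIN { printf "1500B: %d pps\n9000B: %d pps\n", 1.2e9/8/1500, 1.2e9/8/9000 }'
# -> about 100000 pps vs about 16700 pps, roughly a 6x reduction in packet count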

Comment 13 Andreas Karis 2016-12-20 21:57:57 UTC
Regardless of that, though, I agree with you that both jumbo frames and VXLAN offloading should be recommended to customers. I'd just like to see some more recommendations / documentation for VXLAN offloading. Some customers are reluctant to implement the jumbo frame step, and I'd like to give them an alternative.
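
For deployments that do take the jumbo frame route, a quick sanity check that the underlay really carries 9000-byte frames between compute nodes (9000 is an example value; 8972 = 9000 minus 28 bytes of IP+ICMP headers):

# From one compute node to another, fragmentation forbidden
ping -M do -s 8972 -c 3 <other compute node IP>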

Also, I don't think we have a "recommendation of our own performance team" with performance measurements and recommended NIC hardware that we can just forward to our customers. Or do we?

Comment 14 Andreas Karis 2016-12-20 22:03:43 UTC
Let's take a different example: what if the customer's goal was not total throughput, but PPS with packet sizes < 1500 Bytes? Increasing the MTU doesn't help at all in that case.
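
As an illustration of that case, one could pin the same test to small packets, where raising the MTU changes nothing; the address and rate below are examples:

# Server side
iperf -s -u
# Client side: UDP, 64-byte datagrams, 900 Mbit/s offered load
iperf -c 10.0.0.18 -u -l 64 -b 900M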

Comment 15 Ihar Hrachyshka 2016-12-21 14:47:09 UTC
I don't think Kilo has any of the MTU-related fixes and options that helped Liberty+ setups. In Kilo, we only have network_device_mtu (on both the neutron and nova sides), and it does not distinguish between network types.
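
For reference, a minimal sketch of that option; 9000 is only an example value, and the right MTU depends on the underlay and the VXLAN overhead:

# /etc/neutron/neutron.conf and /etc/nova/nova.conf
[DEFAULT]
network_device_mtu = 9000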

Sadly, I don't have access to the sos reports; the directory does not exist. Could you please re-upload them? Depending on the tenant network types used, we may try to make the network_device_mtu option work.

How was the cluster deployed? Was OSP Director used?

Comment 16 Ihar Hrachyshka 2017-05-22 14:37:01 UTC
If the documentation for VXLAN offloading is missing or in bad shape, please report a documentation bug. I think it's clear that this and the related customer cases are issues with hardware that was not picked correctly and/or jumbo frames that were not configured. There is not much that engineers can do.

If you still experience performance issues that you think may be related to how Neutron configures bridges and tap devices, please reopen with an explanation.