Bug 1591206

Summary: The mac table size of neutron bridges (br-tun, br-int, br-*) is too small by default and eventually makes openvswitch explode
Product: Red Hat OpenStack
Reporter: Slawek Kaplonski <skaplons>
Component: openstack-neutron
Assignee: Slawek Kaplonski <skaplons>
Status: CLOSED ERRATA
QA Contact: Roee Agiman <ragiman>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 13.0 (Queens)
CC: amuller, apevec, atelang, bcafarel, bhaley, bjarolim, chrisw, cshastri, dvd, echaudro, erkki.peura, feimingyun, gkumar, jbenc, jlibosva, jraju, jschluet, kiyyappa, lmarsh, majopela, mcroce, mori, nyechiel, oblaut, ovs-team, pabeni, pablo.iranzo, pmorey, ragiman, rcernin, rhos-maint, rkhan, sbandyop, skaplons, srevivo, tfreger, vkommadi
Target Milestone: z1
Keywords: Triaged, ZStream
Target Release: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-neutron-12.0.2-0.20180421011364.0ec54fd.el7ost
Doc Type: If docs needed, set a value
Doc Text:
A new configuration option called bridge_mac_table_size has been added for the neutron OVS agent. This value is set as the "other_config:mac-table-size" option on each bridge managed by the openvswitch-neutron-agent. The value controls the maximum number of MAC addresses that can be learned on a bridge. The default value for this new option is 50,000, which should be enough for most systems. Values outside a reasonable range (10 to 1,000,000) are forced into that range by OVS.
Story Points: ---
Clone Of: 1589031
Environment:
Last Closed: 2018-07-19 13:53:43 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1589031
Bug Blocks: 1591204

Description Slawek Kaplonski 2018-06-14 09:33:41 UTC
+++ This bug was initially created as a clone of Bug #1589031 +++

We need to increase the default Open vSwitch MAC table size (2048) to a value that works better in busy environments.

ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

+++ This bug was initially created as a clone of Bug #1558336 +++

Description of problem:

The CPU utilization of ovs-vswitchd is high without DPDK enabled:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1512 root      10 -10 4352840 793864  12008 R  1101  0.3  15810:26 ovs-vswitchd

At the same time, we observed failures to send packets (ICMP) over the VXLAN tunnel. We think this might be related to the high CPU usage.

--- Additional comment from Jiri Benc on 2018-05-31 14:03:36 EDT ---

I managed to reproduce and analyze this.

First, the reproduction steps. It's actually surprisingly simple once you explore all the blind alleys.

Create an ovs bridge:

------
ovs-vsctl add-br ovs0
ip l s ovs0 up
------

Save this to a file named "reproducer.py":

------
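#!/usr/bin/python
import sys

from scapy.all import *

# Pre-generate N random (MAC, IP) pairs; N is the first command-line argument.
data = [(str(RandMAC()), str(RandIP())) for i in range(int(sys.argv[1]))]

# Raw layer-2 socket bound to the bridge's internal port.
s = conf.L2socket(iface="ovs0")
while True:
    # Replay the same N packets forever, each with its own source MAC.
    for mac, ip in data:
        p = Ether(src=mac, dst=mac)/IP(src=ip, dst=ip)
        s.send(p)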
------

Run the reproducer:

./reproducer.py 5000

--- Additional comment from Jiri Benc on 2018-05-31 14:26:26 EDT ---

The problem is how flow revalidation works in OVS. Several 'revalidator' threads are launched. Normally they sleep (apart from waking every 0.5 seconds just to do nothing) and wake when anything of interest happens (udpif_revalidator => poll_block). On every wake-up, each revalidator thread checks whether flow revalidation is needed and, if it is, performs it.

Revalidation is very costly with a high number of flows. I also suspect there's a lot of contention between the revalidator threads.

The flow revalidation is triggered by many things. What is of interest for us is that any eviction of a MAC learning table entry triggers revalidation.

The reproducer script repeatedly sends the same 5000 packets, each with a different MAC address. Because 5000 exceeds the default MAC table size of 2048, this causes constant overflows of the MAC learning table and constant revalidation. The revalidator threads are woken up immediately and busy-loop on revalidation.

This is exactly the pattern in the customers' data: there are 16000+ flows, and the packet capture shows the packets repeating every second.
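
To make the overflow argument concrete, here is a minimal simulation sketch. It is not OVS code: it assumes the learning table evicts in plain LRU order, which only approximates what ovs-vswitchd does, and count_evictions is a hypothetical helper written for this illustration. It replays the same set of source MACs several times and counts evictions per round.

```python
from collections import OrderedDict


def count_evictions(num_macs, table_size, passes=3):
    """Replay `passes` rounds of the same `num_macs` source MACs through a
    bounded learning table and return the evictions seen in each round.

    Assumption: eviction is modelled as plain LRU, which is only an
    approximation of what ovs-vswitchd does.
    """
    table = OrderedDict()              # MAC -> None, oldest entry first
    per_pass = []
    for _ in range(passes):
        evictions = 0
        for mac in range(num_macs):    # integers stand in for MAC addresses
            if mac in table:
                table.move_to_end(mac)     # refresh a known entry
                continue
            if len(table) >= table_size:
                table.popitem(last=False)  # table full: evict the oldest
                evictions += 1
            table[mac] = None
        per_pass.append(evictions)
    return per_pass


if __name__ == "__main__":
    print(count_evictions(5000, 2048))     # default table size
    print(count_evictions(5000, 50000))    # enlarged table
```

Under this model, a 2048-entry table evicts an entry for every packet once the first round has filled it, while a 50000-entry table never evicts. Since each eviction triggers revalidation, this matches the busy-looping revalidators described above.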

A quick fix is to increase the MAC learning table size:

ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

This should lower the CPU usage substantially; allow a few seconds for things to settle down.
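
When several bridges need the same setting, the command above can be generated per bridge. The following is a rough sketch only: mac_table_commands is a hypothetical helper, the bridge names are taken from the bug summary, and the 10 to 1,000,000 range is the one quoted in the Doc Text. It only builds and prints the commands; it does not run them.

```python
import shlex

# Range that the Doc Text says OVS enforces for mac-table-size.
MIN_SIZE, MAX_SIZE = 10, 1000000


def mac_table_commands(bridges, size=50000):
    """Return one ovs-vsctl argv list per bridge (hypothetical helper).

    The size is clamped locally so the printed command shows the value
    that would end up in effect.
    """
    size = max(MIN_SIZE, min(MAX_SIZE, int(size)))
    return [
        ["ovs-vsctl", "set", "bridge", bridge,
         "other-config:mac-table-size=%d" % size]
        for bridge in bridges
    ]


if __name__ == "__main__":
    # br-int and br-tun are the bridge names mentioned in the bug summary.
    for argv in mac_table_commands(["br-int", "br-tun"]):
        print(" ".join(shlex.quote(arg) for arg in argv))
```

Review the printed commands before running them as root on the affected node.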

Comment 18 Bernard Cafarelli 2018-07-17 13:08:56 UTC
Checked with Roee

[stack@undercloud-0 ~]$ cat core_puddle_version 
2018-07-06.1

[heat-admin@compute-1 ~]$ sudo ovs-vsctl list Bridge br-int|grep mac
other_config        : {mac-table-size="50000"}
(same on other nodes and bridges)

Marking VERIFIED
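
The manual check above could be scripted along these lines. This is a sketch under the assumption that the other_config line looks like the output quoted in this comment; parse_mac_table_size is a hypothetical helper, not part of any OVS or neutron tooling.

```python
import re


def parse_mac_table_size(list_output):
    """Return mac-table-size as an int, or None if the key is not set.

    `list_output` is text in the style of `ovs-vsctl list Bridge <name>`.
    """
    match = re.search(r'mac-table-size="?(\d+)"?', list_output)
    return int(match.group(1)) if match else None


if __name__ == "__main__":
    # Sample line copied from the verification output in this comment.
    sample = 'other_config        : {mac-table-size="50000"}'
    print(parse_mac_table_size(sample))
```

A return value of None means the bridge still uses the OVS default table size.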

Comment 20 errata-xmlrpc 2018-07-19 13:53:43 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:2215