We need to increase the default Open vSwitch MAC table size (2048) to something that works better in busy environments:

ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

+++ This bug was initially created as a clone of Bug #1558336 +++

Description of problem:

The CPU utilization of ovs-vswitchd is high without DPDK enabled:

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1512 root      10 -10 4352840 793864  12008 R  1101  0.3  15810:26 ovs-vswitchd

At the same time we were observing failures to send packets (ICMP) over a VXLAN tunnel; we think this might be related to the high CPU usage.

--- Additional comment from Jiri Benc on 2018-05-31 14:03:36 EDT ---

I managed to reproduce and analyze this.

First, the reproduction steps. It's actually surprisingly simple once you explore all the blind alleys.

Create an ovs bridge:

------
ovs-vsctl add-br ovs0
ip l s ovs0 up
------

Save this to a file named "reproducer.py":

------
#!/usr/bin/python
import sys
from scapy.all import *

# Generate N random (MAC, IP) pairs; N is the first command-line argument.
data = [(str(RandMAC()), str(RandIP())) for i in range(int(sys.argv[1]))]

# Send the same set of packets over the bridge in an endless loop,
# each packet with its own source MAC address.
s = conf.L2socket(iface="ovs0")
while True:
    for mac, ip in data:
        p = Ether(src=mac, dst=mac)/IP(src=ip, dst=ip)
        s.send(p)
------

Run the reproducer:

./reproducer.py 5000

--- Additional comment from Jiri Benc on 2018-05-31 14:26:26 EDT ---

The problem is how flow revalidation works in ovs. Several 'revalidator' threads are launched. They should normally sleep (modulo waking every 0.5 seconds just to do nothing) and they wake if anything of interest happens (udpif_revalidator => poll_block). On every wake-up, each revalidator thread checks whether flow revalidation is needed and, if it is, does the revalidation.

The revalidation is very costly with a high number of flows. I also suspect there's a lot of contention between the revalidator threads.

Flow revalidation is triggered by many things. What is of interest for us is that any eviction of a MAC learning table entry triggers revalidation.
The reproducer script repeatedly sends the same 5000 packets, each with a different MAC address. This causes constant overflows of the MAC learning table and therefore constant revalidation. The revalidator threads are immediately woken up again and end up busy-looping the revalidation.

This is exactly the pattern seen in the customers' data: there are 16000+ flows, and the packet capture shows that the packets repeat every second.

A quick fix is to increase the MAC learning table size:

ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

This should lower the CPU usage substantially; allow a few seconds for things to settle down.
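To illustrate why the table size matters here, the following is a minimal sketch of a bounded MAC learning table fed by a repeating set of source addresses. It is not the ovs implementation: the function name count_evictions is hypothetical, MAC addresses are modelled as plain integers, and simple least-recently-used eviction is assumed, which may differ from the eviction policy ovs actually applies.

```python
from collections import OrderedDict


def count_evictions(table_size, num_macs, rounds):
    """Count evictions from a bounded table fed by a repeating MAC set.

    Assumption: least-recently-used eviction; real ovs behaviour may differ.
    MAC addresses are modelled as the integers 0..num_macs-1.
    """
    table = OrderedDict()
    evictions = 0
    for _ in range(rounds):
        for mac in range(num_macs):
            if mac in table:
                # Known address: just refresh its position.
                table.move_to_end(mac)
                continue
            if len(table) >= table_size:
                # Table is full: drop the oldest entry to make room.
                table.popitem(last=False)
                evictions += 1
            table[mac] = True
    return evictions


if __name__ == "__main__":
    # 5000 distinct MACs, as in the reproducer, against both table sizes.
    for size in (2048, 50000):
        print(size, count_evictions(size, 5000, rounds=3))
```

Under these assumptions, once a 2048-entry table has filled, every packet from the 5000-address set misses and evicts an entry, whereas a 50000-entry table never evicts. Since each eviction triggers revalidation, this matches the busy-looping described above.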
copied doc text from 1591206.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2018:2715