We need to increase the default Open vSwitch MAC table size (2048) to something that works better in busy environments. For example:

  ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

Ideally, I would make ovn-controller configurable, for example:

  sudo ovs-vsctl set open . external-ids:ovn-mac-table-size=50000

would give ovn-controller an indication of what to configure on the switches. The right size of the table really depends on the use case of the deployment and how many MACs it is handling on the external/internal networks.

Another option could be, if we can detect overflows of the table, to have some mechanism to dynamically increase its size. That would of course be better.

+++ This bug was initially created as a clone of Bug #1558336 +++

Description of problem: the CPU utilization of ovs-vswitchd is high without DPDK enabled

  PID USER PR NI    VIRT    RES   SHR S  %CPU %MEM    TIME+ COMMAND
 1512 root 10 -10 4352840 793864 12008 R  1101  0.3 15810:26 ovs-vswitchd

At the same time we were observing failures to send packets (ICMP) over the VXLAN tunnel; we think this might be related to the high CPU usage.

--- Additional comment from Jiri Benc on 2018-05-31 14:03:36 EDT ---

I managed to reproduce and analyze this.

First, the reproduction steps. It's actually surprisingly simple once you explore all the blind alleys.

Create an ovs bridge:

------
ovs-vsctl add-br ovs0
ip l s ovs0 up
------

Save this to a file named "reproducer.py":

------
#!/usr/bin/python
import sys

from scapy.all import *

data = [(str(RandMAC()), str(RandIP())) for i in range(int(sys.argv[1]))]
s = conf.L2socket(iface="ovs0")
while True:
    for mac, ip in data:
        p = Ether(src=mac, dst=mac)/IP(src=ip, dst=ip)
        s.send(p)
------

Run the reproducer:

./reproducer.py 5000

--- Additional comment from Jiri Benc on 2018-05-31 14:26:26 EDT ---

The problem is how flow revalidation works in ovs. There are several 'revalidator' threads launched.
They should normally sleep (modulo waking every 0.5 second just to do nothing) and they wake if anything of interest happens (udpif_revalidator => poll_block). On every wake up, each revalidator thread checks whether flow revalidation is needed and, if it is, performs the revalidation. The revalidation is very costly with a high number of flows. I also suspect there's a lot of contention between the revalidator threads.

The flow revalidation is triggered by many things. What is of interest for us is that any eviction of a MAC learning table entry triggers revalidation.

The reproducer script repeatedly sends the same 5000 packets, all of them with a different MAC address. This causes constant overflows of the MAC learning table and constant revalidation. The revalidator threads are immediately woken up and busy-loop the revalidation. This is exactly the pattern from the customers' data: there are 16000+ flows and the packet capture shows that the packets repeat every second.

A quick fix is to increase the MAC learning table size:

  ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

This should lower the CPU usage substantially; allow a few seconds for things to settle down.
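To illustrate why the table size matters here, below is a rough standalone simulation (plain Python, not OVS code; the function and names are invented for illustration). It models the MAC learning table as an LRU-capped dict and counts evictions, since in OVS each eviction triggers a revalidation round. With the reproducer's working set of 5000 distinct MACs against the default 2048-entry table, every pass evicts continuously; with a 50000-entry table there are no evictions at all:

```python
from collections import OrderedDict

def replay(macs, table_size):
    """Feed source MACs through a toy fixed-capacity learning table
    and count evictions (each of which would trigger revalidation)."""
    table = OrderedDict()
    evictions = 0
    for mac in macs:
        if mac in table:
            table.move_to_end(mac)     # refresh an already-learned entry
            continue
        if len(table) >= table_size:
            table.popitem(last=False)  # table full: evict the oldest entry
            evictions += 1
        table[mac] = True
    return evictions

# 5000 distinct source MACs replayed in a loop, as the reproducer does.
macs = ["02:00:00:00:%02x:%02x" % (i >> 8, i & 0xff) for i in range(5000)]
print(replay(macs * 10, 2048))   # working set > table: evicts on every packet
print(replay(macs * 10, 50000))  # table large enough: zero evictions
```

Because 5000 > 2048, after the first pass every single lookup misses, so the eviction (and thus revalidation) rate tracks the packet rate, which matches the busy-looping revalidator threads observed above.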
(In reply to Miguel Angel Ajo from comment #0)
> Another option could be, if we can detect the overflows of such table, have
> some mechanism to dynamically increase the size. That'd be of course better.

That's not really needed: the hash table that contains the MAC table entries is not allocated at the full specified size. Instead, it is dynamically enlarged (doubled) each time it gets too full, and there's code evicting entries over mac-table-size. The hash table is never shrunk, though (not even when mac-table-size is lowered).
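The growth policy described above can be sketched in plain Python. This is an illustrative toy model under the stated assumptions, not the actual OVS implementation, and the class and method names are invented:

```python
from collections import OrderedDict

class MacTableSketch:
    """Toy model of the behavior described above: the backing capacity
    doubles whenever the table gets too full and is never reduced, while
    max_entries (mac-table-size) caps how many entries are retained."""

    def __init__(self, max_entries, initial_capacity=16):
        self.max_entries = max_entries
        self.capacity = initial_capacity
        self.entries = OrderedDict()

    def learn(self, mac):
        self.entries[mac] = True
        self.entries.move_to_end(mac)
        while len(self.entries) > self.max_entries:
            self.entries.popitem(last=False)  # evict oldest over mac-table-size
        while len(self.entries) > self.capacity:
            self.capacity *= 2                # enlarge; never shrunk afterwards

    def lower_limit(self, new_max):
        self.max_entries = new_max            # capacity stays where it grew to
```

So memory for the table is only committed as entries actually accumulate, which is why a larger mac-table-size is cheap for deployments that never approach it; the one asymmetry is that lowering the limit later trims entries but leaves the grown capacity in place.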
In theory, we could increase the default MAC table size, but we need to understand the additional impact of this and reach upstream agreement on changing a default.

Alternatively, a couple of BZs exist, 1589031 being one of them, which try to solve this in OSP. That is probably a better solution, as this is the OVS use case that needs a bigger table, and the configuration option to change the size already exists. It looks odd to change the default MAC table size based on a specific use case.
I opened this BZ in relation to OVN, not just for OVS by itself. I'm not sure if this setting has an impact on OVN itself since it does not use NORMAL rules.
(In reply to Eelco Chaudron from comment #3)
> In theory, we could increase the default MAC size, but we need to understand
> the additional impact on this and reach upstream agreement on changing a
> default.

We had a short discussion with Miguel, Numan and Russell. Let me try to sum it up.

On br-int we are not using NORMAL rules, so technically OVN is not affected by this limit. The rest of the non-OVN managed bridges might be affected by it, i.e. the provider bridges.

Depending on what the impact of increasing the default size is on resource consumption (memory, CPU), it would be good to at least consider whether 2048 is still a sane default for deployments that use OVS nowadays.

That is to say, if it is cheap to have a bigger MAC learning table, then it would be one less knob that has to be adjusted by LPs.
(In reply to Jakub Sitnicki from comment #5)
> (In reply to Eelco Chaudron from comment #3)
> > In theory, we could increase the default MAC size, but we need to understand
> > the additional impact on this and reach upstream agreement on changing a
> > default.
>
> We had a short discussion with Miguel, Numan and Russell. Let me try to sum
> it up.
>
> On br-int we are not using NORMAL rules, so technically OVN is not affected
> by this limit. The rest of non-OVN managed bridges might be affected by it,
> i.e. the provider bridges.
>
> Depending on what is the impact of increasing the default size on resource
> consumption (memory, CPU) it would be good to at least consider if 2048 is
> still a sane default for deployments that use OVS nowadays.
>
> That is to say, if it is cheap to have a bigger MAC learning table, then it
> would be one less knob that has be adjusted by LPs.

OK, if OVN does not care about this, I'll close this BZ; the increase of the MAC table size is tracked under BZ 1558336.