Bug 1589058 - [OVN] The mac table size of ovn (br-int, br-*) is too small by default and eventually makes openvswitch explode
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openvswitch
Version: 13.0 (Queens)
Hardware: Unspecified
OS: Unspecified
high
high
Target Milestone: zstream
: ---
Assignee: Eelco Chaudron
QA Contact: Ofer Blaut
URL:
Whiteboard:
Depends On:
Blocks: 1592333
Reported: 2018-06-08 09:41 UTC by Miguel Angel Ajo
Modified: 2022-08-09 09:44 UTC
CC List: 30 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1558336
: 1592333
Environment:
Last Closed: 2018-06-18 13:23:54 UTC
Target Upstream Version:
Embargoed:



Links
Red Hat Issue Tracker: OSP-9145 (last updated 2022-08-09 09:44:05 UTC)

Description Miguel Angel Ajo 2018-06-08 09:41:54 UTC
We need to increase the default Open vSwitch MAC table size (2048) to something that works better in busy environments.

For example,
ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000


Ideally, I would make ovn-controller configurable, for example:

sudo ovs-vsctl set open . external-ids:ovn-mac-table-size=50000

This would tell ovn-controller what MAC table size to configure on the bridges.
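A minimal Python sketch of how the proposed knob could be interpreted, assuming ovn-controller would read the key from the Open_vSwitch table's external_ids and fall back to the OVS default of 2048 when the key is missing or invalid. The key `ovn-mac-table-size` is only this report's proposal, not an existing ovn-controller option, and the function names and the `br-ex` bridge name are hypothetical.

```python
# Hypothetical sketch: "ovn-mac-table-size" is the key proposed in this report,
# not an option ovn-controller actually reads.
OVS_DEFAULT_MAC_TABLE_SIZE = 2048  # OVS default quoted in this report


def desired_mac_table_size(external_ids: dict) -> int:
    """Return the size requested via the proposed key, or the OVS default
    when the key is absent, not an integer, or not positive."""
    raw = external_ids.get("ovn-mac-table-size")
    try:
        size = int(raw)
    except (TypeError, ValueError):
        return OVS_DEFAULT_MAC_TABLE_SIZE
    return size if size > 0 else OVS_DEFAULT_MAC_TABLE_SIZE


def bridge_other_config(bridges, external_ids: dict) -> dict:
    """Map each bridge name to the other-config entry it would be given."""
    size = desired_mac_table_size(external_ids)
    return {br: {"mac-table-size": str(size)} for br in bridges}


if __name__ == "__main__":
    ids = {"ovn-mac-table-size": "50000"}
    # "br-ex" is a hypothetical provider bridge name.
    print(bridge_other_config(["br-int", "br-ex"], ids))
```

Falling back to the default on bad input keeps a typo in external-ids from shrinking the table below what OVS would use anyway.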

The right table size really depends on the deployment's use case and on how many MAC addresses it handles on the external and internal networks.


Another option: if we can detect overflows of this table, we could have some mechanism to increase its size dynamically. That would of course be better.

+++ This bug was initially created as a clone of Bug #1558336 +++

Description of problem:

The CPU utilization of ovs-vswitchd is high without DPDK enabled:

 PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
1512 root      10 -10 4352840 793864  12008 R  1101  0.3  15810:26 ovs-vswitchd

At the same time, we observed failures to send packets (ICMP) over a VXLAN tunnel; we think this might be related to the high CPU usage.

--- Additional comment from Jiri Benc on 2018-05-31 14:03:36 EDT ---

I managed to reproduce and analyze this.

First, the reproduction steps. It's actually surprisingly simple once you explore all the blind alleys.

Create an ovs bridge:

------
ovs-vsctl add-br ovs0
ip l s ovs0 up
------

Save this to a file named "reproducer.py":

------
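#!/usr/bin/python
import sys

from scapy.all import *

# Build N random (MAC, IP) pairs; N is the first command-line argument.
data = [(str(RandMAC()), str(RandIP())) for i in range(int(sys.argv[1]))]

# Layer-2 socket on the bridge's internal port.
s = conf.L2socket(iface="ovs0")
while True:
    # Resend the same N frames forever; each has a distinct source MAC, so
    # with N larger than mac-table-size the MAC learning table keeps overflowing.
    for mac, ip in data:
        p = Ether(src=mac, dst=mac)/IP(src=ip, dst=ip)
        s.send(p)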
------

Run the reproducer:

./reproducer.py 5000

--- Additional comment from Jiri Benc on 2018-05-31 14:26:26 EDT ---

The problem is how flow revalidation works in OVS. Several 'revalidator' threads are launched. They should normally sleep (apart from waking every 0.5 seconds just to do nothing) and wake up when anything of interest happens (udpif_revalidator => poll_block). On every wake-up, each revalidator thread checks whether flow revalidation is needed and, if so, performs it.

Revalidation is very costly with a high number of flows. I also suspect there is a lot of contention between the revalidator threads.

Flow revalidation is triggered by many things. What matters here is that any eviction of a MAC learning table entry triggers revalidation.

The reproducer script repeatedly sends the same 5000 packets, each with a different MAC address. This causes constant overflows of the MAC learning table and constant revalidation. The revalidator threads are woken up immediately and keep busy-looping on revalidation.

This is exactly the pattern in the customers' data: there are 16000+ flows, and the packet capture shows the packets repeating every second.
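To see why 5000 repeating source MACs keep a 2048-entry table churning, here is a simplified Python model. It assumes plain least-recently-used eviction (the real OVS eviction policy may differ in detail, e.g. per-port fairness) and counts each eviction as one revalidation trigger. The class and function names are hypothetical.

```python
from collections import OrderedDict


class MacTable:
    """Simplified MAC learning table: fixed capacity, assumed LRU eviction.
    Each eviction is counted as one revalidation trigger."""

    def __init__(self, max_entries: int):
        self.max_entries = max_entries
        self.entries = OrderedDict()  # mac -> port, least recently used first
        self.evictions = 0

    def learn(self, mac: str, port: int) -> None:
        if mac in self.entries:
            self.entries.move_to_end(mac)  # refresh: now most recently used
            self.entries[mac] = port
            return
        if len(self.entries) >= self.max_entries:
            self.entries.popitem(last=False)  # evict least recently used entry
            self.evictions += 1
        self.entries[mac] = port


def evictions_for(num_macs: int, max_entries: int, rounds: int) -> int:
    """Replay the reproducer's pattern: the same num_macs sources, repeatedly."""
    table = MacTable(max_entries)
    macs = ["02:00:00:%02x:%02x:%02x" % ((i >> 16) & 0xff, (i >> 8) & 0xff, i & 0xff)
            for i in range(num_macs)]
    for _ in range(rounds):
        for mac in macs:
            table.learn(mac, port=1)
    return table.evictions


if __name__ == "__main__":
    print(evictions_for(5000, 2048, rounds=3))   # 12952: every frame after the first 2048 evicts
    print(evictions_for(5000, 50000, rounds=3))  # 0: all 5000 MACs fit
```

With a cyclic access pattern larger than the table, LRU never gets a hit, so every frame after warm-up causes an eviction; once the table holds all the MACs, evictions (and the revalidations they trigger in this model) stop entirely.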

A quick fix is to increase the MAC learning table size:

ovs-vsctl set bridge <bridge> other-config:mac-table-size=50000

This should lower CPU usage substantially; allow a few seconds for things to settle down.
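When several bridges need the same setting, a small wrapper can build the exact `ovs-vsctl` command quoted above for each one. This is a minimal sketch, assuming `ovs-vsctl` is on PATH when `dry_run=False`; the helper names are hypothetical, and by default it only prints the commands.

```python
import shlex
import subprocess


def mac_table_size_cmd(bridge: str, size: int) -> list:
    """Build the ovs-vsctl command from this report for one bridge."""
    if size <= 0:
        raise ValueError("mac-table-size must be positive")
    return ["ovs-vsctl", "set", "bridge", bridge,
            "other-config:mac-table-size=%d" % size]


def apply_mac_table_size(bridges, size: int, dry_run: bool = True) -> list:
    """Print (dry run) or execute the command for each bridge."""
    cmds = [mac_table_size_cmd(br, size) for br in bridges]
    for cmd in cmds:
        if dry_run:
            print(" ".join(shlex.quote(arg) for arg in cmd))
        else:
            subprocess.run(cmd, check=True)  # raises if ovs-vsctl fails
    return cmds


if __name__ == "__main__":
    # "br-ex" is a hypothetical provider bridge name.
    apply_mac_table_size(["br-ex"], 50000)
```

Passing an argument list to `subprocess.run` avoids going through a shell, so bridge names are never interpreted by one.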

Comment 2 Jiri Benc 2018-06-08 09:59:34 UTC
(In reply to Miguel Angel Ajo from comment #0)
> Another option could be, if we can detect the overflows of such table, have
> some mechanism to dynamically increase the size. That'd be of course better.

That's not really needed: the hash table that contains the MAC table entries is not allocated for the full specified size. Instead, it is dynamically enlarged (doubled) each time it gets too full, and there's code evicting entries over mac-table-size. The hash table is never shrunk, though (not even when mac-table-size is lowered).
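The behaviour described above (lazy allocation, doubling when too full, eviction beyond mac-table-size, no shrinking) can be modelled in a few lines of Python. This is a simplified sketch, not the OVS hash map code: the "too full" rule (more than two entries per bucket) is an assumption chosen for illustration, and the class name is hypothetical.

```python
class GrowOnlyMacTable:
    """Model: buckets double when the table gets too full, entries beyond
    max_entries are evicted (oldest first), and buckets never shrink."""

    def __init__(self, max_entries: int, initial_buckets: int = 1):
        self.max_entries = max_entries
        self.n_buckets = initial_buckets
        self.entries = {}  # mac -> None; dicts keep insertion order (oldest first)

    def _evict_oldest(self) -> None:
        oldest = next(iter(self.entries))
        del self.entries[oldest]

    def insert(self, mac: str) -> None:
        if mac in self.entries:
            return
        if len(self.entries) >= self.max_entries:
            self._evict_oldest()  # stay within mac-table-size
        self.entries[mac] = None
        while len(self.entries) > 2 * self.n_buckets:  # assumed "too full" rule
            self.n_buckets *= 2  # grow by doubling

    def set_max_entries(self, max_entries: int) -> None:
        """Lowering the limit evicts entries but leaves n_buckets unchanged."""
        self.max_entries = max_entries
        while len(self.entries) > max_entries:
            self._evict_oldest()


if __name__ == "__main__":
    t = GrowOnlyMacTable(max_entries=100)
    for i in range(100):
        t.insert("mac-%d" % i)
    print(len(t.entries), t.n_buckets)  # 100 64
    t.set_max_entries(10)
    print(len(t.entries), t.n_buckets)  # 10 64: entries evicted, buckets kept
```

In this model memory for buckets tracks the number of entries actually learned, not the configured limit, which is why a large mac-table-size costs little until the table really fills up.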

Comment 3 Eelco Chaudron 2018-06-16 10:00:29 UTC
In theory, we could increase the default MAC table size, but we need to understand the additional impact of this and reach upstream agreement on changing a default.

Alternatively, a couple of BZs exist (1589031 is one of them) that try to solve this in OSP. That is probably a better solution, since it is this particular use case that needs a bigger table, and the configuration option to change the size already exists. It seems odd to change the default MAC table size based on one specific use case.

Comment 4 Miguel Angel Ajo 2018-06-16 18:35:52 UTC
I opened this BZ in relation to OVN, not just for OVS by itself. I'm not sure if this setting has an impact on OVN itself since it does not use NORMAL rules.

Comment 5 Jakub Sitnicki 2018-06-18 13:12:02 UTC
(In reply to Eelco Chaudron from comment #3)
> In theory, we could increase the default MAC size, but we need to understand
> the additional impact on this and reach upstream agreement on changing a
> default.

We had a short discussion with Miguel, Numan and Russell. Let me try to sum it up.

On br-int we are not using NORMAL rules, so technically OVN is not affected by this limit. The other, non-OVN-managed bridges, i.e. the provider bridges, might be affected by it.

Depending on the impact of a larger default on resource consumption (memory, CPU), it would be good to at least reconsider whether 2048 is still a sane default for today's OVS deployments.

That is to say, if a bigger MAC learning table is cheap, it would be one less knob that has to be adjusted by LPs.

Comment 6 Eelco Chaudron 2018-06-18 13:23:54 UTC
(In reply to Jakub Sitnicki from comment #5)
> (In reply to Eelco Chaudron from comment #3)
> > In theory, we could increase the default MAC size, but we need to understand
> > the additional impact on this and reach upstream agreement on changing a
> > default.
> 
> We had a short discussion with Miguel, Numan and Russell. Let me try to sum
> it up.
> 
> On br-int we are not using NORMAL rules, so technically OVN is not affected
> by this limit. The rest of non-OVN managed bridges might be affected by it,
> i.e. the provider bridges.
> 
> Depending on what is the impact of increasing the default size on resource
> consumption (memory, CPU) it would be good to at least consider if 2048 is
> still a sane default for deployments that use OVS nowadays.
> 
> That is to say, if it is cheap to have a bigger MAC learning table, then it
> would be one less knob that has be adjusted by LPs.

OK, since OVN does not depend on this, I'll close this BZ; the MAC table size increase is tracked under BZ 1558336.

