In OpenStack, we have used some tricks in the past to work around the limitation that MAC_Binding entries do not expire. Some of those tricks involve not monitoring the MAC_Binding table at all, to avoid being hit by the OOM killer [0], or deleting the entries upon association/disassociation of a Floating IP [1].

Ideally, old (or better, unused) entries should be deleted, which would reduce the size of the database and also avoid issues when IP addresses are reused.

Link to the original upstream discussion:
https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048936.html

[0] https://opendev.org/openstack/neutron/commit/f6c35527698119ee6f73a6a3613c9beebb563840
[1] https://opendev.org/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7?style=unified&whitespace=ignore-change
OpenShift also ran into this because originally it didn't use exclude-lb-vips-from-garp=true, leading to [Service VIP * nodes] MAC bindings in SB. So I think this would be useful for both OCP and OSP.
After discussion we came up with a couple of possible solutions:

1) Add an "idle_age" column to MAC_Binding that would be updated by ovn-controllers based on the "idle_age" statistic of the corresponding physical flow. This probably has one major scale drawback: there would be a lot of updates to that "idle_age" column in large environments.

2) Add an action that would clear/bump a timer local to ovn-controller and that could be installed in table 66/67 with every MAC binding flow. The drawback here could be a large number of calls to the ovn-controller action when there is a lot of traffic in large environments.

3) Add a column for the "owner" of the MAC_Binding row. The owner (the ovn-controller that created the row) would be responsible for checking the "idle_age" timer. The controller could check "idle_age" only locally, without sending any updates to the SB database. The main issue is that, if the datapath is distributed over multiple controllers, we could effectively delete a MAC binding from other controllers even when they are still using it. The controller would be able to recreate it, but it could cause some delays.

Of the suggested solutions, 3) looks the most promising; however, it still needs to be tested to exclude any possible performance regressions.
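To make the trade-off of option 3) more concrete, here is a minimal Python sketch of the owner-based check. It is not ovn-controller code and does not use any OVN API. The "owner" field, the idle_age lookup keyed by (logical_port, ip), the chassis names, and the 300-second threshold are all assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass
class MacBinding:
    logical_port: str
    ip: str
    mac: str
    owner: str  # hypothetical column: chassis that created the row


def expired_rows(rows, local_chassis, idle_age, threshold):
    """Return rows owned by local_chassis whose locally observed
    idle age (in seconds) has reached the threshold.

    idle_age is a hypothetical mapping {(logical_port, ip): seconds}
    built from the local flow statistics only. Rows owned by other
    chassis are never considered, so nothing is written to the SB
    database until a row is actually removed.
    """
    stale = []
    for row in rows:
        if row.owner != local_chassis:
            continue  # only the owner is allowed to age out a row
        age = idle_age.get((row.logical_port, row.ip))
        if age is not None and age >= threshold:
            stale.append(row)
    return stale


rows = [
    MacBinding("lrp-public", "172.24.4.10", "fa:16:3e:00:00:01", "chassis-1"),
    MacBinding("lrp-public", "172.24.4.11", "fa:16:3e:00:00:02", "chassis-2"),
    MacBinding("lrp-public", "172.24.4.12", "fa:16:3e:00:00:03", "chassis-1"),
]

# Idle ages as seen by chassis-1 only (made-up values).
idle_on_chassis_1 = {
    ("lrp-public", "172.24.4.10"): 600,
    ("lrp-public", "172.24.4.11"): 900,
    ("lrp-public", "172.24.4.12"): 5,
}

stale = expired_rows(rows, "chassis-1", idle_on_chassis_1, threshold=300)
print([row.ip for row in stale])
```

Note that the row for 172.24.4.10 is selected purely from the statistics of chassis-1. If another chassis were still actively using that binding, it would lose the entry and have to recreate it, which is the delay mentioned in 3).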
First iteration posted: https://patchwork.ozlabs.org/project/ovn/list/?series=304732
Created attachment 1893876: results.pdf
Results of measurement with Xena system