Bug 2084668 - [ovn] Aging mechanism for MAC_Binding entries
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 22.L
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Ales Musil
QA Contact: Ehsan Elahi
URL:
Whiteboard:
Depends On:
Blocks: 2078986 2209893
Reported: 2022-05-12 16:00 UTC by Daniel Alvarez Sanchez
Modified: 2023-05-25 07:30 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-13 07:18:58 UTC
Target Upstream Version:
Embargoed:


Attachments
results.pdf (167.83 KB, application/pdf)
2022-07-01 10:05 UTC, Ales Musil


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | FD-2006 | 0 | None | None | None | 2022-06-03 13:41:20 UTC

Description Daniel Alvarez Sanchez 2022-05-12 16:00:55 UTC
In OpenStack, we have used several tricks in the past to work around the limitation that MAC_Binding entries never expire.

Some of those tricks involve not monitoring the MAC_Binding table at all, to avoid OOM kills [0], or deleting the entries upon association/disassociation of a Floating IP [1].

Ideally, old (or better, unused) entries should be deleted, which would help reduce the size of the database and also avoid issues when IP addresses are reused.

Link to the original upstream discussion: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048936.html



[0] https://opendev.org/openstack/neutron/commit/f6c35527698119ee6f73a6a3613c9beebb563840
[1] https://opendev.org/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7?style=unified&whitespace=ignore-change

Comment 1 Dan Williams 2022-06-03 13:47:37 UTC
OpenShift also ran into this because originally it didn't use exclude-lb-vips-from-garp=true, leading to [Service VIP * nodes] MAC bindings in SB. So I think this would be useful for both OCP and OSP.

Comment 2 Ales Musil 2022-06-06 13:53:33 UTC
After discussion, we have come up with a couple of possible solutions:

1) Add an "idle_age" column to MAC_Binding that would be updated by ovn-controllers based on the "idle_age" statistic of the corresponding physical flow.
This probably has one major scale drawback: there would be a lot of updates to that "idle_age" column in large environments.

2) Add an action that would clear/bump a timer local to ovn-controller; the action could be installed in table 66/67 with every MAC binding flow.
The drawback here could be a large number of calls to the ovn-controller action when there is a lot of traffic in large environments.

3) Add a column for the "owner" of the MAC_Binding row; the owner (the ovn-controller that created the row) would be responsible for checking the "idle_age" timer.
The controller could check "idle_age" purely locally, without sending any updates to the SB database. The main issue is that, if the datapath is distributed over
multiple controllers, the owner could effectively delete a MAC binding from other controllers even while they are still using it. A controller that still needs it
would be able to recreate it, but that could cause some delays.

Of all the suggested solutions, 3) looks the most promising; however, it still needs to be tested to rule out any possible performance regressions.
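
To make option 3) more concrete, here is a minimal Python sketch of the owner-based check. It is an illustration only, not ovn-controller code: the MacBinding class, the expired_rows() helper, the chassis names, and the 60-second threshold are all hypothetical, and idle_age is modeled as a plain dict of seconds since the local flow last matched.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacBinding:
    """Simplified stand-in for a MAC_Binding row (hypothetical model)."""
    ip: str
    mac: str
    owner: str  # hypothetical "owner" column: chassis that created the row


def expired_rows(rows, local_chassis, idle_age, threshold):
    """Return the rows the local controller would delete under option 3).

    idle_age maps an IP to the seconds since the matching flow was last hit,
    as seen only on the local chassis. Rows owned by other chassis are never
    touched, and rows without a local statistic are treated as fresh.
    """
    return [
        row for row in rows
        if row.owner == local_chassis
        and idle_age.get(row.ip, 0) >= threshold
    ]


rows = [
    MacBinding("10.0.0.1", "aa:bb:cc:00:00:01", "chassis-1"),
    MacBinding("10.0.0.2", "aa:bb:cc:00:00:02", "chassis-1"),
    MacBinding("10.0.0.3", "aa:bb:cc:00:00:03", "chassis-2"),
]
# Idle ages as observed locally on chassis-1 (made-up values).
local_idle_age = {"10.0.0.1": 120, "10.0.0.2": 5, "10.0.0.3": 500}

to_delete = expired_rows(rows, "chassis-1", local_idle_age, threshold=60)
print([row.ip for row in to_delete])
```

The sketch also shows the drawback mentioned above: chassis-1 decides based only on its own idle ages, so the 10.0.0.1 binding would be removed even if another chassis were still actively using it.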

Comment 3 Ales Musil 2022-06-14 13:50:57 UTC
First iteration posted: https://patchwork.ozlabs.org/project/ovn/list/?series=304732

Comment 6 Ales Musil 2022-07-01 10:05:04 UTC
Created attachment 1893876
results.pdf

Results of the measurements with the Xena system

