
Bug 2084668

Summary: [ovn] Aging mechanism for MAC_Binding entries
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: OVN
Assignee: Ales Musil <amusil>
Status: CLOSED CURRENTRELEASE
QA Contact: Ehsan Elahi <eelahi>
Severity: medium
Docs Contact:
Priority: high
Version: FDP 22.L
CC: amusil, ctrautma, dcbw, dceara, jiji, jishi, jlibosva, mmichels
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-13 07:18:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 2078986, 2209893    
Attachments:
Description Flags
results.pdf none

Description Daniel Alvarez Sanchez 2022-05-12 16:00:55 UTC
In OpenStack, we have used some tricks in the past to work around the limitation that MAC_Binding entries do not expire.

Some of those tricks involve not monitoring the MAC_Binding table at all, to avoid triggering the OOM killer [0], or deleting the entries upon association/disassociation of a Floating IP [1].

Ideally, old (or better, unused) entries should be deleted, which would help reduce the size of the database and also avoid issues when IP addresses are reused.

Link to the original upstream discussion: https://mail.openvswitch.org/pipermail/ovs-discuss/2019-June/048936.html



[0] https://opendev.org/openstack/neutron/commit/f6c35527698119ee6f73a6a3613c9beebb563840
[1] https://opendev.org/openstack/networking-ovn/commit/5181f1106ff839d08152623c25c9a5f6797aa2d7?style=unified&whitespace=ignore-change

Comment 1 Dan Williams 2022-06-03 13:47:37 UTC
OpenShift also ran into this because originally it didn't use exclude-lb-vips-from-garp=true, leading to [Service VIP * nodes] MAC bindings in SB. So I think this would be useful for both OCP and OSP.

Comment 2 Ales Musil 2022-06-06 13:53:33 UTC
After discussion we have come up with a couple of possible solutions:

1) Add an "idle_age" column to MAC_Binding that would be updated by ovn-controllers based on the "idle_age" statistic of the corresponding physical flow.
This probably has one major scale drawback: there would be a lot of updates to that "idle_age" column in large environments.

2) Add an action that would clear/bump a timer local to ovn-controller; it could be installed in tables 66/67 with every MAC binding flow.
The drawback here could be a lot of calls to the ovn-controller action when there is a lot of traffic in large environments.

3) Add a column for the "owner" of the MAC_Binding row; the owner (the ovn-controller that created the row) would be responsible for checking the "idle_age" timer.
The controller could check "idle_age" purely locally, without sending any updates to the SB database. The main issue is that, if the datapath is distributed over
multiple controllers, we could effectively delete a MAC binding from other controllers even while they are still using it. The controller would be able to
recreate it, but that could cause some delays.

Of all the suggested solutions, 3) looks the most promising; however, it still needs to be tested to rule out any performance regressions.
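
To make option 3) more concrete, here is a minimal sketch of the owner-based aging check. It is only an illustration under stated assumptions, not the posted implementation: the `MacBinding` class, its `owner` field, the `expired_bindings` helper, and the chassis names, addresses, and 300-second threshold are all hypothetical, and the locally observed idle ages are passed in as a plain dictionary rather than read from real flow statistics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MacBinding:
    """Hypothetical stand-in for a MAC_Binding row with an added 'owner' column."""
    logical_port: str
    ip: str
    mac: str
    owner: str  # assumption: name of the chassis whose ovn-controller created the row


def expired_bindings(bindings, local_chassis, idle_age_by_key, threshold_s):
    """Return the rows this controller owns whose local idle age reached the threshold.

    idle_age_by_key maps (logical_port, ip) to the idle age in seconds as seen
    on this chassis only; nothing is written back to the SB database.
    """
    expired = []
    for b in bindings:
        if b.owner != local_chassis:
            continue  # only the owner is responsible for aging a row out
        idle = idle_age_by_key.get((b.logical_port, b.ip))
        if idle is not None and idle >= threshold_s:
            expired.append(b)
    return expired


rows = [
    MacBinding("lrp-a", "10.0.0.5", "aa:bb:cc:dd:ee:01", "chassis-1"),
    MacBinding("lrp-a", "10.0.0.6", "aa:bb:cc:dd:ee:02", "chassis-1"),
    MacBinding("lrp-a", "10.0.0.7", "aa:bb:cc:dd:ee:03", "chassis-2"),
]
# Idle ages as observed locally on chassis-1 (made-up values).
local_idle = {
    ("lrp-a", "10.0.0.5"): 400,
    ("lrp-a", "10.0.0.6"): 10,
    ("lrp-a", "10.0.0.7"): 900,
}
stale = expired_bindings(rows, "chassis-1", local_idle, threshold_s=300)
```

In this sketch only the 10.0.0.5 row is selected: 10.0.0.6 is still in use, and 10.0.0.7 belongs to another chassis. The drawback described above also shows here: chassis-1 decides on its own idle age alone, so it could remove a row that another chassis is still actively using.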

Comment 3 Ales Musil 2022-06-14 13:50:57 UTC
First iteration posted: https://patchwork.ozlabs.org/project/ovn/list/?series=304732

Comment 6 Ales Musil 2022-07-01 10:05:04 UTC
Created attachment 1893876
results.pdf

Results of measurements with the Xena system