
Bug 2212315

Summary: Add MAC binding timestamp refresh mechanism
Product: Red Hat Enterprise Linux Fast Datapath
Reporter: Ales Musil <amusil>
Component: ovn23.09
Assignee: Ales Musil <amusil>
Status: CLOSED ERRATA
QA Contact: Ehsan Elahi <eelahi>
Severity: unspecified
Docs Contact:
Priority: high
Version: FDP 22.E
CC: ctrautma, jiji, jishi, mmichels
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovn23.09-23.09.0-alpha.133.el9fdp
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2024-01-24 11:17:49 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Ales Musil 2023-06-05 09:38:32 UTC
Description of problem:
Currently, MAC binding aging removes rows that are still relevant, which can cause a short traffic disruption. To avoid that, add a mechanism that refreshes the timestamp while the MAC binding is still in use.

The initial implementation proposal was discussed in an ovs-discuss thread [0].

The following approach was agreed on:

Add a "mac_cache_use" action to the "lr_in_learn_neighbor" table (only to the flow that continues on a known MAC binding):
Before: match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 || REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(next;)
After:  match=(REGBIT_LOOKUP_NEIGHBOR_RESULT == 1 || REGBIT_LOOKUP_NEIGHBOR_IP_RESULT == 0), action=(mac_cache_use; next;)

The "mac_cache_use" action would translate to a resubmit into a separate table with one flow per MAC binding, as follows:
match=(ip.src=<MB_IP>, eth.src=<MB_MAC>, datapath=<MB_Datapath>), action=(drop;)

This should bump the flow statistics every time traffic matches the corresponding MAC binding. ovn-controller could then periodically dump the flows from this table. The dump period would be set to the minimum of mac_binding_age_threshold * 3/4 across all local datapaths. The dump would happen in a separate thread with its own rconn to prevent backlogging issues.

The thread would receive mapped data from an I-P node that keeps track of the mapping datapath -> cookies -> MAC bindings. This avoids constant lookups, at the cost of tracking all local MAC bindings. To save some computation time, this I-P node could be limited to datapaths that actually have the threshold set.
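
As a rough illustration of the dump period rule described above, here is a minimal Python sketch. It is not the ovn-controller implementation. The function name, the dictionary layout, the choice of milliseconds, and the convention that a threshold of 0 means "not set" are all assumptions made for the example.

```python
from typing import Mapping, Optional


def dump_period_ms(thresholds_s: Mapping[str, int]) -> Optional[int]:
    """Return the flow dump period in milliseconds, or None if aging is off.

    thresholds_s is a hypothetical map from a local datapath name to its
    mac_binding_age_threshold in seconds. A value of 0 is treated here as
    "threshold not set" (an assumption of this sketch).
    """
    # Only datapaths that actually have a threshold set take part.
    periods = [t * 1000 * 3 // 4 for t in thresholds_s.values() if t > 0]
    # The shortest threshold drives the period for the whole chassis.
    return min(periods) if periods else None


# Example with three hypothetical local datapaths.
print(dump_period_ms({"rtr": 5, "rtr2": 60, "rtr3": 0}))  # 3750
```

With a 5 second threshold on one datapath, the flows would be dumped roughly every 3.75 seconds, regardless of longer thresholds elsewhere.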

If the "idle_age" of a particular flow is smaller than the datapath's "mac_binding_age_threshold", the MAC binding is still in use. To avoid a flood of updates when the traffic is relevant on multiple controllers, we would first check whether the timestamp is older than the "dump period"; if it is not, there is no need to update it, because another controller already did.
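
The refresh decision above can be sketched as a small predicate. This is only an illustration under stated assumptions: the function and parameter names are hypothetical, "idle_age" and the threshold are taken to be in seconds, and the timestamp and dump period in milliseconds.

```python
def should_refresh(idle_age_s: int, threshold_s: int,
                   timestamp_ms: int, now_ms: int,
                   dump_period_ms: int) -> bool:
    """Decide whether this controller should bump the MAC binding timestamp.

    All names and units are assumptions of this sketch, not OVN's API.
    """
    # The flow saw traffic more recently than the aging threshold.
    in_use = idle_age_s < threshold_s
    # Nobody refreshed the row within the last dump period.
    stale = now_ms - timestamp_ms > dump_period_ms
    return in_use and stale


# In use, and the timestamp is 10 s old with a 3.75 s dump period.
print(should_refresh(1, 5, 0, 10_000, 3_750))      # True
# In use, but another controller refreshed it 1 s ago.
print(should_refresh(1, 5, 9_000, 10_000, 3_750))  # False
```

The second check is what keeps several controllers that see the same traffic from all writing the same row in every period.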

Also, to "desync" the controllers, a random delay would be added to the "dump period".
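
A minimal sketch of such a random delay follows. The size of the delay is not specified in this report, so the max_jitter_ms parameter and the function name are purely hypothetical.

```python
import random


def next_dump_delay_ms(dump_period_ms: int, max_jitter_ms: int,
                       rng=random) -> int:
    """Return the delay before the next flow dump, in milliseconds.

    max_jitter_ms is a hypothetical upper bound for the random delay;
    the report does not say how large it should be.
    """
    # randint is inclusive on both ends, so the result lies in
    # [dump_period_ms, dump_period_ms + max_jitter_ms].
    return dump_period_ms + rng.randint(0, max_jitter_ms)


# Two controllers with the same period are unlikely to dump in lockstep.
print(next_dump_delay_ms(3_750, 500))
```

Passing a seeded random.Random instance as rng makes the delay reproducible in tests.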

[0] https://mail.openvswitch.org/pipermail/ovs-discuss/2023-May/052475.html

Comment 2 Ales Musil 2023-07-13 13:40:04 UTC
Patches posted: https://patchwork.ozlabs.org/project/ovn/list/?series=363141

Comment 6 Ehsan Elahi 2023-12-08 16:41:48 UTC
Verified on:

[root@dell-per740-81 ~]# rpm -qa | grep -E 'ovn|openvswitch'
openvswitch-selinux-extra-policy-1.0-34.el9fdp.noarch
openvswitch2.17-2.17.0-125.el9fdp.x86_64
ovn23.09-23.09.0-87.el9fdp.x86_64
ovn23.09-host-23.09.0-87.el9fdp.x86_64
ovn23.09-central-23.09.0-87.el9fdp.x86_64

Here is the test script:

systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
systemctl start openvswitch
ovs-vsctl set open . external_ids:system-id=hv1
#ip a
ifconfig ens1f0 42.42.42.1 netmask 255.0.0.0
ovs-vsctl set open . external_ids:ovn-remote=tcp:42.42.42.1:6642
ovs-vsctl set open . external_ids:ovn-encap-type=geneve
ovs-vsctl set open . external_ids:ovn-encap-ip=42.42.42.1

systemctl start ovn-controller
ovn-nbctl lr-add rtr
ovn-nbctl lrp-add rtr rtr-ls 00:00:00:00:01:00 42.42.42.1/24 2000::1/64
ovn-nbctl lrp-add rtr rtr-ls2 00:00:00:00:02:00 77.77.77.1/24 2002::1/64

ovn-nbctl ls-add ls
ovn-nbctl lsp-add ls ls-rtr
ovn-nbctl lsp-set-addresses ls-rtr 00:00:00:00:01:00
ovn-nbctl lsp-set-type ls-rtr router
ovn-nbctl lsp-set-options ls-rtr router-port=rtr-ls
ovn-nbctl lsp-add ls vm1
ovn-nbctl lsp-set-addresses vm1 00:00:00:00:00:01

ovn-nbctl ls-add ls2
ovn-nbctl lsp-add ls2 ls2-rtr
ovn-nbctl lsp-set-addresses ls2-rtr 00:00:00:00:02:00
ovn-nbctl lsp-set-type ls2-rtr router
ovn-nbctl lsp-set-options ls2-rtr router-port=rtr-ls2
ovn-nbctl lsp-add ls2 vm2
ovn-nbctl lsp-set-addresses vm2 00:00:00:00:00:02
ip netns add vm1
ovs-vsctl add-port br-int vm1 -- set interface vm1 type=internal
ip link set vm1 netns vm1
ip netns exec vm1 ip link set vm1 address 00:00:00:00:00:01
ip netns exec vm1 ip addr add 42.42.42.2/24 dev vm1
ip netns exec vm1 ip -6 addr add 2000::2/64 dev vm1
ip netns exec vm1 ip link set vm1 up
ip netns exec vm1 ip route add default via 42.42.42.1
ip netns exec vm1 ip -6 route add default via 2000::1
ovs-vsctl set Interface vm1 external_ids:iface-id=vm1
ip netns add vm2
ovs-vsctl add-port br-int vm2 -- set interface vm2 type=internal
ip link set vm2 netns vm2
ip netns exec vm2 ip link set vm2 address 00:00:00:00:00:02
ip netns exec vm2 ip addr add 77.77.77.2/24 dev vm2
ip netns exec vm2 ip -6 addr add 2002::2/64 dev vm2
ip netns exec vm2 ip link set vm2 up
ip netns exec vm2 ip link set lo up
ip netns exec vm2 ip route add default via 77.77.77.1
ip netns exec vm2 ip -6 route add default via 2002::1
ovs-vsctl set Interface vm2 external_ids:iface-id=vm2

ip netns exec vm1 ping 77.77.77.2 -c 3
[root@dell-per740-81 ~]# ovn-sbctl list mac_binding
_uuid               : e71a4344-d6b3-4547-8549-d989cc6be74a
datapath            : d8737718-371b-4096-96a7-0c1370d2f474
ip                  : "77.77.77.2"
logical_port        : rtr-ls2
mac                 : "00:00:00:00:00:02"
timestamp           : 1702052865714

_uuid               : 354eba92-2a09-4551-843e-95ee89be2778
datapath            : d8737718-371b-4096-96a7-0c1370d2f474
ip                  : "42.42.42.2"
logical_port        : rtr-ls
mac                 : "00:00:00:00:00:01"
timestamp           : 1702052865714
[root@dell-per740-81 ~]# ovs-ofctl dump-flows br-int table=79 | grep "77.77.77.2"
 cookie=0xe71a4344, duration=3.794s, table=79, n_packets=0, n_bytes=0, idle_age=3, priority=100,ip,reg14=0x2,metadata=0x1,dl_src=00:00:00:00:00:02,nw_src=77.77.77.2 actions=drop

ovn-nbctl set logical_router rtr options:mac_binding_age_threshold=5
sleep 5
[root@dell-per740-81 ~]# ovn-sbctl list mac_binding
[root@dell-per740-81 ~]# ovs-ofctl dump-flows br-int table=79 | grep "77.77.77.2"

<=================== MAC binding and flow removed after 5 seconds of inactivity =====================
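
For anyone scripting this check, the table 79 flow shown in the output above can be parsed to pull out the cookie and "idle_age". The sketch below is a convenience illustration only. It assumes the text format printed by ovs-ofctl dump-flows above, and it assumes that the cookie is the first 32 bits of the MAC binding row UUID, which is what the output above shows for this one row (cookie=0xe71a4344, _uuid e71a4344-...). The helper names are hypothetical.

```python
import re

# Flow line copied from the verification output above.
FLOW = (" cookie=0xe71a4344, duration=3.794s, table=79, n_packets=0, "
        "n_bytes=0, idle_age=3, priority=100,ip,reg14=0x2,metadata=0x1,"
        "dl_src=00:00:00:00:00:02,nw_src=77.77.77.2 actions=drop")


def parse_flow(line):
    """Return (cookie, idle_age) from a dump-flows line, or None."""
    cookie = re.search(r"cookie=0x([0-9a-f]+)", line)
    idle = re.search(r"idle_age=(\d+)", line)
    if not cookie or not idle:
        return None
    return int(cookie.group(1), 16), int(idle.group(1))


def cookie_for_uuid(uuid_str):
    """First 32 bits of the row UUID (assumed cookie derivation)."""
    return int(uuid_str.split("-")[0], 16)


cookie, idle_age = parse_flow(FLOW)
print(cookie == cookie_for_uuid("e71a4344-d6b3-4547-8549-d989cc6be74a"),
      idle_age)  # True 3
```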

Comment 8 errata-xmlrpc 2024-01-24 11:17:49 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (ovn23.09 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2024:0392