Description of problem:
With ovn-kubernetes, we had to add back checking for OVS flows in CNI: https://github.com/ovn-org/ovn-kubernetes/commit/22bed6a10c669142fb13e612a8fe17cf8b895fd6

This is because we do not use unique identifiers for our LSP names. Pod X could be deleted and recreated, and CNI can add the port before the LSP has been deleted and recreated in OVN. ovn-controller will then consider the pod "ovn-installed" even though the binding belongs to the older version of the LSP.

The request here is to be able to pass a unique identifier to the LSP somehow, so that when ovn-controller writes "ovn-installed" it also writes this identifier into ovsdb. ovnkube-node can then check the value of this identifier in ovsdb to know that ovn-installed is accurate.

An alternative method would be for the CMS to provide a key/value pair that will be present for the port in ovsdb, with ovn-controller only setting ovn-installed if that key matches. For example, our CNI would set "attached_mac=11:22:..." on the interface, and we would set the same value on the LSP. ovn-controller would then examine the interface in ovsdb and check that it has the proper key/value identifier before setting ovn-installed.
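For context, a minimal sketch of the kind of CNI-side wait this bug is about: poll the OVS interface until ovn-controller sets external_ids:ovn-installed to true. The helper name wait_for_ovn_installed and the OVS_VSCTL override (there only so the function can be exercised without OVS) are hypothetical, not ovn-kubernetes code. As described above, this check alone can be satisfied by a stale binding left over from the previous incarnation of the LSP, which is why an identifier is being requested.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not ovn-kubernetes code): wait until ovn-controller
# marks an OVS interface with external_ids:ovn-installed=true.
# OVS_VSCTL defaults to the real ovs-vsctl; it can point at a stub for testing.
OVS_VSCTL=${OVS_VSCTL:-ovs-vsctl}

# Usage: wait_for_ovn_installed IFACE [TIMEOUT_SECONDS]
# Returns 0 once ovn-installed is "true", 1 if the timeout expires first.
wait_for_ovn_installed() {
    local iface=$1
    local timeout=${2:-30}
    local elapsed=0 value
    while (( elapsed < timeout )); do
        # --if-exists: a missing map key yields empty output instead of an error.
        value=$($OVS_VSCTL --if-exists get Interface "$iface" external_ids:ovn-installed 2>/dev/null)
        value=${value//\"/}   # ovs-vsctl may print the string value quoted
        if [[ $value == true ]]; then
            return 0
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
    return 1
}
```

Example: `wait_for_ovn_installed ls1p1 10 || echo "ls1p1 not ready"`. Note that a true result only says that some binding for the interface name was installed, not which version of the LSP it belongs to.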
Tagging perfscale-ovn here, as during pod latency tests we see that querying OVS flows can take up to 8 seconds.
Patch submitted for upstream review: https://patchwork.ozlabs.org/project/ovn/patch/20210818213511.3076974-1-numans@ovn.org/
*** Bug 1978719 has been marked as a duplicate of this bug. ***
Fixed in version ovn21.09-21.09.0-15
Verified on:

[root@dell-per740-81 ~]# rpm -qa | grep -E 'ovn|openvswitch'
ovn-2021-host-21.09.0-12.el8fdp.x86_64
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.15-2.15.0-26.el8fdp.x86_64
ovn-2021-21.09.0-12.el8fdp.x86_64
ovn-2021-central-21.09.0-12.el8fdp.x86_64

systemctl start openvswitch
systemctl start ovn-northd
ovn-nbctl set-connection ptcp:6641
ovn-sbctl set-connection ptcp:6642
ovs-vsctl set open . external_ids:system-id=hv1 external_ids:ovn-remote=tcp:42.42.42.1:6642 external_ids:ovn-encap-type=geneve external_ids:ovn-encap-ip=42.42.42.1
systemctl restart ovn-controller

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ls1p1
ovn-nbctl lsp-set-addresses ls1p1 "00:00:00:01:01:01 192.168.1.1 2001::1"
ovn-nbctl lsp-set-options ls1p1 iface-id-ver=foo
ovn-nbctl lsp-add ls1 ls1p2
ovn-nbctl --wait=hv lsp-set-addresses ls1p2 "00:00:00:01:01:02 192.168.1.2 2001::2"

ovs-vsctl add-port br-int ls1p1
ovs-vsctl set interface ls1p1 external_ids:iface-id=ls1p1
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
======> No binding shown

ovs-vsctl set interface ls1p1 type=internal
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [0]
----------------------------------------
======> ls1p1 has still not been claimed

ovs-vsctl set interface ls1p1 external_ids:iface-id-ver=foo
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
primary lport : [ls1p1]
----------------------------------------
======> ovn-controller has now claimed ls1p1; the binding is shown

ovs-vsctl remove interface ls1p1 external_ids iface-id-ver
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [0]
----------------------------------------
=======> binding released

ovn-nbctl clear logical_switch_port ls1p1 options
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
primary lport : [ls1p1]
----------------------------------------
=======> after clearing the iface-id-ver option, the binding is claimed again

ovn-nbctl lsp-set-options ls1p1 iface-id-ver=bar
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [0]
----------------------------------------

ovs-vsctl set interface ls1p1 external_ids:iface-id-ver=bar
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
primary lport : [ls1p1]
----------------------------------------

ovs-vsctl set interface ls1p1 external_ids:iface-id-ver=bar2
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [0]
----------------------------------------
=========> Setting the interface's external_ids:iface-id-ver to a value that differs from the LSP's options:iface-id-ver released the binding

ovn-nbctl lsp-set-type ls1p1 localport
ovn-nbctl --wait=hv sync
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
localport lport : [ls1p1]
----------------------------------------
=========> the iface-id-ver option is ignored for localports

ovs-vsctl add-port br-int ls1p2
ovs-vsctl set interface ls1p2 external_ids:iface-id=ls1p2
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
localport lport : [ls1p1]
----------------------------------------

ovs-vsctl set interface ls1p2 type=internal
ovn-appctl -t ovn-controller debug/dump-local-bindings
Local bindings:
name: [ls1p1], OVS interface name : [ls1p1], num binding lports : [1]
localport lport : [ls1p1]
----------------------------------------
name: [ls1p2], OVS interface name : [ls1p2], num binding lports : [1]
primary lport : [ls1p2]
----------------------------------------
========> By default, ovn-controller always claims the port binding. Only when an iface-id-ver value is set on the LSP (via lsp-set-options) must the VIF's external_ids:iface-id-ver carry the same value before the port is claimed.
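The claim rule exercised in the verification log can be summarized as a small decision function. This is a hypothetical model written in bash for illustration, not the ovn-controller implementation; the function name would_claim and its argument order are made up. Every branch corresponds to a step in the log except one: the log never sets iface-id-ver on the interface while the LSP option is unset, so treating that case as "claim" is an assumption based on the observed default behaviour.

```shell
#!/usr/bin/env bash
# Hypothetical model (not ovn-controller source) of the claim rule observed
# in the verification log.
#   $1: LSP type ("" for a regular VIF, "localport", ...)
#   $2: LSP options:iface-id-ver ("" if unset)
#   $3: Interface external_ids:iface-id-ver ("" if unset)
# Returns 0 if the port would be claimed, 1 otherwise.
would_claim() {
    local lsp_type=$1 lsp_ver=$2 iface_ver=$3
    # iface-id-ver is ignored for localports.
    if [[ $lsp_type == localport ]]; then
        return 0
    fi
    # No version set on the LSP: claim as before.
    # (Assumption: an iface-id-ver on the interface alone does not block the claim.)
    if [[ -z $lsp_ver ]]; then
        return 0
    fi
    # Otherwise the interface must carry exactly the same value.
    [[ $lsp_ver == "$iface_ver" ]]
}
```

For instance, `would_claim "" bar bar2` fails (binding released, as in the bar2 step), while `would_claim localport bar bar2` succeeds.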
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:5059