Description of problem: Currently, when adding logical switch ports, OpenShift assumes that the changes to the OVN databases have been propagated to the node before we start to use them. We would like a way to ensure that the flows for a logical switch port are created on the node where the newly created port is bound.
To clarify, OpenShift is aware of the `--wait=hv` option for ovn-nbctl commands. The problem with it is that it: 1) Waits for the change to reach all nodes, not just a single specific node. 2) Waits for the northbound database to acknowledge that the change has reached all nodes.
I've done some thinking about this, and I think we could solve this with an nbdb revision number that is propagated from nbdb down to the host ovn-controller. Essentially, I'd imagine something like this:

ovn-nbctl -- lsp-add .... -- sync --wait=revision

which emits a revision number for the proposed change. Then, we would add that to our Kubernetes state, which is how we communicate with the host-level components.

The host would then watch the revision number applied by ovn-controller, and once it is >= the pod's revision returned from the add command, the pod is up.
Using ovn-nbctl doesn't solve the problem for us. The CNI/ovnkube-node part of our software does not query OVN NB, and we do not want it to do that (especially on each pod add) for performance reasons. The use case here is that we need to be sure that all of the flows for a particular pod are programmed into OVS before we return success for the add in CNI. One way to do this is to expose an ovn-controller API that allows the co-located ovnkube-node process to query ovn-controller for whether a port is wired. Another way is to add a field to the Port_Binding schema in SBDB to indicate that the port is fully wired (all OpenFlow flows exist). The second option might not be as appealing because every CNI call would then require an SBDB read, which might hurt the already fragile SBDB performance.
We are seeing timing issues related to this in network policy E2E stress tests. The details are captured here: https://bugzilla.redhat.com/show_bug.cgi?id=1890436#c2 The summary is that network policy flows for a pod take some time to be installed in OVS worker nodes, especially in a busy system, like E2E network stress tests. We need a way to ensure that all flows for a pod are installed before we indicate success in CNI, as Tim describes above.
(In reply to Casey Callendrello from comment #2)
> I've done some thinking about this, and I think we could solve this with a
> nbdb revision number that is propagated from nbdb down to host
> ovn-controller. Essentially, I'd imagine something like this:
>
> ovn-nbctl -- lsp-add .... -- sync --wait=revision
>
> which emits a revision number for the proposed change. Then, we would add
> that to our Kubernetes state, which is how we communicate to the host-level
> components.
>
> The host would then watch the revision number applied by ovn-controller, and
> once it is >= to the pod's revision returned from the add command, the pod
> is up.

One way of achieving this with current OVN code is to set NB_Global.nb_cfg to <revision> and then check that the node we're interested in has updated its corresponding Chassis_Private.nb_cfg field (which means it installed all OVS flows generated by <revision>). Something like:

# Perform NB changes (e.g., lsp-add).
# Set NB_Global.nb_cfg to an ovn-k8s specific revision:
ovn-nbctl set nb_global . nb_cfg=42

# Wait until the SB Chassis_Private record is updated to nb_cfg=42, i.e.:
ovn-sbctl --bare --columns nb_cfg find chassis_private name=local
42

We could enhance the ovn-controller code to also store the nb_cfg value in an external-id in the OVS DB on the node itself. This would allow readiness checks that run on the node to avoid connecting to the SB.

Would that work for ovn-k8s?

Thanks,
Dumitru
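The "wait until" step above can be sketched as a small polling loop. This is only an illustration under assumptions: `wait_chassis_nb_cfg`, `SBCTL` and `POLL_INTERVAL` are hypothetical names that exist nowhere in OVN or ovn-k8s, and the only OVN command used is the `ovn-sbctl find` invocation quoted above. The comparison uses >= rather than equality, following the revision semantics proposed in comment #2.

```shell
# Hypothetical helper, not part of OVN: poll the SB Chassis_Private record
# until its nb_cfg has reached at least the requested revision.
# SBCTL can be overridden so the loop can be exercised without a live SB DB.
SBCTL="${SBCTL:-ovn-sbctl}"

wait_chassis_nb_cfg() {
    chassis="$1"        # chassis name, e.g. "hv1"
    target="$2"         # revision previously written to NB_Global.nb_cfg
    tries="${3:-50}"    # number of polls before giving up

    while [ "$tries" -gt 0 ]; do
        current=$($SBCTL --bare --columns nb_cfg find chassis_private "name=$chassis" 2>/dev/null)
        # An empty or non-numeric answer counts as "not there yet".
        if [ -n "$current" ] && [ "$current" -ge "$target" ] 2>/dev/null; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

A caller would run something like `wait_chassis_nb_cfg hv1 42 || echo "hv1 has not caught up"`. Note that this still requires a connection to the SB DB from wherever it runs, which is the cost the external-id suggestion is meant to avoid.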
Hey Dumitru, Thanks, that's basically what I was expecting. Yes, we would really like ovn-controller to push this down to the node, so we have an end-to-end confirmation that everything is set up. Basically, the way it works now is we wait for certain flows to show up in the node ovsdb for confirmation the pod is up. But that is fragile at best. Much better is an explicit version. It would be really cool (but probably too much work) for ovn-northd to update this value itself in the nbdb. That way we could add the pod, then get the latest revision (ideally in the same ovsdb transaction, to avoid races). I'm not sure if the ovsdb protocol supports this.
(In reply to Casey Callendrello from comment #7)
[...]
>
> It would be really cool (but probably too much work) for ovn-northd to
> update this value itself in the nbdb. That way we could add the pod, then
> get the latest revision (ideally in the same ovsdb transaction, to avoid
> races). I'm not sure if the ovsdb protocol supports this.

The problem with this is that the NB version of the hv_cfg "revision" is unique for the whole cluster, while each node updates its own per-chassis version in the SB DB. The way it works now is that ovn-northd sets NB.NB_Global.hv_cfg to the minimum of all SB.Chassis_Private.nb_cfg values. If a chassis record is stale or the chassis node has gone unresponsive, NB.NB_Global.hv_cfg will never get updated.

I don't think we can implement what you're suggesting above without duplicating SB.Chassis_Private records to the NB DB (which probably won't be accepted upstream).
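To make the "minimum value" point concrete, here is a tiny illustration of why a single stale chassis holds back the cluster-wide value. It is not ovn-northd's code; `min_nb_cfg` is a hypothetical helper, and the sample numbers are made up.

```shell
# Hypothetical helper: read one nb_cfg value per line on stdin (blank lines
# are ignored) and print the smallest, mimicking how hv_cfg is described above.
min_nb_cfg() {
    grep -E '^[0-9]+$' | sort -n | head -n 1
}

# Two chassis have reached revision 42, one is stuck at 40.
printf '42\n42\n40\n' | min_nb_cfg   # prints 40
```

On a live system the input could presumably come from something like `ovn-sbctl --bare --columns nb_cfg list Chassis_Private`, whose bare output separates records with blank lines; that pairing is an assumption and was not tested here.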
Code sent upstream for review: http://patchwork.ozlabs.org/project/ovn/list/?series=223036&state=*
V2 posted upstream: http://patchwork.ozlabs.org/project/ovn/list/?series=223060
Verified the function based on the test script from github, here, https://github.com/ovn-org/ovn/blob/master/tests/ovn.at
Tests under "AT_SETUP([ovn -- propagate Port_Binding.up to NB and OVS])"

RPMs:
[root@wsfd ~]# rpm -qa | egrep "ovn|openv"
openvswitch-selinux-extra-policy-1.0-28.el8fdp.noarch
openvswitch2.13-2.13.0-79.5.el8fdp.x86_64
ovn2.13-host-20.12.0-17.el8fdp.x86_64
ovn2.13-central-20.12.0-17.el8fdp.x86_64
ovn2.13-20.12.0-17.el8fdp.x86_64
[root@wsfd ~]#
~~~~~~~~~~~~~~~~
[root@wsfd ~]# systemctl start openvswitch
[root@wsfd ~]# systemctl start ovn-northd
[root@wsfd ~]#
[root@wsfd ~]# ovn-nbctl set-connection ptcp:6641
[root@wsfd ~]# ovn-sbctl set-connection ptcp:6642
[root@wsfd ~]#
[root@wsfd ~]# ovs-vsctl set open . external-ids:system-id=hv1 external-ids:ovn-remote=tcp:11.1.35.1:6642 external-ids:ovn-encap-type=geneve external-ids:ovn-encap-ip=11.1.35.1
[root@wsfd ~]#
[root@wsfd ~]# systemctl restart ovn-controller
[root@wsfd ~]# ovn-nbctl ls-add ls
[root@wsfd ~]#
[root@wsfd ~]# ##### Case 1 add OVS port for existing LSP
[root@wsfd ~]# ovn-nbctl lsp-add ls lsp1
[root@wsfd ~]# ovn-nbctl --wait=hv sync
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check lsp1 down?
down
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 #check lsp1 up = “false”?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : []
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : false
virtual_parent      : []
[root@wsfd ~]# ovs-vsctl add-port br-int lsp1 -- set Interface lsp1 type=internal external-ids:iface-id=lsp1
[root@wsfd ~]# ovs-vsctl list interface lsp1
_uuid               : d39a8ec6-ab9b-4a7b-950c-8ec8f3fc849a
admin_state         : down
bfd                 : {}
bfd_status          : {}
cfm_fault           : []
cfm_fault_status    : []
cfm_flap_count      : []
cfm_health          : []
cfm_mpid            : []
cfm_remote_mpids    : []
cfm_remote_opstate  : []
duplex              : []
error               : []
external_ids        : {iface-id=lsp1, ovn-installed="true"}
ifindex             : 32
ingress_policing_burst: 0
ingress_policing_rate: 0
lacp_current        : []
link_resets         : 0
link_speed          : []
link_state          : down
lldp                : {}
mac                 : []
mac_in_use          : "a6:c6:1a:70:62:d6"
mtu                 : 1500
mtu_request         : []
name                : lsp1
ofport              : 1
ofport_request      : []
options             : {}
other_config        : {}
statistics          : {collisions=0, rx_bytes=0, rx_crc_err=0, rx_dropped=0, rx_errors=0, rx_frame_err=0, rx_missed_errors=0, rx_over_err=0, rx_packets=0, tx_bytes=0, tx_dropped=0, tx_errors=0, tx_packets=0}
status              : {driver_name=openvswitch}
type                : internal
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 #check lsp1 up = “true”?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check lsp1 up?
up
[root@wsfd ~]# ovs-vsctl get Interface lsp1 external_ids:ovn-installed #check true?
"true"
[root@wsfd ~]#
[root@wsfd ~]# ##### Case 2 add LSP for existing OVS port
[root@wsfd ~]# ovs-vsctl add-port br-int lsp2 -- set Interface lsp2 type=internal external-ids:iface-id=lsp2
[root@wsfd ~]# ovs-vsctl get Interface lsp2 external_ids:ovn-installed #check false?
ovs-vsctl: no key "ovn-installed" in Interface record "lsp2" column external_ids
[root@wsfd ~]#
[root@wsfd ~]# ovn-nbctl lsp-add ls lsp2
[root@wsfd ~]# ovn-nbctl --wait=hv sync
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp2 #check lsp2 up = “true”?
_uuid               : 24254622-a6e6-403b-aeb5-d202a85e5d4a
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp2
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 2
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp2 #check lsp2 up?
up
[root@wsfd ~]# ovs-vsctl get Interface lsp2 external_ids:ovn-installed #check true?
"true"
[root@wsfd ~]#
[root@wsfd ~]# ##### Case 3 ovn-controller should not reset Port_Binding.up without northd
[root@wsfd ~]#
[root@wsfd ~]# # Pause northd and clear the "up" field to simulate older ovn-northd
[root@wsfd ~]# # versions writing to the Southbound DB.
[root@wsfd ~]# ovn-appctl -t ovn-northd pause
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/pause
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 # check up = true?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-sbctl clear Port_Binding lsp1 up
[root@wsfd ~]# ovn-sbctl clear Port_Binding lsp1 chassis
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 # check up = NULL?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : []
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : []
virtual_parent      : []
[root@wsfd ~]#
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/resume
[root@wsfd ~]#
[root@wsfd ~]#
[root@wsfd ~]# # Forcefully release the Port_Binding so ovn-controller reclaims it.
[root@wsfd ~]# # Make sure the Port_Binding.up field is not updated though.
[root@wsfd ~]#
[root@wsfd ~]# #ovn-sbctl clear Port_Binding lsp1 chassis
[root@wsfd ~]# hv1_uuid=$(ovn-sbctl --bare --columns _uuid find Chassis "name=hv1" | sort)
[root@wsfd ~]# echo $hv1_uuid
9068586d-af6c-43cd-8525-a2f6a406f910
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 #check up= NULL? And chassis = $hv1_uuid
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : []
virtual_parent      : []
[root@wsfd ~]#
[root@wsfd ~]# # Once northd should explicitly set the Port_Binding.up field to 'false' and
[root@wsfd ~]# # ovn-controller sets it to 'true' as soon as the update is processed.
[root@wsfd ~]#
[root@wsfd ~]# ovn-appctl -t ovn-northd resume
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 # check up = true?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check lsp1 up?
up
[root@wsfd ~]#
[root@wsfd ~]# ##### Case 4 ovn-controller should reset Port_Binding.up - from NULL
[root@wsfd ~]# # If Port_Binding.up is cleared externally, ovn-northd resets it to 'false'
[root@wsfd ~]# # and ovn-controller finally sets it to 'true' once the update is processed.
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/pause
[root@wsfd ~]# ovn-sbctl clear Port_Binding lsp1 up
[root@wsfd ~]# ovn-nbctl --wait=sb sync
[root@wsfd ~]#
[root@wsfd ~]#
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check down?
down
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/resume
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 # check up = true?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check up?
up
[root@wsfd ~]#
[root@wsfd ~]# ##### Case 5: ovn-controller should reset Port_Binding.up - from false
[root@wsfd ~]# # If Port_Binding.up is externally set to 'false', ovn-controller should sets
[root@wsfd ~]# # it to 'true' once the update is processed.
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/pause
[root@wsfd ~]# ovn-sbctl set Port_Binding lsp1 up=false
[root@wsfd ~]# ovn-nbctl --wait=sb sync
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check down?
down
[root@wsfd ~]#
[root@wsfd ~]# ovn-appctl -t ovn-controller debug/resume
[root@wsfd ~]# ovn-sbctl list Port_Binding lsp1 # check up = true?
_uuid               : 09eb4764-74f1-4140-be23-a53b4d95af1b
chassis             : 9068586d-af6c-43cd-8525-a2f6a406f910
datapath            : 5d5f0d02-71f9-44ac-ad38-3e7053102523
encap               : []
external_ids        : {}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : lsp1
mac                 : []
nat_addresses       : []
options             : {}
parent_port         : []
tag                 : []
tunnel_key          : 1
type                : ""
up                  : true
virtual_parent      : []
[root@wsfd ~]# ovn-nbctl lsp-get-up lsp1 #check up?
up
[root@wsfd ~]#
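For the node-local readiness check this bug asked for, the `ovn-installed` external-id seen in the transcript above could be polled roughly as follows. This is a sketch, not what ovn-kubernetes actually does: `wait_ovn_installed`, `VSCTL` and `POLL_INTERVAL` are hypothetical names, and the only OVS command used is the `ovs-vsctl get Interface ... external_ids:ovn-installed` call from the verification run. A missing key (as in Case 2 before the LSP exists) makes `ovs-vsctl` fail, which the loop treats as "not yet installed".

```shell
# Hypothetical helper, not part of OVS or OVN: poll the local OVS DB until
# ovn-controller has marked the interface with external_ids:ovn-installed.
# VSCTL can be overridden so the loop can be exercised without a running OVS.
VSCTL="${VSCTL:-ovs-vsctl}"

wait_ovn_installed() {
    iface="$1"          # OVS interface name, e.g. "lsp1"
    tries="${2:-50}"    # number of polls before giving up

    while [ "$tries" -gt 0 ]; do
        # ovs-vsctl prints the value with its quotes: "true"
        value=$($VSCTL get Interface "$iface" external_ids:ovn-installed 2>/dev/null)
        if [ "$value" = '"true"' ]; then
            return 0
        fi
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-1}"
    done
    return 1
}
```

A CNI-side caller might use `wait_ovn_installed lsp1 30 || exit 1` before reporting the add as successful. Unlike the nb_cfg approach, this only reads the node's own OVS DB and never touches the SB.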
*** Bug 1889463 has been marked as a duplicate of this bug. ***
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (ovn2.13 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:0836