Bug 1995335

Summary: [SCALE] ovnkube CNI: remove ovs flows check
Product: OpenShift Container Platform
Component: Networking
Networking sub component: ovn-kubernetes
Version: 4.8
Target Release: 4.10.0
Status: CLOSED ERRATA
Severity: medium
Priority: high
Reporter: Tim Rozet <trozet>
Assignee: Nadia Pinaeva <npinaeva>
QA Contact: Mike Fiedler <mifiedle>
CC: anbhat, astoycos, dcbw, mifiedle, npinaeva
Whiteboard: perfscale-ovn
Doc Type: No Doc Update
Type: Bug
Last Closed: 2022-03-10 16:05:18 UTC
Bug Depends On: 1995333
Bug Blocks: 2003161

Description Tim Rozet 2021-08-18 20:39:17 UTC
Description of problem:
During scale testing we see that merely executing the ovs-ofctl dump-flows command may take up to 8 seconds, which adds to our pod latency. We need to remove this check in favor of relying only on ovn-installed.
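
For context, a minimal sketch of the old check versus the replacement (the pod's veth name is a placeholder):

# old readiness check: scan br-int's flow table for the pod's flows (can take seconds under load)
ovs-ofctl dump-flows br-int

# replacement: read the per-interface marker that ovn-controller writes once it binds the port
ovs-vsctl get Interface <pod-veth> external_ids:ovn-installed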

Comment 1 Dan Williams 2021-08-20 12:50:01 UTC
Numan proposed this patch upstream: http://patchwork.ozlabs.org/project/ovn/patch/20210818213511.3076974-1-numans@ovn.org/

When that hits OVN (and if nobody bike-sheds on the key names), we'll need ovnkube-master to set "vif-id=<macaddr>" in the pod's Logical Switch Port "options" column, as we already do for the requested-chassis option.

Then the ovnkube-node bits need to set "vif-id=<macaddr>" in the OVS Interface record, as we already set iface-id.

Then, when reading ovn-installed from OVS during container sandbox setup, we can trust that both OVN and ovnkube agree on which pod instance was actually set up.
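
A sketch of the two matching writes, using the iface-id-ver key that ultimately landed (port names, UUIDs, and interface names are placeholders):

# ovnkube-master side: stamp the ID on the Logical Switch Port, next to requested-chassis
ovn-nbctl lsp-set-options <logical-port> requested-chassis=<node> iface-id-ver=<uuid>

# ovnkube-node CNI side: stamp the same ID on the OVS Interface, next to iface-id
ovs-vsctl set Interface <pod-veth> external_ids:iface-id=<logical-port> external_ids:iface-id-ver=<uuid>

ovn-controller only marks the interface ovn-installed when the two values match, so a stale logical port left over from a previous pod with the same name can no longer satisfy the check.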

Comment 2 Tim Rozet 2021-08-20 16:25:11 UTC
Adding some background here:
1. ovnkube-master is responsible for creating the logical switch port in OVN for a pod, while the ovnkube-node CNI is responsible for adding the veth to OVS.
2. ovn-controller will recognize when the veth is added to the switch and bind it to the matching logical switch port it finds (which ovnkube-master configured).
3. In the past, ovn-controller would bind the port, but our CNI needed some way to know that all of the OpenFlow flows were present before indicating the CNI ADD was complete. To do this, we checked that specific OpenFlow flows for the pod were present in OVS.
4. The OVN team came up with a new indicator, "ovn-installed", which ovn-controller writes into the OVS database when it binds the port.
5. We switched to using this method, but then realized it had a glaring issue, outlined here:
https://github.com/ovn-org/ovn-kubernetes/commit/22bed6a10c669142fb13e612a8fe17cf8b895fd6

6. In the above case, if a pod was deleted and recreated with the same name as a previous pod (stateful sets reuse pod names), ovn-controller might see the old logical switch port and consider the port bound, even though the flows are not there for the new port.
7. To work around this, I added back the flow checking in the above commit. However, during scale runs we see the ovs-ofctl dump-flows command take up to 8 seconds, hurting our pod latency.
8. The OVN team is adding a way to pass a unique identifier, so that when ovnkube-master creates the logical port, it can add this ID. The CNI can then also add this ID when it adds the port to OVS, and ovn-controller will only consider the port ovn-installed if the two IDs match.
9. We need changes to ovnkube to pass this new ID during LSP creation and in the CNI port add to OVS.
10. To verify it works, we can use a stateful set: add, delete, then add again. The stateful set pod can have an init container that immediately tries to curl github.com or similar; internet connectivity should work right away (see the sketch after this list).
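
A minimal sketch of that verification flow, assuming a hypothetical StatefulSet named conn-test whose init container runs a curl against github.com (manifest details are illustrative):

# create, delete, then recreate the StatefulSet under the same name
oc apply -f conn-test-statefulset.yaml
oc delete statefulset conn-test
oc apply -f conn-test-statefulset.yaml

# pods only leave Init and reach Running if the init container's curl succeeded,
# i.e. networking was truly wired up at CNI ADD time
oc get pods -l app=conn-test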

Comment 5 Mike Fiedler 2021-09-14 21:01:37 UTC
Verified on 4.10.0-0.nightly-2021-09-14-070333 that the iface-id-ver option is present. I was unable to verify any performance improvement: I could not reproduce long dump-flows times while running cluster-density, and:

avg(rate(ovnkube_node_cni_request_duration_seconds_sum{command="ADD"}[2m]))

was not significantly different for a 100-node, 400-iteration cluster-density run when comparing 4.9.latest to 4.10.latest.

4.10 snippet from ovn-sbctl list Port_Binding on the sbdb container:

_uuid               : 4f9f2fff-c30e-479f-a252-c35b95c8458d
chassis             : 20be2355-e17c-499a-b602-25d7e6a42bee
datapath            : 5f77a8f6-2c2b-4db8-a488-eac20483d9cf
encap               : []
external_ids        : {namespace=node-density-f3766076-0695-4962-8f29-2bc9a9089abf, pod="true"}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : node-density-f3766076-0695-4962-8f29-2bc9a9089abf_node-density-92
mac                 : ["0a:58:0a:83:14:0f 10.131.20.15"]
nat_addresses       : []
options             : {iface-id-ver="28cabaa9-9340-4aad-8d76-3506a6bec573", requested-chassis=ip-10-0-197-151.us-east-2.compute.internal}
parent_port         : []
tag                 : []
tunnel_key          : 8
type                : ""
up                  : true
virtual_parent      : []
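
A quick node-side cross-check (the pod's veth name is a placeholder): the first value below should match options:iface-id-ver in the Port_Binding above, and ovn-installed should read "true" once binding completes.

ovs-vsctl get Interface <pod-veth> external_ids:iface-id-ver
ovs-vsctl get Interface <pod-veth> external_ids:ovn-installed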

@npinaeva @trozet Is this sufficient for verification, or do we need to see more concrete evidence on the performance side?

Comment 6 Tim Rozet 2021-10-01 15:26:42 UTC
Hey Mike, it will only matter when the node is under heavy stress. The reduced number of ovs-ofctl calls during CNI alleviates stress on OVS. I would say the performance benefit is already known and there is no need to validate it here. However, for this bz I would suggest we verify that when a pod comes up from CNI ADD, it actually has internet access. Sai has previously tested this by bringing up an init container which immediately tries to curl something from GitHub, to test whether the pod is truly wired for networking. I suggest trying that same test to ensure port_binding is working correctly: spin up 100 pods on a node, all with init containers that curl something on the internet, and make sure none of them fail (a sketch follows below).
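
A minimal sketch of that check, assuming a hypothetical deployment wiring-test whose pods are pinned to one node and carry the curl init container:

# land 100 pods on the node, then confirm none are stuck in Init or failing
oc scale deployment wiring-test --replicas=100
oc get pods -l app=wiring-test --field-selector=status.phase!=Running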

Comment 7 Mike Fiedler 2021-10-04 18:09:29 UTC
Verified on 4.10.0-0.nightly-2021-10-02-095441 per comment 6 - pods can reach the internet immediately upon startup.

Comment 10 errata-xmlrpc 2022-03-10 16:05:18 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056