Description of problem: During scale testing we see that just executing the ovs-ofctl dump-flows command may take up to 8 seconds, which adds to our pod latency. We need to remove this check in favor of only using ovn-installed.
Numan proposed this patch upstream: http://patchwork.ozlabs.org/project/ovn/patch/20210818213511.3076974-1-numans@ovn.org/ When that hits OVN (and if nobody bike-sheds on the key names), we'll need ovnkube-master to set "vif-id=<macaddr>" in the pod's Logical Switch Port "options" column, like we already do with the requested-chassis option. Then the ovnkube-node bits need to set "vif-id=<macaddr>" in the OVS Interface record, like we already set iface-id. Then we can trust, when reading ovn-installed from OVS during container sandbox setup, that both OVN and ovnkube agree on which pod instance was actually set up.
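A rough sketch of the two writes described above. Note the key name that eventually landed upstream was "iface-id-ver" rather than "vif-id" (see the Port_Binding output later in this bug); the port name, veth name, chassis name, and UUID below are all placeholders:

```shell
# ovnkube-master side: put the unique ID in the pod's Logical Switch Port
# options column, alongside the existing requested-chassis option.
ovn-nbctl lsp-set-options my-namespace_my-pod \
    requested-chassis=worker-node-1 \
    iface-id-ver=28cabaa9-9340-4aad-8d76-3506a6bec573

# ovnkube-node/CNI side: set the same ID on the OVS Interface record,
# next to the iface-id external-id we already set today.
ovs-vsctl set Interface veth1234 \
    external_ids:iface-id=my-namespace_my-pod \
    external_ids:iface-id-ver=28cabaa9-9340-4aad-8d76-3506a6bec573
```

With both writes in place, ovn-controller only reports ovn-installed when the ID on the Interface matches the ID on the logical port, so a stale LSP from a same-named deleted pod can no longer satisfy the check.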
Adding some background here:
1. ovnkube-master is responsible for creating the logical switch port in OVN for a pod, while the ovnkube-node CNI is responsible for adding the veth to OVS.
2. ovn-controller recognizes when the veth is added to the switch and binds it to the matching logical switch port it finds (which ovnkube-master configured).
3. In the past, ovn-controller would bind the port, but our CNI needed some way to know all of the OpenFlow flows were present before indicating the CNI add was complete. To do this we checked that specific OpenFlow flows were present for the pod in OVS.
4. The OVN team came up with a new indicator, "ovn-installed", which ovn-controller writes into the OVS ovsdb when it binds the port.
5. We switched to using this method, but then realized it had a glaring issue, outlined here: https://github.com/ovn-org/ovn-kubernetes/commit/22bed6a10c669142fb13e612a8fe17cf8b895fd6
6. In that case, if a pod was deleted and recreated with the same name as a previous pod (stateful sets reuse pod names), ovn-controller might see the old logical switch port and consider the port bound, even though the flows were not there for the new port.
7. To work around this, I added back the flow checking in the above commit. However, during scale runs we see the ovs-ofctl dump-flows command take up to 8 seconds, hurting our pod latency.
8. The OVN team is adding a way to pass a unique identifier: when ovnkube-master creates the logical port, it adds this ID; the CNI also adds the ID when it adds the port to OVS; and ovn-controller only considers the port ovn-installed if the two match.
9. We need changes to ovnkube to pass this new ID during LSP creation and in the CNI port add to OVS.
10. To verify it works, we can use a stateful set and do add/delete/add. The stateful set pod can have an init container that immediately tries to curl github.com or similar; internet connectivity should work immediately.
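The verification in step 10 could be sketched roughly like this (all names, images, and the replica count are placeholders, not the exact test that was run). The init container curls the internet immediately, so a pod that comes up Ready proves networking was wired before the CNI add completed:

```shell
# Create a StatefulSet whose pod must reach the internet from its
# init container before the main container starts.
kubectl apply -f - <<'EOF'
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: conncheck
spec:
  serviceName: conncheck
  replicas: 1
  selector:
    matchLabels:
      app: conncheck
  template:
    metadata:
      labels:
        app: conncheck
    spec:
      initContainers:
      - name: curl-now
        image: curlimages/curl
        command: ["curl", "-sf", "--max-time", "10", "https://github.com"]
      containers:
      - name: sleeper
        image: registry.access.redhat.com/ubi8/ubi-minimal
        command: ["sleep", "infinity"]
EOF

# Exercise the delete/recreate race: the replacement pod reuses the
# name conncheck-0, which is exactly the case that used to break.
kubectl wait --for=condition=Ready pod/conncheck-0 --timeout=120s
kubectl delete pod conncheck-0
kubectl wait --for=condition=Ready pod/conncheck-0 --timeout=120s
```

If the recreated conncheck-0 reaches Ready without init-container restarts, ovn-installed was not reported against the stale port.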
Verified on 4.10.0-0.nightly-2021-09-14-070333 that the iface-id-ver option is there. I was unable to verify any performance improvements: I could not reproduce long dump-flows times while running cluster-density, and avg(rate(ovnkube_node_cni_request_duration_seconds_sum{command="ADD"}[2m])) was not significantly different for a 100 node, 400 iteration cluster-density run when comparing 4.9.latest to 4.10.latest.

4.10 snippet from ovn-sbctl list Port_Binding on the sbdb container:

_uuid               : 4f9f2fff-c30e-479f-a252-c35b95c8458d
chassis             : 20be2355-e17c-499a-b602-25d7e6a42bee
datapath            : 5f77a8f6-2c2b-4db8-a488-eac20483d9cf
encap               : []
external_ids        : {namespace=node-density-f3766076-0695-4962-8f29-2bc9a9089abf, pod="true"}
gateway_chassis     : []
ha_chassis_group    : []
logical_port        : node-density-f3766076-0695-4962-8f29-2bc9a9089abf_node-density-92
mac                 : ["0a:58:0a:83:14:0f 10.131.20.15"]
nat_addresses       : []
options             : {iface-id-ver="28cabaa9-9340-4aad-8d76-3506a6bec573", requested-chassis=ip-10-0-197-151.us-east-2.compute.internal}
parent_port         : []
tag                 : []
tunnel_key          : 8
type                : ""
up                  : true
virtual_parent      : []

@npinaeva @trozet Is this sufficient for verification, or do we need to see more concrete evidence on the performance side?
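For completeness, the node-side half of the handshake can be spot-checked too. A sketch (the veth name is a placeholder; run on the node hosting the pod):

```shell
# The CNI-written ID on the OVS Interface should match the iface-id-ver
# shown in the Port_Binding options above.
ovs-vsctl --columns=external_ids list Interface veth1234

# ovn-controller records the install status in the same external_ids map;
# it should read "true" once the port is bound and its flows are programmed.
ovs-vsctl get Interface veth1234 external_ids:ovn-installed
```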
Hey Mike, it will only matter when the node is under heavy stress. The reduced number of ovs-ofctl calls during CNI alleviates stress on OVS. I would say the performance benefit is already known and there is no need to validate it here. However, I would suggest that for this bz we verify that when a pod comes up from CNI ADD, it actually has internet access. Sai has previously tested this by bringing up an init container that immediately curls github to check whether the pod is truly wired for networking. I suggest running that same test to ensure port_binding is working correctly: spin up 100 pods on a node, all with init containers that curl something on the internet, and make sure none of them fail.
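One possible way to check the 100-pod run afterwards (namespace name is a placeholder, and this assumes the pods were created with the curl-style init container described above):

```shell
# Print each pod's name and the restart count of its first init container.
# A nonzero count means the pod came up without immediate connectivity
# and the init curl had to retry.
kubectl get pods -n conncheck-test -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.initContainerStatuses[0].restartCount}{"\n"}{end}'
```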
Verified on 4.10.0-0.nightly-2021-10-02-095441 per comment 6 - pods can reach internet immediately upon startup.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056