2072710 – Perfscale - pods time out waiting for OVS port binding (ovn-installed)

Summary: Perfscale - pods time out waiting for OVS port binding (ovn-installed)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.11.0
Assignee:	Surya Seetharaman
QA Contact:	Mike Fiedler
Docs Contact:
URL:
Whiteboard:	perfscale-ovn
Depends On:
Blocks:

Reported:	2022-04-06 19:58 UTC by Mohit Sheth
Modified:	2022-08-10 11:04 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2022-08-10 11:04:00 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift cluster-network-operator pull 1386	0	None	open	Bug 2072710: Make northd probe interval default to 10 seconds	2022-04-20 07:36:03 UTC

Description Mohit Sheth 2022-04-06 19:58:10 UTC

Description of problem:
While running the router test (1600 pods, each backed by a svc and a route) on a 120-node bare-metal cluster, the pods fail to come up and remain stuck in the ContainerCreating state with the following error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-burner-fa0990f2-6sssg_benchmark-operator_5d59d617-a691-41a1-bf0c-29dcc35a9de4_0(b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5): error adding pod benchmark-operator_kube-burner-fa0990f2-6sssg to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [benchmark-operator/kube-burner-fa0990f2-6sssg/5d59d617-a691-41a1-bf0c-29dcc35a9de4:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[benchmark-operator/kube-burner-fa0990f2-6sssg b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5] [benchmark-operator/kube-burner-fa0990f2-6sssg b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:1a:0c [10.131.26.12/23]

The SBDB logs show:
05T19:29:38.402Z|39040|timeval|WARN|Unreasonably long 12975ms poll interval (12725ms user, 168ms system)
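
To check how often the SBDB hits long poll intervals like the one above, a log scan along these lines may help. This is only a rough sketch: the regex is derived from the single sample line in this report, and the function name long_polls and the 5000 ms default threshold are hypothetical choices, not anything defined by OVN or OVS.

```python
import re

# Pattern derived from the one sample log line in this report (assumption:
# other timeval warnings in the SBDB log follow the same layout).
POLL_RE = re.compile(
    r"\|timeval\|WARN\|Unreasonably long (\d+)ms poll interval "
    r"\((\d+)ms user, (\d+)ms system\)"
)


def long_polls(lines, threshold_ms=5000):
    """Yield (total_ms, user_ms, system_ms) for poll intervals >= threshold_ms.

    threshold_ms is a hypothetical cutoff picked for illustration only.
    """
    for line in lines:
        m = POLL_RE.search(line)
        if m is None:
            continue  # not a long-poll warning
        total, user, system = (int(g) for g in m.groups())
        if total >= threshold_ms:
            yield total, user, system


sample = [
    "05T19:29:38.402Z|39040|timeval|WARN|Unreasonably long 12975ms "
    "poll interval (12725ms user, 168ms system)",
]
print(list(long_polls(sample)))
```

In the sample line nearly all of the interval is user time (12725 ms of 12975 ms), which suggests the process was busy on CPU rather than blocked in the kernel.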

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-27-140854

How reproducible:
Not sure

Steps to Reproduce:
1. Run a scale workload which creates pods, svc and routes at 20 QPS

Actual results:
Pods stuck at ContainerCreating with the above error

Expected results:
All the pods should be up and running

Comment 5 Mike Fiedler 2022-05-09 20:32:15 UTC

@msheth Any chance your team can verify this on 4.11?

Comment 6 Mohit Sheth 2022-05-10 13:50:13 UTC

Hey, I have not come across this in our CI for a while.
Marking it verified, thank you.

Comment 8 Surya Seetharaman 2022-06-21 15:31:35 UTC

Note that the actual fix is via https://github.com/openshift/cluster-network-operator/pull/1494.
The first fix linked in the bug was wrong, my bad.

Comment 9 errata-xmlrpc 2022-08-10 11:04:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
