Bug 2072710 - Perfscale - pods time out waiting for OVS port binding (ovn-installed)
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Surya Seetharaman
QA Contact: Mike Fiedler
URL:
Whiteboard: perfscale-ovn
Depends On:
Blocks:
Reported: 2022-04-06 19:58 UTC by Mohit Sheth
Modified: 2022-08-10 11:04 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-08-10 11:04:00 UTC
Target Upstream Version:
Embargoed:




Links
System: Github
ID: openshift cluster-network-operator pull 1386
Private: 0
Priority: None
Status: open
Summary: Bug 2072710: Make northd probe interval default to 10 seconds
Last Updated: 2022-04-20 07:36:03 UTC

Description Mohit Sheth 2022-04-06 19:58:10 UTC
Description of problem:
While running the router test (1600 pods, each backed by a svc and a route) on a 120-node bare-metal cluster, the pods fail to come up and remain stuck in the ContainerCreating state with the following error:

Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_kube-burner-fa0990f2-6sssg_benchmark-operator_5d59d617-a691-41a1-bf0c-29dcc35a9de4_0(b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5): error adding pod benchmark-operator_kube-burner-fa0990f2-6sssg to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [benchmark-operator/kube-burner-fa0990f2-6sssg/5d59d617-a691-41a1-bf0c-29dcc35a9de4:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[benchmark-operator/kube-burner-fa0990f2-6sssg b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5] [benchmark-operator/kube-burner-fa0990f2-6sssg b1f02d91f89801bf668a832ec5e008ee0e94f50924586753ee049cd60a8ffda5] failed to configure pod interface: timed out waiting for OVS port binding (ovn-installed) for 0a:58:0a:83:1a:0c [10.131.26.12/23]
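
For triage across many failing pods, the MAC and IP at the end of this message can be pulled out mechanically. The following is a minimal Python sketch, not part of any OVN-Kubernetes tooling; the regex is derived only from the single message quoted above, and the helper name parse_binding_timeout is hypothetical.

```python
import ipaddress
import re

# Pattern inferred from the one error message quoted in this report;
# other versions may format the message differently (assumption).
BIND_RE = re.compile(
    r"timed out waiting for OVS port binding \(ovn-installed\) "
    r"for (?P<mac>(?:[0-9a-f]{2}:){5}[0-9a-f]{2}) \[(?P<cidr>[^\]]+)\]"
)


def parse_binding_timeout(message):
    """Return (mac, pod_ip, subnet) from a timeout message, or None."""
    m = BIND_RE.search(message)
    if m is None:
        return None
    iface = ipaddress.ip_interface(m["cidr"])
    return m["mac"], str(iface.ip), str(iface.network)


# Tail of the error message from this report.
tail = (
    "failed to configure pod interface: timed out waiting for OVS port "
    "binding (ovn-installed) for 0a:58:0a:83:1a:0c [10.131.26.12/23]"
)
result = parse_binding_timeout(tail)
print(result)
```

Grouping the extracted subnets by node may help tell whether the timeouts are spread across the cluster or concentrated on a few nodes.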

The SBDB logs show:
05T19:29:38.402Z|39040|timeval|WARN|Unreasonably long 12975ms poll interval (12725ms user, 168ms system)
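
To see how often and how badly this happens over a whole log file, the warnings can be extracted with a small script. This is a rough Python sketch that assumes every such warning has the same shape as the single line quoted above; the function name parse_long_polls is hypothetical.

```python
import re

# Matches the "timeval" warning quoted above; the pattern is inferred
# from that one line only (assumption).
POLL_RE = re.compile(
    r"Unreasonably long (?P<total>\d+)ms poll interval "
    r"\((?P<user>\d+)ms user, (?P<system>\d+)ms system\)"
)


def parse_long_polls(lines):
    """Yield (total_ms, user_ms, system_ms) for each long-poll warning."""
    for line in lines:
        m = POLL_RE.search(line)
        if m:
            yield int(m["total"]), int(m["user"]), int(m["system"])


sample = (
    "05T19:29:38.402Z|39040|timeval|WARN|Unreasonably long 12975ms "
    "poll interval (12725ms user, 168ms system)"
)
polls = list(parse_long_polls([sample]))
print(polls)
```

In the quoted line, user time accounts for 12725ms of the 12975ms interval, which would suggest the process was busy computing rather than blocked on I/O.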

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-27-140854

How reproducible:
Not sure

Steps to Reproduce:
1. Run a scale workload which creates pods, svc and routes at 20 QPS

Actual results:
Pods stuck at ContainerCreating with the above error

Expected results:
All the pods should be up and running

Comment 5 Mike Fiedler 2022-05-09 20:32:15 UTC
@msheth Any chance your team can verify this on 4.11?

Comment 6 Mohit Sheth 2022-05-10 13:50:13 UTC
Hey, I have not come across this in our CI for a while.
Marking it verified, thank you.

Comment 8 Surya Seetharaman 2022-06-21 15:31:35 UTC
Note that the actual fix is via https://github.com/openshift/cluster-network-operator/pull/1494;
the first fix linked in the bug was wrong, my bad.

Comment 9 errata-xmlrpc 2022-08-10 11:04:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

