Description of problem:
When running a benchmark test that creates 2k app pods (nginx) and routes on a GCP OpenShift environment with OVNKubernetes CNO, cluster routes such as console and prometheus become unresponsive for a few minutes, and new app routes take much longer (at least 13 min) to become accessible, so the test fails consistently. This behavior is observed frequently, together with a spike in ovnkube-master CPU utilization to 6 cores.
Version-Release number of selected component (if applicable):
OCP 4.10.5 GA
Always on GCP
Steps to Reproduce:
1. Deploy a healthy cluster with at least 24 worker nodes (8 CPU, 32G mem) on GCP using OVNKubernetes CNO
2. Run the kube-burner workload to create 2k pods and routes across multiple namespaces
3. Watch the console or prometheus routes time out during the workload; any new application routes also take longer to become reachable
4. ovnkube-master CPU utilization increases to 6 cores
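To put a number on step 3, a small poller along the following lines could record how long a route takes to become reachable. This is only a sketch and not part of the benchmark script: the names `http_probe` and `time_to_reachable` are hypothetical, the 15 minute deadline and 5 second interval are arbitrary, and it assumes that a route which is not yet served answers with a 5xx status (or not at all) and that test routes may use self-signed certificates.

```python
import ssl
import time
import urllib.error
import urllib.request


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with an HTTP status below 500."""
    ctx = ssl.create_default_context()
    # Assumption: test routes may present self-signed certificates.
    ctx.check_hostname = False
    ctx.verify_mode = ssl.CERT_NONE
    try:
        with urllib.request.urlopen(url, timeout=timeout, context=ctx) as resp:
            return resp.status < 500
    except urllib.error.HTTPError as err:
        # 4xx still means something answered; 5xx is treated as "not ready".
        return err.code < 500
    except (urllib.error.URLError, OSError):
        # Connection refused, DNS failure, timeout, TLS error.
        return False


def time_to_reachable(probe, deadline_s=900.0, interval_s=5.0,
                      clock=time.monotonic, sleep=time.sleep):
    """Call probe() until it returns True.

    Returns the elapsed seconds, or None if deadline_s passes first.
    clock and sleep are injectable so the loop can be tested offline.
    """
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline_s:
            return None
        sleep(interval_s)
```

For example, `time_to_reachable(lambda: http_probe(url))` with `url` set to a console or application route returns the seconds until the first successful response, or `None` if the route never answers within the deadline.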
The dataplane test fails during the connectivity check because the routes are unreachable after kube-burner finishes creating them. It turns out that load_balancers are still being added to NBDB about 13 mins after the services were created. addlogicalports for pods is also exceeding 1s.
New routes should be available within SLA, and this workload should not affect other cluster routes.
Script used to reproduce: https://github.com/cloud-bulldozer/e2e-benchmarking/blob/master/workloads/router-perf-v2/ingress-performance.sh
*** Bug 2078758 has been marked as a duplicate of this bug. ***
Verified on 4.11.0-0.nightly-2022-06-15-222801
- Ran the workload described here: 2000 pods/routes of mixed termination types
- Test ran successfully with no timeout failures
- Console and other routed applications remained responsive throughout the test
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.