Bug 1959352
| Field | Value |
| --- | --- |
| Summary | [scale] failed to get pod annotation: timed out waiting for annotations |
| Product | OpenShift Container Platform |
| Component | Networking |
| Networking sub component | ovn-kubernetes |
| Status | CLOSED ERRATA |
| Severity | medium |
| Priority | high |
| Version | 4.7 |
| Target Release | 4.10.0 |
| Hardware | All |
| OS | All |
| Whiteboard | perfscale-ovn |
| Reporter | Joe Talerico <jtaleric> |
| Assignee | Tim Rozet <trozet> |
| QA Contact | Mike Fiedler <mifiedle> |
| CC | adubey, anbhat, andbartl, anusaxen, arghosh, ashsharm, astoycos, bbennett, bmuchiny, cdc, chrzhang, dblack, dcbw, dpateriy, evadla, fcristin, jlema, mifiedle, moddi, msheth, openshift-bugs-escalate, smalleni, suc, swasthan, trozet, zzhao |
| Flags | trozet: needinfo- |
| Clones | 1997072, 2034645, 2076201 (view as bug list) |
| Bug Blocks | 1997072, 2005985, 2076201 |
| Type | Bug |
| Last Closed | 2022-03-10 16:03:38 UTC |
Description (Joe Talerico, 2021-05-11 10:55:30 UTC)
We are getting the same timeouts in two different bare-metal perf testing environments:

1. 120 nodes, OCP 4.8.0-fc.9

Specifically with node-density-light and high pods/worker tests. From a 120-node, 250 pods/worker node-density-light run, taking a sample of 100 pods we see 202 events of this type:

```
$ for i in {14000..14100}; do oc describe po -n node-density-f66eaffd-833b-4270-b7f4-9e5c6ffcf126 node-density-$i | grep "waiting for annotations"; done | wc -l
202
```

This is probably causing, or related to, the considerably higher pod creation latency. (A minimal sketch of the annotation wait that produces these events appears after the comments below.)

Env:
- OCP 4.8.0-fc.9
- local gateway
- ovn2.13-20.12.0-25.el8fdp.x86_64

2. 10 nodes, OCP 4.7.20

In this case we only see the timeout errors within an ICNI2 setup, in the served/app pods. For the 25 pods/worker test we see 40 timeout events across all pods in the cluster. For the 50 pods/worker test we observe 298 timeout events. For 100 pods/worker, 1953 events. The same tests without ICNI2 annotations show 0 timeout events.

The root cause of this issue is that ovnkube-master cannot keep up with how many pods are getting created, and the ovnkube-node CNI is waiting on ovnkube-master to annotate the pods. We currently have 15 pod handlers in ovnkube-master. Doing some scale testing with some mocks I see:

100 ms delay, 5k pods:
- 1 handler: 504 s
- 15 handlers: 15 s
- 150 handlers: 6 s

300 ms delay, 5k pods:
- 15 handlers: 117 s
- 150 handlers: 16 s

The "delay" above is the assumed amount of time addLogicalPort takes. It is usually around 100 ms on a non-loaded cluster and around 300 ms on a heavily loaded cluster. The totals roughly follow pods × delay / handlers (e.g. 5000 × 0.1 s ≈ 500 s for a single handler), so we see a big drop when creating 5k pods at once. (A toy reproduction of this mock test also appears after the comments below.)

Running this with Joe on the scale-lab node-density test we see no more of these errors, at the cost of memory increasing by about 130 MB on the ovnkube-master. I can get this memory down some by tweaking the handlers further. There was no significant increase in CPU usage over the run. Also, pod ready latency dropped from 30 s down to 20 s.

Hi, since @jtaleric and @trozet confirmed that the error can still be seen in some of their testing, I am moving this back to ASSIGNED until some additional fixes are in place. Thanks, KK.

To fully fix this issue, where ovnkube->nbdb is a bottleneck, we are going to need a series of further fixes. The problem is twofold:
1. ovnkube-master is too slow, so it takes too long to annotate pods at large scale at 20 QPS.
2. nbdb is going to 100% CPU, causing ovnkube-master to slow down.

I'll add the PRs to the external trackers on this bug as I push them, and then we will need to get them all downstream in a PR to fix this issue.

Further investigation has shown that with optimizations to ovnkube-master and nbdb we can get to around 12k pods at a creation rate of 20/sec before NBDB starts backing up and becoming the bottleneck. This is roughly halfway through a 120-node node-density-light test. Further optimizations to ovsdb-server for NBDB would be required in order to make further progress in this scenario.

We have a bunch of key fixes related to this BZ that will help perf/scale. I think we should focus on getting those in with this BZ and resolving it; then we can open another bug for the poor performance of ovsdb-server. Will post the final PR to resolve this BZ soon.

*** Bug 2007009 has been marked as a duplicate of this bug. ***
*** Bug 2005985 has been marked as a duplicate of this bug. ***
*** Bug 1999704 has been marked as a duplicate of this bug. ***
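The wait referenced in the first comment works like this: when a pod is scheduled, ovnkube-node's CNI handler blocks until ovnkube-master has written the pod's network annotation, and gives up with the "timed out waiting for annotations" error seen above. Below is a minimal, self-contained Go sketch of that wait loop; the fetch callback, polling interval, timeout budget, and pod name are illustrative assumptions, not the actual ovn-kubernetes code (the annotation key k8s.ovn.org/pod-networks is, to the best of my knowledge, the one ovn-kubernetes uses).

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// fetchAnnotations stands in for a Kubernetes API lookup of the pod object.
type fetchAnnotations func(namespace, name string) (map[string]string, error)

// waitForPodAnnotation polls the pod's annotations until the master has
// written the requested key, or the context deadline expires.
func waitForPodAnnotation(ctx context.Context, fetch fetchAnnotations,
	namespace, name, key string) (string, error) {
	ticker := time.NewTicker(200 * time.Millisecond) // illustrative poll interval
	defer ticker.Stop()
	for {
		annotations, err := fetch(namespace, name)
		if err == nil {
			if v, ok := annotations[key]; ok {
				return v, nil // master annotated the pod; CNI can proceed
			}
		}
		select {
		case <-ctx.Done():
			// The failure mode reported in this bug: the master never
			// annotated the pod within the CNI's wait budget.
			return "", errors.New("timed out waiting for annotations")
		case <-ticker.C:
		}
	}
}

func main() {
	// A fetcher that never returns the annotation, to force the timeout path.
	fetch := func(namespace, name string) (map[string]string, error) {
		return map[string]string{}, nil
	}
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	_, err := waitForPodAnnotation(ctx, fetch, "default",
		"node-density-14000", "k8s.ovn.org/pod-networks")
	fmt.Println(err) // timed out waiting for annotations
}
```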
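The handler-scaling mock test from the root-cause comment can be reproduced with a toy worker pool: pods go on a queue, each "handler" goroutine drains it with a fixed per-pod delay standing in for addLogicalPort, and the total drain time falls roughly as pods × delay / handlers. Everything here is an illustrative sketch, not the ovnkube-master handler code; the demo is scaled down to 500 pods and a 10 ms delay so it finishes quickly.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// drain processes `pods` items with `handlers` concurrent workers, each item
// costing `delay` (a stand-in for addLogicalPort), and returns the wall time.
func drain(pods, handlers int, delay time.Duration) time.Duration {
	queue := make(chan int, pods)
	for i := 0; i < pods; i++ {
		queue <- i
	}
	close(queue)

	start := time.Now()
	var wg sync.WaitGroup
	for h := 0; h < handlers; h++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for range queue {
				time.Sleep(delay) // mock per-pod processing latency
			}
		}()
	}
	wg.Wait()
	return time.Since(start)
}

func main() {
	// Same shape as the 5k-pod/100 ms mock numbers (504 s, 15 s, 6 s),
	// scaled down: expect roughly 5 s, 0.34 s, and 0.04 s here.
	for _, handlers := range []int{1, 15, 150} {
		fmt.Printf("%3d handlers: %v\n", handlers,
			drain(500, 10*time.Millisecond, handlers))
	}
}
```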
Verified on 4.10.0-0.nightly-2021-10-02-095441 via multiple runs of the PerfScale cluster-density and node-density tests at 120-node scale. No FailedCreatePodSandbox events with a reason of pod annotation timeouts. There were some with OVS port binding timeouts.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.