Bug 2032998
| Summary: | Can not achieve 250 pods/node with OVNKubernetes in a multiple worker node cluster | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alex Krzos <akrzos> |
| Component: | Networking | Assignee: | Surya Seetharaman <surya> |
| Networking sub component: | ovn-kubernetes | QA Contact: | Ross Brattain <rbrattai> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | anusaxen, dcbw, rbrattai, surya |
| Version: | 4.10 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.10.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:34:06 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Looking at the OVN logs for a specific pod shows that the addLogicalPort timings have become unreasonable:
# cat ovnkube-master-5kz84.log | grep "addlogicalport" -i
...
I1216 19:10:57.968847 1 pods.go:300] [boatload-1168/boatload-1168-1-boatload-59c794667-tsq5s] addLogicalPort took 119.267425ms
I1216 19:10:58.285261 1 pods.go:300] [boatload-1169/boatload-1169-1-boatload-85f5db9bf4-mbdtc] addLogicalPort took 119.572973ms
I1216 19:10:58.885310 1 pods.go:300] [boatload-1170/boatload-1170-1-boatload-848b99df44-7nspz] addLogicalPort took 119.824014ms
I1216 19:11:15.915533 1 pods.go:300] [openshift-marketplace/redhat-marketplace-6lzvf] addLogicalPort took 121.279971ms
I1216 19:14:14.439708 1 pods.go:300] [openshift-marketplace/redhat-operators-w5qzm] addLogicalPort took 118.814712ms
I1216 19:14:14.449908 1 pods.go:300] [openshift-marketplace/certified-operators-9dhg9] addLogicalPort took 116.228607ms
I1216 19:14:14.463422 1 pods.go:300] [openshift-marketplace/community-operators-qd59v] addLogicalPort took 117.597242ms
I1216 19:15:00.263116 1 pods.go:300] [openshift-multus/ip-reconciler-27328035--1-ghxwz] addLogicalPort took 112.461108ms
I1216 19:15:00.269914 1 pods.go:300] [openshift-operator-lifecycle-manager/collect-profiles-27328035--1-hgh5k] addLogicalPort took 115.551699ms
I1216 19:41:36.790291 1 pods.go:300] [boatload-6/boatload-6-1-boatload-788dc74479-gw68m] addLogicalPort took 50.001351653s
I1216 19:42:36.792850 1 pods.go:300] [boatload-6/boatload-6-1-boatload-788dc74479-gw68m] addLogicalPort took 1m0.002430439s
I1216 19:42:56.793158 1 pods.go:300] [boatload-1/boatload-1-1-boatload-6c96b649bd-48bvl] addLogicalPort took 1m20.001660463s
I1216 19:43:16.793452 1 pods.go:300] [boatload-4/boatload-4-1-boatload-5f5f864d8f-778z4] addLogicalPort took 1m40.002229834s
I1216 19:43:36.794630 1 pods.go:300] [boatload-2/boatload-2-1-boatload-797b75b5c9-klz67] addLogicalPort took 2m0.00304807s
I1216 19:43:56.796481 1 pods.go:300] [boatload-3/boatload-3-1-boatload-f4cf55cdb-475p8] addLogicalPort took 2m20.004503398s
In this log we see addLogicalPort timings from a previous test taking ~119ms, while in the following test, which left every workload pod stuck in ContainerCreating, they take 1m or greater.
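The minute-scale entries can be pulled out of the same log directly; a minimal sketch, assuming the Go-style duration format shown above (entries of a minute or more contain an "Nm" component right after "took", while millisecond-scale entries such as "119.267425ms" do not match the pattern):
# grep -E 'addLogicalPort took [0-9]+m[0-9]' ovnkube-master-5kz84.log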
After some time passes, looking at one specific pod:
# oc get po -n boatload-34
NAME READY STATUS RESTARTS AGE
boatload-34-1-boatload-7d58cbdfbb-bpvlr 0/1 ContainerCreating 0 6h33m
I1217 01:44:17.761433 1 pods.go:300] [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr] addLogicalPort took 9m30.017726293s
The addLogicalPort entry in the log shows that this pod took 9m30s.
Events section from the pod's description:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning ErrorAddingLogicalPort 9m19s (x3 over 4h12m) controlplane failed to ensure namespace locked: failed to create address set for namespace: boatload-34, error: failed to create new address set boatload-34_v4 (error in transact with ops [{Op:insert Table:Address_Set Row:map[external_ids:{GoMap:map[name:boatload-34_v4]} name:a1817380831482281572] Rows:[] Columns:[] Mutations:[] Timeout:0 Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection)
Warning FailedCreatePodSandBox 4m10s (x309 over 6h22m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_boatload-34-1-boatload-7d58cbdfbb-bpvlr_boatload-34_8c1ba5ef-f0c8-43b0-a73a-3662eb9f8735_0(b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266): error adding pod boatload-34_boatload-34-1-boatload-7d58cbdfbb-bpvlr to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr/8c1ba5ef-f0c8-43b0-a73a-3662eb9f8735:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266] [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded
It appears that context deadlines were exceeded while awaiting reconnection.
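To gauge how widespread the failure is, the same symptom can be looked for as events across all namespaces; a hedged sketch, assuming the event reason string matches the warning shown above:
# oc get events -A --field-selector reason=ErrorAddingLogicalPort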
Later, in order to clear the workload, I had to reboot each master in serial; I was then eventually able to delete all workload pods, and other job-like OpenShift pods were able to come and go again. (It was not just the workload pods that ended up stuck in ContainerCreating.)
Marking this as blocker+ since without the PRs there is a bad regression in pod creation latency. @dcbw: hope you agree.
Verified on 4.10.0-0.nightly-2022-01-31-012936 on IBM Cloud: 10 workers, 250 pods on 5 nodes.
Running pods/node
236 rbrattai-o410i11-kvds9-worker-1-b66hx
231 rbrattai-o410i11-kvds9-worker-1-h6zdr
237 rbrattai-o410i11-kvds9-worker-1-j9js4
237 rbrattai-o410i11-kvds9-worker-2-6l5tt
236 rbrattai-o410i11-kvds9-worker-2-8ggpw
largest addLogicalPort times
node-density-287] addLogicalPort took 470.708987ms
node-density-978] addLogicalPort took 493.649061ms
node-density-785] addLogicalPort took 511.898523ms
node-density-586] addLogicalPort took 513.501564ms
node-density-284] addLogicalPort took 537.483542ms
node-density-38] addLogicalPort took 548.649217ms
node-density-976] addLogicalPort took 554.633343ms
node-density-283] addLogicalPort took 577.463732ms
node-density-148] addLogicalPort took 600.974141ms
node-density-145] addLogicalPort took 619.917334ms
node-density-37] addLogicalPort took 639.893066ms
node-density-422] addLogicalPort took 686.451966ms
node-density-424] addLogicalPort took 716.245933ms
node-density-146] addLogicalPort took 727.488826ms
node-density-36] addLogicalPort took 764.081905ms
node-density-977] addLogicalPort took 769.694995ms
node-density-423] addLogicalPort took 771.642659ms
node-density-585] addLogicalPort took 785.058304ms
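For reference, a list like the above can be produced by sorting on the duration field; a rough sketch that assumes the ovnkube-master log has been saved locally (ovnkube-master.log here is a placeholder name) and only handles the millisecond-scale entries shown here (second- and minute-scale durations would need separate handling):
# grep -o 'addLogicalPort took .*ms$' ovnkube-master.log | sort -k3 -n | tail -20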
On AWS, 4.10.0-0.nightly-2022-01-31-012936: 9 workers, 250 pods on 5 workers.
Running pods/node
236 rbrattai-o410i11-kvds9-worker-1-b66hx
231 rbrattai-o410i11-kvds9-worker-1-h6zdr
237 rbrattai-o410i11-kvds9-worker-1-j9js4
237 rbrattai-o410i11-kvds9-worker-2-6l5tt
236 rbrattai-o410i11-kvds9-worker-2-8ggpw
largest addLogicalPort times
node-density-514] addLogicalPort took 164.437205ms
node-density-30] addLogicalPort took 166.760694ms
node-density-833] addLogicalPort took 185.298644ms
node-density-739] addLogicalPort took 189.04798ms
node-density-831] addLogicalPort took 189.115347ms
node-density-735] addLogicalPort took 218.693352ms
node-density-738] addLogicalPort took 224.195572ms
node-density-732] addLogicalPort took 276.709528ms
node-density-734] addLogicalPort took 278.150532ms
node-density-737] addLogicalPort took 316.868319ms
node-density-736] addLogicalPort took 339.96229ms
node-density-733] addLogicalPort took 352.281789ms
Correction for the AWS run (4.10.0-0.nightly-2022-01-31-012936, 9 workers, 250 pods on 5 workers):
Running pods/node
237 ip-10-0-139-124.us-east-2.compute.internal
232 ip-10-0-142-79.us-east-2.compute.internal
236 ip-10-0-154-66.us-east-2.compute.internal
231 ip-10-0-166-221.us-east-2.compute.internal
237 ip-10-0-176-104.us-east-2.compute.internal
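The Running pods/node counts above can be gathered with the same style of pipeline used earlier in this report for the stuck pods (the node name is column 8 of wide output); a minimal sketch:
# oc get po -A -o wide --field-selector status.phase=Running --no-headers | awk '{print $8}' | sort | uniq -c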
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2022:0056
Description of problem:

Attempting to validate 250 max-pods per OpenShift worker node is failing, with many pods stuck in ContainerCreating. This seems to occur after several rounds of roughly reaching 250 max-pods per node. The cluster under test is a 13-node cluster hosted in ibmcloud: 3 control-plane nodes and 10 worker nodes. The 10 worker nodes are sub-divided into nodedensity and worker role nodes:

# oc get no
NAME          STATUS   ROLES                AGE   VERSION
jetlag-bm10   Ready    master               20h   v1.22.1+6859754
jetlag-bm11   Ready    master               21h   v1.22.1+6859754
jetlag-bm12   Ready    master               21h   v1.22.1+6859754
jetlag-bm13   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm14   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm15   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm16   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm17   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm18   Ready    worker               20h   v1.22.1+6859754
jetlag-bm19   Ready    worker               20h   v1.22.1+6859754
jetlag-bm20   Ready    worker               20h   v1.22.1+6859754
jetlag-bm21   Ready    worker               20h   v1.22.1+6859754
jetlag-bm22   Ready    worker               20h   v1.22.1+6859754

The test places workload pods on the nodedensity role nodes. The nodedensity nodes were cordoned, drained, then uncordoned prior to testing to "push" any steady-state workloads off those 5 nodes, so that those workloads run on the remaining 5 worker nodes and do not interfere with the capacity of the nodedensity role-type workers. After reviewing capacity on the nodes, I set the pod/node count to 234, so that we create 5 * 234 namespaces, each with one deployment with one pod replica and one container per pod. This results in 1170 namespaces, 1170 deployments, 1170 pods, and 1170 total workload containers running across the 5 "nodedensity" role nodes. 234 pods/node leaves room for 10 pods across the 5 nodes under test, so it does not actually test to the exact limit of 250 pods/node. The test repeats for several iterations to see if the performance of loading up the nodes varies at all. The test iterations until failure were (see the sketch after this list):
1. 6 x 1170 namespaces with 1 deployment, 1 service, 1 route, 1 pod replica per deployment, 1 workload container per pod replica, no probes, no secrets/configmaps
2. 3 x 1170 namespaces, same as above but setting requests and limits to establish Guaranteed QoS pods   <--- Test failed/stuck on 3rd iteration
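For context, a rough sketch of the shape of that workload (this is not the actual boatload tooling; it omits the service, route, node selector, probe settings, and the requests/limits described above, and the resource names are only illustrative):
# for i in $(seq 1 1170); do oc create namespace boatload-$i; oc create deployment boatload-$i-1-boatload -n boatload-$i --image=quay.io/redhat-performance/test-gohttp-probe:v0.0.2 --port=8000; done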
Layout of stuck pods:

# oc get po -A -o wide | grep ContainerCreating | awk '{print $8}' | sort | uniq -c
      5 jetlag-bm10
      4 jetlag-bm11
      6 jetlag-bm12
    235 jetlag-bm13
    234 jetlag-bm14
    234 jetlag-bm15
    233 jetlag-bm16
    234 jetlag-bm17
     42 jetlag-bm18
     38 jetlag-bm20
      1 jetlag-bm21

# oc get po -A -o wide | grep boatload | grep ContainerCreating | awk '{print $8}' | sort | uniq -c
    235 jetlag-bm13
    234 jetlag-bm14
    234 jetlag-bm15
    233 jetlag-bm16
    234 jetlag-bm17

# oc get po -A -o wide | grep boatload | grep ContainerCreating -c
1170

Every workload pod (pods with the name prefix boatload) is stuck. Pods are also stuck on nodes not hosting the workload, including master nodes.

Snippet of stuck pods:

# oc get po -A | grep boatload | head
boatload-1      boatload-1-1-boatload-7fbbfdd9d4-dbdln      0/1   ContainerCreating   0   18h
boatload-10     boatload-10-1-boatload-9b7c4f96-ntnr5       0/1   ContainerCreating   0   18h
boatload-100    boatload-100-1-boatload-699b746448-nzg5w    0/1   ContainerCreating   0   18h
boatload-1000   boatload-1000-1-boatload-6649fcdfbd-llndn   0/1   ContainerCreating   0   18h
boatload-1001   boatload-1001-1-boatload-76cfbcdcb5-p4w6n   0/1   ContainerCreating   0   18h
boatload-1002   boatload-1002-1-boatload-6d4fff77f8-cpvxk   0/1   ContainerCreating   0   18h
boatload-1003   boatload-1003-1-boatload-5ff58b46c9-n66qk   0/1   ContainerCreating   0   18h
boatload-1004   boatload-1004-1-boatload-6b7797978d-crjzx   0/1   ContainerCreating   0   18h
boatload-1005   boatload-1005-1-boatload-844768f7b9-kxcxt   0/1   ContainerCreating   0   18h
boatload-1006   boatload-1006-1-boatload-78dd5c8f54-2pt2h   0/1   ContainerCreating   0   18h

Version-Release number of selected component (if applicable): 4.10.0-0.nightly-2021-12-12-232810

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Description from failed pod:

# oc describe po -n boatload-1
Name:         boatload-1-1-boatload-7fbbfdd9d4-dbdln
Namespace:    boatload-1
Priority:     0
Node:         jetlag-bm15/10.5.190.3
Start Time:   Tue, 14 Dec 2021 15:52:02 -0600
Labels:       app=boatload-1-1
              pod-template-hash=7fbbfdd9d4
Annotations:  openshift.io/scc: restricted
Status:       Pending
IP:
IPs:          <none>
Controlled By:  ReplicaSet/boatload-1-1-boatload-7fbbfdd9d4
Containers:
  boatload-1:
    Container ID:
    Image:          quay.io/redhat-performance/test-gohttp-probe:v0.0.2
    Image ID:
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     50m
      memory:  100Mi
    Environment:
      PORT:                         8000
      LISTEN_DELAY_SECONDS:         0
      LIVENESS_DELAY_SECONDS:       0
      READINESS_DELAY_SECONDS:      0
      RESPONSE_DELAY_MILLISECONDS:  0
      LIVENESS_SUCCESS_MAX:         0
      READINESS_SUCCESS_MAX:        0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9q6zv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-9q6zv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       Guaranteed
Node-Selectors:  jetlag=true
Tolerations:     node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From          Message
  ----     ------                  ----                 ----          -------
  Warning  ErrorAddingLogicalPort  73m (x17 over 18h)   controlplane  failed to ensure namespace locked: failed to create address set for namespace: boatload-1, error: failed to create new address set boatload-1_v4 (error in transact with ops [{Op:insert Table:Address_Set Row:map[external_ids:{GoMap:map[name:boatload-1_v4]} name:a9965174113379253624] Rows:[] Columns:[] Mutations:[] Timeout:0 Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection)
  Warning  FailedCreatePodSandBox  96s (x903 over 18h)  kubelet       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_boatload-1-1-boatload-7fbbfdd9d4-dbdln_boatload-1_6001f68b-5ae2-4615-8fd7-263152c548ed_0(a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973): error adding pod boatload-1_boatload-1-1-boatload-7fbbfdd9d4-dbdln to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln/6001f68b-5ae2-4615-8fd7-263152c548ed:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973] [boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '

All nodes are in Ready state:

# oc get no
NAME          STATUS   ROLES                AGE   VERSION
jetlag-bm10   Ready    master               20h   v1.22.1+6859754
jetlag-bm11   Ready    master               21h   v1.22.1+6859754
jetlag-bm12   Ready    master               21h   v1.22.1+6859754
jetlag-bm13   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm14   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm15   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm16   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm17   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm18   Ready    worker               20h   v1.22.1+6859754
jetlag-bm19   Ready    worker               20h   v1.22.1+6859754
jetlag-bm20   Ready    worker               20h   v1.22.1+6859754
jetlag-bm21   Ready    worker               20h   v1.22.1+6859754
jetlag-bm22   Ready    worker               20h   v1.22.1+6859754

I assume a must-gather will fail while the cluster is in the current state, so I will capture logs, then attempt to clear the workload, then attempt a must-gather.
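For reference, the log collection mentioned above is typically done with something like the following (the destination directory is arbitrary and only illustrative):
# oc adm must-gather --dest-dir=/tmp/bz2032998-must-gather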