Description of problem:

Attempting to validate 250 max-pods per OpenShift worker node is failing, with many pods stuck in ContainerCreating. This seems to occur after several iterations of roughly reaching 250 pods per node. The cluster under test is a 13 node cluster hosted in IBM Cloud. There are 3 control-plane nodes and 10 worker nodes. The 10 worker nodes are sub-divided into nodedensity and worker role nodes:

# oc get no
NAME          STATUS   ROLES                AGE   VERSION
jetlag-bm10   Ready    master               20h   v1.22.1+6859754
jetlag-bm11   Ready    master               21h   v1.22.1+6859754
jetlag-bm12   Ready    master               21h   v1.22.1+6859754
jetlag-bm13   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm14   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm15   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm16   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm17   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm18   Ready    worker               20h   v1.22.1+6859754
jetlag-bm19   Ready    worker               20h   v1.22.1+6859754
jetlag-bm20   Ready    worker               20h   v1.22.1+6859754
jetlag-bm21   Ready    worker               20h   v1.22.1+6859754
jetlag-bm22   Ready    worker               20h   v1.22.1+6859754

The test places workload pods on the nodedensity role nodes. Prior to testing, the nodedensity nodes were cordoned, drained, then uncordoned to push any steady-state workloads off those 5 nodes, so that they run on the remaining 5 worker nodes and do not interfere with the capacity of the nodedensity role workers (the sequence used is sketched below).

After reviewing capacity for the nodes, I set the pod/node count to 234 so that we create 5 * 234 namespaces, each with one deployment with one pod replica and one container per pod. This results in 1170 namespaces, 1170 deployments, 1170 pods and 1170 total workload containers running across the 5 nodedensity role nodes. 234 pods/node leaves room for 10 pods across the 5 nodes under test and does not test to the exact limit of 250 pods/node. The test repeats for several iterations to see if the performance of loading up the nodes varies at all. The iterations run until failure were:

1. 6 x 1170 namespaces with 1 deployment, 1 service, 1 route, 1 pod replica per deployment, 1 workload container per pod replica, no probes, no secrets/configmaps
2. 3 x 1170 namespaces, same as above but setting requests and limits to establish Guaranteed QoS pods.   <--- Test failed/stuck on the 3rd iteration
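For reference, a minimal sketch of the cordon/drain/uncordon sequence described above, shown for one of the nodedensity nodes from the listing (repeated for jetlag-bm14 through jetlag-bm17; the drain flags here are assumptions and may differ from the exact invocation that was used):

# oc adm cordon jetlag-bm13
# oc adm drain jetlag-bm13 --ignore-daemonsets --delete-emptydir-data
# oc adm uncordon jetlag-bm13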
Layout of stuck pods:

# oc get po -A -o wide | grep ContainerCreating | awk '{print $8}' | sort | uniq -c
      5 jetlag-bm10
      4 jetlag-bm11
      6 jetlag-bm12
    235 jetlag-bm13
    234 jetlag-bm14
    234 jetlag-bm15
    233 jetlag-bm16
    234 jetlag-bm17
     42 jetlag-bm18
     38 jetlag-bm20
      1 jetlag-bm21

# oc get po -A -o wide | grep boatload | grep ContainerCreating | awk '{print $8}' | sort | uniq -c
    235 jetlag-bm13
    234 jetlag-bm14
    234 jetlag-bm15
    233 jetlag-bm16
    234 jetlag-bm17

# oc get po -A -o wide | grep boatload | grep ContainerCreating -c
1170

Every workload pod (pods with the name prefix boatload) is stuck. Pods are also stuck on nodes not hosting the workload (including master nodes).

Snippet of stuck pods:

# oc get po -A | grep boatload | head
boatload-1      boatload-1-1-boatload-7fbbfdd9d4-dbdln      0/1   ContainerCreating   0   18h
boatload-10     boatload-10-1-boatload-9b7c4f96-ntnr5       0/1   ContainerCreating   0   18h
boatload-100    boatload-100-1-boatload-699b746448-nzg5w    0/1   ContainerCreating   0   18h
boatload-1000   boatload-1000-1-boatload-6649fcdfbd-llndn   0/1   ContainerCreating   0   18h
boatload-1001   boatload-1001-1-boatload-76cfbcdcb5-p4w6n   0/1   ContainerCreating   0   18h
boatload-1002   boatload-1002-1-boatload-6d4fff77f8-cpvxk   0/1   ContainerCreating   0   18h
boatload-1003   boatload-1003-1-boatload-5ff58b46c9-n66qk   0/1   ContainerCreating   0   18h
boatload-1004   boatload-1004-1-boatload-6b7797978d-crjzx   0/1   ContainerCreating   0   18h
boatload-1005   boatload-1005-1-boatload-844768f7b9-kxcxt   0/1   ContainerCreating   0   18h
boatload-1006   boatload-1006-1-boatload-78dd5c8f54-2pt2h   0/1   ContainerCreating   0   18h

Version-Release number of selected component (if applicable):
4.10.0-0.nightly-2021-12-12-232810

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:

Description from a failed pod:

# oc describe po -n boatload-1
Name:         boatload-1-1-boatload-7fbbfdd9d4-dbdln
Namespace:    boatload-1
Priority:     0
Node:         jetlag-bm15/10.5.190.3
Start Time:   Tue, 14 Dec 2021 15:52:02 -0600
Labels:       app=boatload-1-1
              pod-template-hash=7fbbfdd9d4
Annotations:  openshift.io/scc: restricted
Status:       Pending
IP:
IPs:          <none>
Controlled By:  ReplicaSet/boatload-1-1-boatload-7fbbfdd9d4
Containers:
  boatload-1:
    Container ID:
    Image:          quay.io/redhat-performance/test-gohttp-probe:v0.0.2
    Image ID:
    Port:           8000/TCP
    Host Port:      0/TCP
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Limits:
      cpu:     50m
      memory:  100Mi
    Requests:
      cpu:     50m
      memory:  100Mi
    Environment:
      PORT:                         8000
      LISTEN_DELAY_SECONDS:         0
      LIVENESS_DELAY_SECONDS:       0
      READINESS_DELAY_SECONDS:      0
      RESPONSE_DELAY_MILLISECONDS:  0
      LIVENESS_SUCCESS_MAX:         0
      READINESS_SUCCESS_MAX:        0
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-9q6zv (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  kube-api-access-9q6zv:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Guaranteed
Node-Selectors:              jetlag=true
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From          Message
  ----     ------                  ----                 ----          -------
  Warning  ErrorAddingLogicalPort  73m (x17 over 18h)   controlplane  failed to ensure namespace locked: failed to create address set for namespace: boatload-1, error: failed to create new address set boatload-1_v4 (error in transact with ops [{Op:insert Table:Address_Set Row:map[external_ids:{GoMap:map[name:boatload-1_v4]} name:a9965174113379253624] Rows:[] Columns:[] Mutations:[] Timeout:0 Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection)
  Warning  FailedCreatePodSandBox  96s (x903 over 18h)  kubelet       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_boatload-1-1-boatload-7fbbfdd9d4-dbdln_boatload-1_6001f68b-5ae2-4615-8fd7-263152c548ed_0(a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973): error adding pod boatload-1_boatload-1-1-boatload-7fbbfdd9d4-dbdln to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln/6001f68b-5ae2-4615-8fd7-263152c548ed:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973] [boatload-1/boatload-1-1-boatload-7fbbfdd9d4-dbdln a5bf992d2e41e0c3a3c6e2b478538819b3c14f1ab44dc55e6dc91a26d7bbe973] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded '

All nodes are in Ready state:

# oc get no
NAME          STATUS   ROLES                AGE   VERSION
jetlag-bm10   Ready    master               20h   v1.22.1+6859754
jetlag-bm11   Ready    master               21h   v1.22.1+6859754
jetlag-bm12   Ready    master               21h   v1.22.1+6859754
jetlag-bm13   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm14   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm15   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm16   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm17   Ready    nodedensity,worker   20h   v1.22.1+6859754
jetlag-bm18   Ready    worker               20h   v1.22.1+6859754
jetlag-bm19   Ready    worker               20h   v1.22.1+6859754
jetlag-bm20   Ready    worker               20h   v1.22.1+6859754
jetlag-bm21   Ready    worker               20h   v1.22.1+6859754
jetlag-bm22   Ready    worker               20h   v1.22.1+6859754

I assume a must-gather will fail while the cluster is in the current state, so I will capture logs, then attempt to clear the workload, then attempt a must-gather.
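This is not how the test tooling actually creates the objects, but as a rough sketch, each of the 1170 Guaranteed-QoS workload deployments amounts to roughly the following (namespace/deployment names here are illustrative; the image, resources and node selector are taken from the pod description above). Setting requests equal to limits on every container is what yields the Guaranteed QoS class shown in the description, and the jetlag=true node selector pins the pods to the nodedensity nodes:

# oc create namespace boatload-example
# oc create deployment boatload-example-1 -n boatload-example --image=quay.io/redhat-performance/test-gohttp-probe:v0.0.2
# oc set resources deployment/boatload-example-1 -n boatload-example --requests=cpu=50m,memory=100Mi --limits=cpu=50m,memory=100Mi
# oc patch deployment boatload-example-1 -n boatload-example --type=merge -p '{"spec":{"template":{"spec":{"nodeSelector":{"jetlag":"true"}}}}}'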
Looking at the OVN logs for a specific ovnkube-master pod shows that the addLogicalPort timings have become unreasonable:

# cat ovnkube-master-5kz84.log | grep "addlogicalport" -i
...
I1216 19:10:57.968847 1 pods.go:300] [boatload-1168/boatload-1168-1-boatload-59c794667-tsq5s] addLogicalPort took 119.267425ms
I1216 19:10:58.285261 1 pods.go:300] [boatload-1169/boatload-1169-1-boatload-85f5db9bf4-mbdtc] addLogicalPort took 119.572973ms
I1216 19:10:58.885310 1 pods.go:300] [boatload-1170/boatload-1170-1-boatload-848b99df44-7nspz] addLogicalPort took 119.824014ms
I1216 19:11:15.915533 1 pods.go:300] [openshift-marketplace/redhat-marketplace-6lzvf] addLogicalPort took 121.279971ms
I1216 19:14:14.439708 1 pods.go:300] [openshift-marketplace/redhat-operators-w5qzm] addLogicalPort took 118.814712ms
I1216 19:14:14.449908 1 pods.go:300] [openshift-marketplace/certified-operators-9dhg9] addLogicalPort took 116.228607ms
I1216 19:14:14.463422 1 pods.go:300] [openshift-marketplace/community-operators-qd59v] addLogicalPort took 117.597242ms
I1216 19:15:00.263116 1 pods.go:300] [openshift-multus/ip-reconciler-27328035--1-ghxwz] addLogicalPort took 112.461108ms
I1216 19:15:00.269914 1 pods.go:300] [openshift-operator-lifecycle-manager/collect-profiles-27328035--1-hgh5k] addLogicalPort took 115.551699ms
I1216 19:41:36.790291 1 pods.go:300] [boatload-6/boatload-6-1-boatload-788dc74479-gw68m] addLogicalPort took 50.001351653s
I1216 19:42:36.792850 1 pods.go:300] [boatload-6/boatload-6-1-boatload-788dc74479-gw68m] addLogicalPort took 1m0.002430439s
I1216 19:42:56.793158 1 pods.go:300] [boatload-1/boatload-1-1-boatload-6c96b649bd-48bvl] addLogicalPort took 1m20.001660463s
I1216 19:43:16.793452 1 pods.go:300] [boatload-4/boatload-4-1-boatload-5f5f864d8f-778z4] addLogicalPort took 1m40.002229834s
I1216 19:43:36.794630 1 pods.go:300] [boatload-2/boatload-2-1-boatload-797b75b5c9-klz67] addLogicalPort took 2m0.00304807s
I1216 19:43:56.796481 1 pods.go:300] [boatload-3/boatload-3-1-boatload-f4cf55cdb-475p8] addLogicalPort took 2m20.004503398s

In this log we see addLogicalPort timings from a previous test taking ~119ms, while in the following test, which results in every workload pod stuck in ContainerCreating, they take 1m or greater.
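A quick filter over the same log (a sketch; it assumes the log lines look exactly like the ones quoted above) to surface only the addLogicalPort entries that took seconds or minutes rather than milliseconds:

# grep -i 'addlogicalport took' ovnkube-master-5kz84.log | grep -Ev 'took [0-9.]+ms$'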
After some time passes, looking at one specific pod:

# oc get po -n boatload-34
NAME                                      READY   STATUS              RESTARTS   AGE
boatload-34-1-boatload-7d58cbdfbb-bpvlr   0/1     ContainerCreating   0          6h33m

I1217 01:44:17.761433 1 pods.go:300] [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr] addLogicalPort took 9m30.017726293s

We can see that this pod took 9m30s for the addLogicalPort entry in the log.

Events section of the pod description:

Events:
  Type     Reason                  Age                      From          Message
  ----     ------                  ----                     ----          -------
  Warning  ErrorAddingLogicalPort  9m19s (x3 over 4h12m)    controlplane  failed to ensure namespace locked: failed to create address set for namespace: boatload-34, error: failed to create new address set boatload-34_v4 (error in transact with ops [{Op:insert Table:Address_Set Row:map[external_ids:{GoMap:map[name:boatload-34_v4]} name:a1817380831482281572] Rows:[] Columns:[] Mutations:[] Timeout:0 Where:[] Until: Durable:<nil> Comment:<nil> Lock:<nil> UUIDName:}]: context deadline exceeded: while awaiting reconnection)
  Warning  FailedCreatePodSandBox  4m10s (x309 over 6h22m)  kubelet       (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_boatload-34-1-boatload-7d58cbdfbb-bpvlr_boatload-34_8c1ba5ef-f0c8-43b0-a73a-3662eb9f8735_0(b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266): error adding pod boatload-34_boatload-34-1-boatload-7d58cbdfbb-bpvlr to CNI network "multus-cni-network": plugin type="multus" name="multus-cni-network" failed (add): [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr/8c1ba5ef-f0c8-43b0-a73a-3662eb9f8735:ovn-kubernetes]: error adding container to network "ovn-kubernetes": CNI request failed with status 400: '[boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266] [boatload-34/boatload-34-1-boatload-7d58cbdfbb-bpvlr b101d034c4bb7ed13ad02c24e58d622dc8a67a729c790c251c19bfe344837266] failed to get pod annotation: timed out waiting for annotations: context deadline exceeded

It seems there were context deadline exceeded errors while awaiting reconnection. Later, in order to clear the workload, I had to reboot each master in serial; after that I was eventually able to delete all of the workload pods, and other job-like OpenShift pods were able to come and go again. (It was not just the workload pods that ended up stuck in ContainerCreating.)
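One way (a sketch; the reason string is taken from the pod events above) to gauge how widespread the failure is would be to count ErrorAddingLogicalPort events per namespace:

# oc get events -A --field-selector reason=ErrorAddingLogicalPort --no-headers | awk '{print $1}' | sort | uniq -c | sort -rn | head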
Marking this as blocker+ since without the PRs there is a bad regression with pod creation latency. @dcbw: hope you agree.
Verified on 4.10.0-0.nightly-2022-01-31-012936 on IBM Cloud. 10 workers, 250 pods on 5 nodes.

Running pods/node:
    236 rbrattai-o410i11-kvds9-worker-1-b66hx
    231 rbrattai-o410i11-kvds9-worker-1-h6zdr
    237 rbrattai-o410i11-kvds9-worker-1-j9js4
    237 rbrattai-o410i11-kvds9-worker-2-6l5tt
    236 rbrattai-o410i11-kvds9-worker-2-8ggpw

Largest addLogicalPort times:
node-density-287] addLogicalPort took 470.708987ms
node-density-978] addLogicalPort took 493.649061ms
node-density-785] addLogicalPort took 511.898523ms
node-density-586] addLogicalPort took 513.501564ms
node-density-284] addLogicalPort took 537.483542ms
node-density-38] addLogicalPort took 548.649217ms
node-density-976] addLogicalPort took 554.633343ms
node-density-283] addLogicalPort took 577.463732ms
node-density-148] addLogicalPort took 600.974141ms
node-density-145] addLogicalPort took 619.917334ms
node-density-37] addLogicalPort took 639.893066ms
node-density-422] addLogicalPort took 686.451966ms
node-density-424] addLogicalPort took 716.245933ms
node-density-146] addLogicalPort took 727.488826ms
node-density-36] addLogicalPort took 764.081905ms
node-density-977] addLogicalPort took 769.694995ms
node-density-423] addLogicalPort took 771.642659ms
node-density-585] addLogicalPort took 785.058304ms

On AWS 4.10.0-0.nightly-2022-01-31-012936, 9 workers, 250 pods on 5 workers.

Running pods/node:
    236 rbrattai-o410i11-kvds9-worker-1-b66hx
    231 rbrattai-o410i11-kvds9-worker-1-h6zdr
    237 rbrattai-o410i11-kvds9-worker-1-j9js4
    237 rbrattai-o410i11-kvds9-worker-2-6l5tt
    236 rbrattai-o410i11-kvds9-worker-2-8ggpw

Largest addLogicalPort times:
node-density-514] addLogicalPort took 164.437205ms
node-density-30] addLogicalPort took 166.760694ms
node-density-833] addLogicalPort took 185.298644ms
node-density-739] addLogicalPort took 189.04798ms
node-density-831] addLogicalPort took 189.115347ms
node-density-735] addLogicalPort took 218.693352ms
node-density-738] addLogicalPort took 224.195572ms
node-density-732] addLogicalPort took 276.709528ms
node-density-734] addLogicalPort took 278.150532ms
node-density-737] addLogicalPort took 316.868319ms
node-density-736] addLogicalPort took 339.96229ms
node-density-733] addLogicalPort took 352.281789ms
Correction: on AWS 4.10.0-0.nightly-2022-01-31-012936, 9 workers, 250 pods on 5 workers.

Running pods/node:
    237 ip-10-0-139-124.us-east-2.compute.internal
    232 ip-10-0-142-79.us-east-2.compute.internal
    236 ip-10-0-154-66.us-east-2.compute.internal
    231 ip-10-0-166-221.us-east-2.compute.internal
    237 ip-10-0-176-104.us-east-2.compute.internal
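For reference, the Running pods/node counts above can be gathered with something like the following (the node name is column 8 of the oc get po -A -o wide output, the same awk column used earlier in this bug):

# oc get po -A -o wide --field-selector=status.phase=Running --no-headers | awk '{print $8}' | sort | uniq -c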
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056