Description of problem:

oc get machine -n openshift-machine-api build01-9hdwj-worker-us-east-1b-m5d4x-w4fp2 -o wide
NAME                                          PHASE     TYPE          REGION      ZONE         AGE   NODE                           PROVIDERID                              STATE
build01-9hdwj-worker-us-east-1b-m5d4x-w4fp2   Running   m5d.4xlarge   us-east-1   us-east-1b   15d   ip-10-0-146-117.ec2.internal   aws:///us-east-1b/i-0890eb78de6644a83   running

oc get node ip-10-0-146-117.ec2.internal -o wide
NAME                           STATUS   ROLES    AGE   VERSION   INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                CONTAINER-RUNTIME
ip-10-0-146-117.ec2.internal   Ready    worker   15d   v1.17.1   10.0.146.117   <none>        Red Hat Enterprise Linux CoreOS 44.81.202004260825-0 (Ootpa)   4.18.0-147.8.1.el8_1.x86_64   cri-o://1.17.4-8.dev.rhaos4.4.git5f5c5e4.el8

This is an m5d.4xlarge worker node from the CI build cluster.

oc get clusterversions.config.openshift.io
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0     True        False         9h      Cluster version is 4.4.0

Several pods on this node show this error in their pod description: "Error: context deadline exceeded". Sometimes the retries succeed and the pod is eventually up and running.

I would like to confirm that this is expected behavior from kubelet and crio rather than a bug.

I will attach more files later.
AFAICT this is expected. This is kubelet and crio saying "we are taking a long time to create pods/containers!". If the pods eventually reconcile and become ready, then this is okay. If they don't, the node may be overcommitted.
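To check whether the pods on the node do eventually become ready, a rough sketch like the one below might help. It assumes the input is the parsed JSON from something like `oc get pods --all-namespaces --field-selector spec.nodeName=ip-10-0-146-117.ec2.internal -o json`. The helper name `pods_not_ready` is made up for illustration and is not part of any tool.

```python
def pods_not_ready(pod_list):
    """Return "namespace/name" for pods whose Ready condition is not "True".

    pod_list is the parsed JSON of a Kubernetes PodList (assumed input),
    e.g. json.load(sys.stdin) with the `oc get pods ... -o json` output piped in.
    """
    stuck = []
    for pod in pod_list.get("items", []):
        status = pod.get("status", {})
        # Pods that already finished will never report Ready again; skip them.
        if status.get("phase") in ("Succeeded", "Failed"):
            continue
        conditions = status.get("conditions", [])
        ready = any(
            c.get("type") == "Ready" and c.get("status") == "True"
            for c in conditions
        )
        if not ready:
            meta = pod.get("metadata", {})
            stuck.append(f'{meta.get("namespace")}/{meta.get("name")}')
    return stuck
```

If the same pods keep showing up in the result across repeated runs, that would be consistent with the node being overcommitted rather than just slow.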
I think this was fixed with https://github.com/openshift/release/pull/8715?
I believe this is obsolete: CI isn't using this configuration anymore. Please reopen if that's not correct.