Terminated pods are no longer going from Running to Pending. Static pods with the same UID are still going from Running to Pending, and we're hopeful the UID workaround will help with that. We are now seeing another class of Running-to-Pending failure: some non-terminated pods are failing. If you want help creating a separate test that fails specifically on these cases, I can show you where to add it.

14 pods illegally transitioned to Pending:

ns/openshift-cluster-node-tuning-operator pod/tuned-fvppk node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-image-registry pod/node-ca-8stv8 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-sdn pod/sdn-jqr2l node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-dns pod/dns-default-z5pj9 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-multus pod/multus-jkrv4 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-multus pod/network-metrics-daemon-t5kx5 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-sdn pod/ovs-h9n6p node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-machine-config-operator pod/machine-config-daemon-6fkpx node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-monitoring pod/node-exporter-b6m9s node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-statefulset-144 pod/ss-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-projected-8536 pod/pod-projected-secrets-2618d264-db5d-4371-9c4b-761e9618305a node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-resizer-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-attacher-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending

From https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1314781668476719104
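A check like the one that produced the list above can be sketched as follows. This is a hedged illustration only, not the actual CI monitor code: it replays a stream of observed pod phases and flags any pod whose phase moves backward to Pending, keying state on the pod UID so that a recreated pod reusing the same name (a new UID) is not flagged, in the spirit of the UID workaround mentioned above. The function name, tuple shape, and phase ranking are assumptions made for this sketch.

```python
# Hypothetical sketch of a "pod moved back to Pending" detector.
# The real check lives in the CI test suite and differs in detail.

# A pod's phase may only move forward through these states.
PHASE_ORDER = {"Pending": 0, "Running": 1, "Succeeded": 2, "Failed": 2}

def find_illegal_transitions(observations):
    """observations: iterable of (namespace, pod, node, uid, phase) tuples,
    in the order they were observed. Returns a list of violation strings."""
    last_rank = {}   # (namespace, pod, uid) -> highest phase rank seen so far
    violations = []
    for ns, pod, node, uid, phase in observations:
        key = (ns, pod, uid)  # keyed on UID: a recreated pod starts fresh
        rank = PHASE_ORDER.get(phase)
        if rank is None:
            continue  # ignore phases we don't track (e.g. Unknown)
        prev = last_rank.get(key)
        if prev is not None and rank < prev:
            violations.append(
                f"ns/{ns} pod/{pod} node/{node} - pod moved back to {phase}"
            )
        last_rank[key] = max(prev or 0, rank)
    return violations
```

With this sketch, a pod observed Running and then Pending under the same UID is reported, while a pod that reappears as Pending under a new UID is treated as a fresh pod and passes.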
The Node team had a Slack conversation with David, and he suggested this patch: https://github.com/kubernetes/kubernetes/pull/95561 We will go test it. If the patch works, we will target a 4.6.z backport. The issue stems from reboots and the way CRI-O wipes container statuses upon reboot.
The fix was in master when https://bugzilla.redhat.com/show_bug.cgi?id=1884035#c15 was verified, and the same check applies to both bugs. There were three fixes involved, and the test verified them together.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633