Bug 1888847 - non-terminating pods are going from running to pending
Summary: non-terminating pods are going from running to pending
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1887501
Depends On: 1888041 1960291
Blocks:
Reported: 2020-10-15 22:26 UTC by OpenShift BugZilla Robot
Modified: 2021-05-13 14:36 UTC (History)
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-09 15:50:59 UTC
Target Upstream Version:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 412 | 0 | None | closed | [release-4.6] Bug 1888847: UPSTREAM: 95561: kubelet container status calculation doesn't handle suddenly missing data pr... | 2021-01-05 17:31:15 UTC
Red Hat Product Errata | RHBA-2020:4339 | 0 | None | None | None | 2020-11-09 15:51:22 UTC

Description OpenShift BugZilla Robot 2020-10-15 22:26:57 UTC
+++ This bug was initially created as a clone of Bug #1888041 +++

Terminated pods are no longer going from Running to Pending.
Static pods with the same UID are going from Running to Pending, and we are hopeful the UID workaround will help with that.

We are now seeing another class of Running-to-Pending failure: some non-terminated pods are failing. If it would help to have a separate test that fails specifically on these, I can show you where to add it so they are easy to spot.

14 pods illegally transitioned to Pending

ns/openshift-cluster-node-tuning-operator pod/tuned-fvppk node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-image-registry pod/node-ca-8stv8 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-sdn pod/sdn-jqr2l node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-dns pod/dns-default-z5pj9 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-multus pod/multus-jkrv4 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-multus pod/network-metrics-daemon-t5kx5 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-sdn pod/ovs-h9n6p node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-machine-config-operator pod/machine-config-daemon-6fkpx node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/openshift-monitoring pod/node-exporter-b6m9s node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-statefulset-144 pod/ss-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-projected-8536 pod/pod-projected-secrets-2618d264-db5d-4371-9c4b-761e9618305a node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-resizer-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending
ns/e2e-csi-mock-volumes-4609-8746 pod/csi-mockplugin-attacher-0 node/ci-op-ym39gmpg-9c4c5-v4td6-worker-westus-fzjx9 - pod moved back to Pending


from https://prow.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-azure-4.6/1314781668476719104
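
For illustration, the kind of check that produces a list like the one above can be sketched in Go. This is only a sketch, not the actual CI test: the `Observation` type and the `illegalTransitions` function are hypothetical names, and the sketch keys pods by their printed name, whereas a real check would probably key on the pod UID so that a recreated pod with the same name is not flagged.

```go
package main

import "fmt"

// Phase mirrors the pod phase strings reported by Kubernetes.
type Phase string

const (
	Pending Phase = "Pending"
	Running Phase = "Running"
)

// Observation is a hypothetical record of one pod status update seen by a watcher.
type Observation struct {
	Pod   string // e.g. "ns/openshift-sdn pod/sdn-jqr2l"
	Phase Phase
}

// illegalTransitions returns each pod that was observed Running and then
// Pending, in the order the first offence was seen. Each pod is reported once.
func illegalTransitions(events []Observation) []string {
	last := map[string]Phase{}
	flagged := map[string]bool{}
	var out []string
	for _, e := range events {
		if last[e.Pod] == Running && e.Phase == Pending && !flagged[e.Pod] {
			flagged[e.Pod] = true
			out = append(out, e.Pod)
		}
		last[e.Pod] = e.Phase
	}
	return out
}

func main() {
	events := []Observation{
		{"ns/openshift-sdn pod/sdn-jqr2l", Pending},
		{"ns/openshift-sdn pod/sdn-jqr2l", Running},
		{"ns/openshift-dns pod/dns-default-z5pj9", Running},
		{"ns/openshift-sdn pod/sdn-jqr2l", Pending}, // Running -> Pending: illegal
	}
	for _, pod := range illegalTransitions(events) {
		fmt.Printf("%s - pod moved back to Pending\n", pod)
	}
}
```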

--- Additional comment from rphillips@redhat.com on 2020-10-14 16:16:51 UTC ---

The Node team had a Slack conversation with David, who suggested this patch: https://github.com/kubernetes/kubernetes/pull/95561

We will test it. If the patch works, we will target a 4.6.z backport.

The issue stems from reboots: CRI-O wipes the container statuses on reboot, so the kubelet suddenly has no status data for containers it previously reported as running.
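
A rough Go sketch of the idea follows. It is not the code from the upstream PR; the `ContainerState` type and the `resolveState` function are hypothetical and only illustrate the failure mode named in the PR title (status calculation that does not handle suddenly missing data). The assumption is that a container previously reported as running, whose runtime record has vanished, should be reported as terminated rather than fall back to a waiting state, since a waiting container is what can pull the pod phase back to Pending.

```go
package main

import "fmt"

// ContainerState is a simplified stand-in for the container state in a pod status.
type ContainerState string

const (
	Waiting    ContainerState = "Waiting"
	Running    ContainerState = "Running"
	Terminated ContainerState = "Terminated"
)

// resolveState picks the state to report for one container.
//   - previous: the state last reported for the container
//   - runtime:  the state the container runtime returns now
//   - found:    false when the runtime has no record at all
//     (for example, because statuses were wiped on reboot)
func resolveState(previous, runtime ContainerState, found bool) ContainerState {
	if found {
		return runtime
	}
	switch previous {
	case Running, Terminated:
		// The container existed and its record is gone: it cannot still be
		// starting, so report Terminated instead of defaulting to Waiting.
		return Terminated
	default:
		// Never started as far as we know: Waiting is the honest answer.
		return Waiting
	}
}

func main() {
	fmt.Println(resolveState(Running, "", false))      // record wiped after reboot
	fmt.Println(resolveState(Waiting, "", false))      // not started yet
	fmt.Println(resolveState(Running, Running, true))  // normal case
}
```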

Comment 1 Ryan Phillips 2020-10-19 13:20:26 UTC
Backport PR: https://github.com/openshift/kubernetes/pull/412

Comment 2 Seth Jennings 2020-10-19 19:01:07 UTC
*** Bug 1887501 has been marked as a duplicate of this bug. ***

Comment 3 Ryan Phillips 2020-10-26 14:58:26 UTC
*** Bug 1887501 has been marked as a duplicate of this bug. ***

Comment 8 errata-xmlrpc 2020-11-09 15:50:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339

