Description of problem:
Pods were crash looping with:

container_linux.go:348: starting container process caused "exec: \"/bin/bash\": stat /bin/bash: no such file or directory"
container_linux.go:348: starting container process caused "exec: \"/usr/bin/cluster-network-operator\": stat /usr/bin/cluster-network-operator: no such file or directory"

Looking at the image, it appears to be corrupt:

# podman image inspect quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c877a9f6507be9f8979f3cc96e25c3b1b9c59e861c757fa20c57bcdc7bd99af4
Error: error parsing image data "5f996790a8d8380d4b3c47f8b19febd4f8c8c0317f47beab9364743889e5e307": readlink /var/lib/containers/storage/overlay: invalid argument

After deleting the image, it was re-pulled without problems and the containers came up:

[root@master-03 ~]# podman image rm quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c877a9f6507be9f8979f3cc96e25c3b1b9c59e861c757fa20c57bcdc7bd99af4
Untagged: quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:c877a9f6507be9f8979f3cc96e25c3b1b9c59e861c757fa20c57bcdc7bd99af4
Deleted: 5f996790a8d8380d4b3c47f8b19febd4f8c8c0317f47beab9364743889e5e307

Version-Release number of selected component (if applicable):
RHEL 8.2 / OCP 4.5.31

How reproducible:
Once

Steps to Reproduce:
1. Unknown
2.
3.

Actual results:
Crashloop backoff from the pod, with little in the error output pointing to a corrupted image.

Expected results:

Additional info:
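The triage in the description can be scripted. It has three steps: spot the missing-executable error, confirm the problem with `podman image inspect`, then remove the image. Below is a minimal Python sketch of those steps, not an official tool. It makes a few assumptions:
- The readlink/overlay message is treated as the only corruption signature worth flagging.
- The helper names `missing_entrypoint`, `find_corrupt_images` and `removal_commands` are hypothetical.
- It only builds the `podman image rm` commands and does not run them.

```python
import re
import subprocess
from typing import Callable, List, Optional, Tuple

# Crash-loop message seen in this bug, e.g.
#   exec: \"/bin/bash\": stat /bin/bash: no such file or directory
# The quotes may or may not be backslash-escaped depending on the log source.
EXEC_MISSING = re.compile(
    r'exec: \\?"([^"\\]+)\\?": stat \1: no such file or directory'
)

# Failure signature from `podman image inspect` in this bug (assumption: other
# kinds of storage corruption may report different errors).
CORRUPTION_MARKER = "readlink /var/lib/containers/storage/overlay"

# A runner takes podman arguments and returns (exit code, stdout + stderr).
Runner = Callable[[List[str]], Tuple[int, str]]


def missing_entrypoint(message: str) -> Optional[str]:
    """Return the missing executable named in a container start error, or None.

    A missing /bin/bash or operator binary in an image that normally ships it
    is a hint, not proof, that the local image layers are damaged.
    """
    match = EXEC_MISSING.search(message)
    return match.group(1) if match else None


def run_podman(args: List[str]) -> Tuple[int, str]:
    """Run podman on this host and capture its exit code and output."""
    proc = subprocess.run(
        ["podman", *args], capture_output=True, text=True, check=False
    )
    return proc.returncode, proc.stdout + proc.stderr


def find_corrupt_images(images: List[str], runner: Runner = run_podman) -> List[str]:
    """Return the references whose `podman image inspect` fails with the
    overlay readlink error."""
    corrupt = []
    for ref in images:
        code, output = runner(["image", "inspect", ref])
        if code != 0 and CORRUPTION_MARKER in output:
            corrupt.append(ref)
    return corrupt


def removal_commands(corrupt: List[str]) -> List[str]:
    """Build, but do not run, the `podman image rm` workaround commands."""
    return [f"podman image rm {ref}" for ref in corrupt]
```

The `runner` parameter lets the detection logic be exercised without podman installed. On a node, passing the `ocp-v4.0-art-dev` digest reference from the description to `find_corrupt_images` would be expected to flag it while the readlink error persists.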
*** Bug 1950536 has been marked as a duplicate of this bug. ***
We have a fix incoming for this in 4.8 (attached). It has already broken some things in 4.8, so it will need some soak time and testing to make sure it doesn't break anything else before we backport it.
*** Bug 1918126 has been marked as a duplicate of this bug. ***
Followed the reproducer steps from https://bugzilla.redhat.com/show_bug.cgi?id=1921128#c25, hard-rebooting all nodes a couple of times.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-04-21-231018   True        False         3h21m   Cluster version is 4.8.0-0.nightly-2021-04-21-231018

$ oc get nodes
NAME                                         STATUS   ROLES    AGE     VERSION
ip-10-0-130-215.us-east-2.compute.internal   Ready    worker   3h41m   v1.21.0-rc.0+3ced7a9
ip-10-0-152-86.us-east-2.compute.internal    Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-178-57.us-east-2.compute.internal    Ready    worker   3h42m   v1.21.0-rc.0+3ced7a9
ip-10-0-184-90.us-east-2.compute.internal    Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-214-243.us-east-2.compute.internal   Ready    master   3h49m   v1.21.0-rc.0+3ced7a9
ip-10-0-221-20.us-east-2.compute.internal    Ready    worker   3h41m   v1.21.0-rc.0+3ced7a9

$ oc debug node/ip-10-0-152-86.us-east-2.compute.internal
Starting pod/ip-10-0-152-86us-east-2computeinternal-debug ...
...
sh-4.4# journalctl | grep -i "Error: readlink"
sh-4.4#

The grep returns nothing: no "Error: readlink" messages in the journal after the reboots.
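The verification above boils down to one check: the journal contains no `Error: readlink` lines. Below is a minimal Python sketch of that `journalctl | grep -i "Error: readlink"` check. It assumes Python 3 and `journalctl` are available wherever it runs. The function names are hypothetical, and this is not part of any OpenShift tooling.

```python
import subprocess
from typing import Iterable, List

# Same needle as the manual check; compared case-insensitively like `grep -i`.
NEEDLE = "Error: readlink"


def matching_lines(lines: Iterable[str], needle: str = NEEDLE) -> List[str]:
    """Return the lines that contain `needle`, ignoring case."""
    wanted = needle.lower()
    return [line for line in lines if wanted in line.lower()]


def read_journal() -> List[str]:
    """Read the full systemd journal of the current host (needs journalctl)."""
    proc = subprocess.run(
        ["journalctl", "--no-pager"], capture_output=True, text=True, check=True
    )
    return proc.stdout.splitlines()


def node_is_clean(lines: Iterable[str]) -> bool:
    """True when no line matches, i.e. the grep would print nothing."""
    return not matching_lines(lines)
```

Calling `node_is_clean(read_journal())` on each node corresponds to the empty grep result shown above. Keeping the matching separate from the journal read makes the check easy to reuse on saved journal dumps.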
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438