+++ This bug was initially created as a clone of Bug #2070805 +++

Description of problem:

ClusterID: cc782851-976b-494c-90ea-d5125936e134
ClusterVersion: Updating to "4.10.5" from "4.10.4" for 2 hours: Unable to apply 4.10.5: could not download the update
ClusterOperators: All healthy and stable

A cluster trying to upgrade to 4.10.5 from 4.10.4 is stuck with the above error reported on the clusterversion. A pod in the openshift-cluster-version namespace keeps being created and erroring. We managed to grab a log, which had:

oc logs -n openshift-cluster-version version-4.10.5-9jv69-4kxs7
mv: inter-device move failed: '/manifests' to '/etc/cvo/updatepayloads/HbO7IDc7tyIg9utw3sd_tg/manifests/manifests'; unable to remove target: Directory not empty

I will attach a must-gather and an adm inspect of openshift-cluster-version in a private comment (although the adm inspect seemed to error while grabbing the version-4.10.5 pod details).

This is a GCP cluster.
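For readers unfamiliar with this mv failure mode, here is a minimal sketch of one plausible reading of the log line above: when the destination directory already exists, mv places the source inside it, and the move is refused if a non-empty directory of the same name is already there. The scratch paths and file names below are throwaway stand-ins, not the real CVO paths, and the exact error text differs from the log because this demo stays on a single filesystem.

```shell
#!/usr/bin/env bash
# Sketch only: reproduce a "Directory not empty" refusal in a scratch directory.
# All paths and file names are hypothetical stand-ins for the payload layout.
work=$(mktemp -d)
mkdir -p "$work/src/manifests" "$work/payload/manifests/manifests"
touch "$work/src/manifests/new.yaml"
touch "$work/payload/manifests/manifests/stale.yaml"

# payload/manifests already exists, so mv targets payload/manifests/manifests.
# That nested directory is non-empty, so the move is refused.
if mv "$work/src/manifests" "$work/payload/manifests" 2>"$work/err.txt"; then
  mv_status=0
else
  mv_status=$?
fi

echo "mv exit status: $mv_status"
cat "$work/err.txt"
```

The stale nested directory survives the failed move, which is consistent with a retry hitting the same error every time until something cleans it up.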
Verifying on 4.10.0-0.nightly-2022-05-10-060208 to 4.10.0-0.nightly-2022-05-10-131029 using the same method as in https://bugzilla.redhat.com/show_bug.cgi?id=2070805#c19

1) started upgrade
╰─ oc adm upgrade --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image registry.ci.openshift.org/ocp/release@sha256:33bf3c2f384ff1bbdc51878bf7d8d9d69bb5c8e061bb6df929a0adc8766a38b1 #new

2) reverted back
╰─ oc adm upgrade --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image registry.ci.openshift.org/ocp/release@sha256:79836236068c4d7e1400366435f541de54f17cefd2b00e021d89c380c6b084b0 #old

3) invalidated the current payload by deleting release-manifests
╰─ for node in $(oc get nodes -l 'node-role.kubernetes.io/master' -ojsonpath='{.items[:].metadata.name}'); do echo $node; oc debug node/$node -- /bin/bash -c 'rm -rf /host/etc/cvo/updatepayloads/*/release-manifests'; done 2>/dev/null

4) checked status, pods, and log

5) repeated

Result: after many cycles, no crashing version pod is observed.

╰─ oc get pods -owide
NAME                                       READY  STATUS     RESTARTS  AGE  IP            NODE                                                  NOMINATED NODE  READINESS GATES
cluster-version-operator-84575dbf5d-kw2sx  1/1    Running    0         14m  10.0.0.3      evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--2tcdf-7424n                       0/1    Completed  0         19m  10.129.0.121  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--4cpjv-55zkm                       0/1    Completed  0         57m  10.129.0.80   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--5b9rs-pwrm8                       0/1    Completed  0         66m  10.129.0.66   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--7gdnx-kzv5b                       0/1    Completed  0         20m  10.129.0.115  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--88bt2-mpnmk                       0/1    Completed  0         48m  10.129.0.94   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--88g8j-wd66v                       0/1    Completed  0         19m  10.129.0.119  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--9f8wc-t5cbm                       0/1    Completed  0         13m  10.129.0.146  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--9tg8n-89v4t                       0/1    Completed  0         64m  10.129.0.70   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--9vqgf-p4chf                       0/1    Completed  0         67m  10.129.0.64   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--b56kx-dwlpj                       0/1    Completed  0         19m  10.129.0.118  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--b7frc-6kh7p                       0/1    Completed  0         14m  10.129.0.137  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--bz69p-t4pj4                       0/1    Completed  0         52m  10.129.0.87   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--cs2jv-jtqdr                       0/1    Completed  0         16m  10.129.0.131  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--cxxtj-wph4z                       0/1    Completed  0         20m  10.129.0.113  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--dg5bp-x9hmh                       0/1    Completed  0         38m  10.129.0.102  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--fhlwd-9mf7m                       0/1    Completed  0         68m  10.129.0.62   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--fpmcr-s58c8                       0/1    Completed  0         16m  10.129.0.132  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--ft2ls-s8wct                       0/1    Completed  0         64m  10.129.0.71   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--gznpm-4c76t                       0/1    Completed  0         14m  10.129.0.140  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--hv285-8vsw5                       0/1    Completed  0         54m  10.129.0.84   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--lbnt2-65nkh                       0/1    Completed  0         15m  10.129.0.136  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--lqsst-997m4                       0/1    Completed  0         18m  10.129.0.123  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--lt2rc-tldtn                       0/1    Completed  0         28m  10.129.0.110  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--m4mmk-5mbqw                       0/1    Completed  0         16m  10.129.0.130  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--m6p2r-27p4q                       0/1    Completed  0         56m  10.129.0.81   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--mmr9z-skv89                       0/1    Completed  0         17m  10.129.0.127  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--mtmf5-tw5g6                       0/1    Completed  0         62m  10.129.0.73   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--n7vn6-fd4jg                       0/1    Completed  0         28m  10.129.0.111  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--phrqz-wq9ct                       0/1    Completed  0         69m  10.128.0.78   evakhoni-2109-svjgd-master-2.c.openshift-qe.internal  <none>          <none>
version--qn58z-c9nn8                       0/1    Completed  0         51m  10.129.0.88   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--rvfxw-g29ph                       0/1    Completed  0         60m  10.129.0.76   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--rxq65-pjnng                       0/1    Completed  0         14m  10.129.0.143  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--tgdtt-dv9tg                       0/1    Completed  0         62m  10.129.0.72   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--w9pdh-cc7jg                       0/1    Completed  0         51m  10.129.0.89   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--wdvvp-r7mpn                       0/1    Completed  0         17m  10.129.0.128  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--wfsgp-fprsd                       0/1    Completed  0         48m  10.129.0.95   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--wwf9w-cq4sj                       0/1    Completed  0         38m  10.129.0.101  evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--x7htg-px249                       0/1    Completed  0         53m  10.129.0.86   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--xkqfv-v9f5w                       0/1    Completed  0         59m  10.129.0.77   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>
version--zsh85-j58l8                       0/1    Completed  0         54m  10.129.0.85   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal  <none>          <none>

No messages in the version pods' logs, as expected:

╰─ for pod in `oc get pods -n openshift-cluster-version -ojsonpath='{.items[1:].metadata.name}'`; do echo -n "${pod} logs: "; oc logs pod/$pod; echo; done
version--2tcdf-7424n logs:
version--4cpjv-55zkm logs:
version--5b9rs-pwrm8 logs:
version--7gdnx-kzv5b logs:
version--88bt2-mpnmk logs:
version--88g8j-wd66v logs:
version--9f8wc-t5cbm logs:
version--9tg8n-89v4t logs:
version--9vqgf-p4chf logs:
version--b56kx-dwlpj logs:
...
...
...

No 'failed to prune update payload directory' in cvo logs, as expected.

cvo status normal.

No manifests/manifests directory, as expected:

╰─ for node in $(oc get nodes -l 'node-role.kubernetes.io/master' -ojsonpath='{.items[:].metadata.name}');do echo -n "${node}: "; oc debug node/$node -- /bin/bash -c 'ls /host/etc/cvo/updatepayloads/*/manifests/manifests -lAR';done 2>/dev/null
evakhoni-2109-svjgd-master-0.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
evakhoni-2109-svjgd-master-1.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
evakhoni-2109-svjgd-master-2.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
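With this many version pods per cycle, eyeballing the STATUS column gets tedious. A small filter along these lines could flag any version pod that did not finish; this is only a sketch, the function name unfinished_version_pods is made up for illustration, and it assumes the default column order of `oc get pods` (NAME, READY, STATUS, ...).

```shell
# Hypothetical helper: reads `oc get pods` style output on stdin and prints
# the name and status of every version-* pod whose STATUS is not "Completed".
# Assumes the first line is the header and STATUS is the third column.
unfinished_version_pods() {
  awk 'NR > 1 && $1 ~ /^version-/ && $3 != "Completed" { print $1, $3 }'
}

# Possible usage against a live cluster (prints nothing when all pods completed):
#   oc get pods -n openshift-cluster-version | unfinished_version_pods
```

The cluster-version-operator pod is skipped on purpose, since it is expected to stay in Running.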
Note: it is still sometimes possible to reproduce this while upgrading from an unfixed build to a fixed build, which is expected according to the note dev left in the original bug (bz2070805#c20). Since the fixed-to-fixed upgrade is verified above, moving this bug to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Container Platform 4.10.14 bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2022:2178
Skimming some of the earlier comments here, I see some mentions of --force. That's a big hammer:

$ oc adm upgrade --help | grep force
The cluster may report that the upgrade should not be performed due to a content verification error or update precondition failures such as operators blocking upgrades. Do not upgrade to images that are not appropriately signed without understanding the risks of upgrading your cluster to untrusted code. If you must override this protection use the --force flag.
    --force=false: Forcefully upgrade the cluster even when upgrade release image validation fails and the cluster is reporting errors.

Sometimes you need that hammer, e.g. when verifying bugs by updating to unsigned CI release builds. But for folks moving between signed releases, it's best to avoid --force if at all possible.

If you're being bitten by this issue, please see the notes and recommended recovery steps in [1].

[1]: https://access.redhat.com/solutions/6965075
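For anyone who wants a guard rail against reaching for --force by habit, a shell wrapper along these lines is one option. It is a sketch, not part of oc: the function name guarded_upgrade and the ALLOW_FORCE and DRY_RUN variables are invented for illustration.

```shell
# Hypothetical wrapper around `oc adm upgrade`: refuses --force unless the
# caller opts in with ALLOW_FORCE=1. With DRY_RUN=1 it only prints the command.
guarded_upgrade() {
  local arg
  for arg in "$@"; do
    case "$arg" in
      --force|--force=true)
        if [ "${ALLOW_FORCE:-0}" != "1" ]; then
          echo "refusing --force; set ALLOW_FORCE=1 if you really need it" >&2
          return 2
        fi
        ;;
    esac
  done
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "oc adm upgrade $*"
  else
    oc adm upgrade "$@"
  fi
}

# Possible usage between signed releases, with no --force involved:
#   guarded_upgrade --to=4.10.14
```

The opt-in variable keeps the hammer available for the unsigned CI build case described above while making it a deliberate choice.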