Bug 2080058 - ClusterVersion: could not download the update
Summary: ClusterVersion: could not download the update
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 4.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.z
Assignee: W. Trevor King
QA Contact: Evgeni Vakhonin
URL:
Whiteboard:
Depends On: 2070805
Blocks:
 
Reported: 2022-04-28 21:16 UTC by W. Trevor King
Modified: 2022-07-26 10:12 UTC
CC List: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 2070805
Environment:
Last Closed: 2022-05-18 11:51:03 UTC
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-version-operator pull 769 0 None open Bug 2080058: pkg/cvo/updatepayload: Prune previous payload downloads 2022-04-28 21:16:45 UTC
Red Hat Knowledge Base (Solution) 6965075 0 None None None 2022-07-26 10:12:49 UTC
Red Hat Product Errata RHBA-2022:2178 0 None None None 2022-05-18 11:51:17 UTC

Description W. Trevor King 2022-04-28 21:16:08 UTC
+++ This bug was initially created as a clone of Bug #2070805 +++

Description of problem:

ClusterID: cc782851-976b-494c-90ea-d5125936e134
ClusterVersion: Updating to "4.10.5" from "4.10.4" for 2 hours: Unable to apply 4.10.5: could not download the update
ClusterOperators:
	All healthy and stable

A cluster trying to upgrade to 4.10.5 from 4.10.4 is stuck with the above error reported on the ClusterVersion.

A pod in the openshift-cluster-version namespace keeps being created and failing.
We managed to grab a log which had:


oc logs -n openshift-cluster-version version-4.10.5-9jv69-4kxs7
mv: inter-device move failed: '/manifests' to '/etc/cvo/updatepayloads/HbO7IDc7tyIg9utw3sd_tg/manifests/manifests'; unable to remove target: Directory not empty
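
The error above names a nested manifests/manifests target. The following is a minimal local sketch of how mv can end up there, assuming a leftover target directory from an earlier download. All paths are hypothetical temp directories, not the real /etc/cvo layout, and on a single filesystem the error wording differs from the inter-device message in the log.

```shell
#!/bin/sh
# Local illustration only: hypothetical temp paths, not the CVO's actual script.
work=$(mktemp -d)
mkdir -p "$work/payload/manifests"                  # leftover target directory
mkdir -p "$work/src1/manifests" "$work/src2/manifests"
echo a > "$work/src1/manifests/a.yaml"
echo b > "$work/src2/manifests/b.yaml"

# First move: the target already exists as a directory,
# so mv places the source inside it (manifests/manifests).
mv "$work/src1/manifests" "$work/payload/manifests"
first_nested=no
if [ -f "$work/payload/manifests/manifests/a.yaml" ]; then
  first_nested=yes
fi

# Second move: the nested target now exists and is not empty,
# so mv cannot replace it.
if mv "$work/src2/manifests" "$work/payload/manifests" 2>/dev/null; then
  second_move=succeeded
else
  second_move=failed
fi

echo "first_nested=$first_nested second_move=$second_move"
rm -rf "$work"
```

If the payload directory is reused across retries, every later attempt would hit the second case, which seems consistent with the version pod repeatedly failing.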


I will attach a must-gather and an adm inspect of the openshift-cluster-version namespace (although the adm inspect seemed to error while grabbing the version-4.10.5 pod details) in a private comment.


This is a GCP cluster.

Comment 3 Evgeni Vakhonin 2022-05-10 21:28:30 UTC
verifying on 4.10.0-0.nightly-2022-05-10-060208 to 4.10.0-0.nightly-2022-05-10-131029

using the same method as in https://bugzilla.redhat.com/show_bug.cgi?id=2070805#c19

1) started upgrade
╰─ oc adm upgrade  --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image registry.ci.openshift.org/ocp/release@sha256:33bf3c2f384ff1bbdc51878bf7d8d9d69bb5c8e061bb6df929a0adc8766a38b1 #new

2) reverted back
╰─ oc adm upgrade  --allow-explicit-upgrade --force --allow-upgrade-with-warnings --to-image registry.ci.openshift.org/ocp/release@sha256:79836236068c4d7e1400366435f541de54f17cefd2b00e021d89c380c6b084b0 #old

3) invalidated the current payload by deleting release-manifests
╰─ for node in $(oc get nodes -l 'node-role.kubernetes.io/master' -ojsonpath='{.items[:].metadata.name}'); do echo $node; oc debug node/$node -- /bin/bash -c 'rm -rf /host/etc/cvo/updatepayloads/*/release-manifests'; done 2>/dev/null

4) checked status, pods, and log
5) repeated

result:
after many cycles, no crashing version pod is observed
╰─ oc get pods -owide                  
NAME                                        READY   STATUS      RESTARTS   AGE   IP             NODE                                                   NOMINATED NODE   READINESS GATES
cluster-version-operator-84575dbf5d-kw2sx   1/1     Running     0          14m   10.0.0.3       evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--2tcdf-7424n                        0/1     Completed   0          19m   10.129.0.121   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--4cpjv-55zkm                        0/1     Completed   0          57m   10.129.0.80    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--5b9rs-pwrm8                        0/1     Completed   0          66m   10.129.0.66    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--7gdnx-kzv5b                        0/1     Completed   0          20m   10.129.0.115   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--88bt2-mpnmk                        0/1     Completed   0          48m   10.129.0.94    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--88g8j-wd66v                        0/1     Completed   0          19m   10.129.0.119   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--9f8wc-t5cbm                        0/1     Completed   0          13m   10.129.0.146   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--9tg8n-89v4t                        0/1     Completed   0          64m   10.129.0.70    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--9vqgf-p4chf                        0/1     Completed   0          67m   10.129.0.64    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--b56kx-dwlpj                        0/1     Completed   0          19m   10.129.0.118   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--b7frc-6kh7p                        0/1     Completed   0          14m   10.129.0.137   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--bz69p-t4pj4                        0/1     Completed   0          52m   10.129.0.87    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--cs2jv-jtqdr                        0/1     Completed   0          16m   10.129.0.131   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--cxxtj-wph4z                        0/1     Completed   0          20m   10.129.0.113   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--dg5bp-x9hmh                        0/1     Completed   0          38m   10.129.0.102   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--fhlwd-9mf7m                        0/1     Completed   0          68m   10.129.0.62    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--fpmcr-s58c8                        0/1     Completed   0          16m   10.129.0.132   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--ft2ls-s8wct                        0/1     Completed   0          64m   10.129.0.71    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--gznpm-4c76t                        0/1     Completed   0          14m   10.129.0.140   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--hv285-8vsw5                        0/1     Completed   0          54m   10.129.0.84    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--lbnt2-65nkh                        0/1     Completed   0          15m   10.129.0.136   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--lqsst-997m4                        0/1     Completed   0          18m   10.129.0.123   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--lt2rc-tldtn                        0/1     Completed   0          28m   10.129.0.110   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--m4mmk-5mbqw                        0/1     Completed   0          16m   10.129.0.130   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--m6p2r-27p4q                        0/1     Completed   0          56m   10.129.0.81    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--mmr9z-skv89                        0/1     Completed   0          17m   10.129.0.127   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--mtmf5-tw5g6                        0/1     Completed   0          62m   10.129.0.73    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--n7vn6-fd4jg                        0/1     Completed   0          28m   10.129.0.111   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--phrqz-wq9ct                        0/1     Completed   0          69m   10.128.0.78    evakhoni-2109-svjgd-master-2.c.openshift-qe.internal   <none>           <none>
version--qn58z-c9nn8                        0/1     Completed   0          51m   10.129.0.88    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--rvfxw-g29ph                        0/1     Completed   0          60m   10.129.0.76    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--rxq65-pjnng                        0/1     Completed   0          14m   10.129.0.143   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--tgdtt-dv9tg                        0/1     Completed   0          62m   10.129.0.72    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--w9pdh-cc7jg                        0/1     Completed   0          51m   10.129.0.89    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--wdvvp-r7mpn                        0/1     Completed   0          17m   10.129.0.128   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--wfsgp-fprsd                        0/1     Completed   0          48m   10.129.0.95    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--wwf9w-cq4sj                        0/1     Completed   0          38m   10.129.0.101   evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--x7htg-px249                        0/1     Completed   0          53m   10.129.0.86    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--xkqfv-v9f5w                        0/1     Completed   0          59m   10.129.0.77    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
version--zsh85-j58l8                        0/1     Completed   0          54m   10.129.0.85    evakhoni-2109-svjgd-master-1.c.openshift-qe.internal   <none>           <none>
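
A long pod listing like the one above can also be checked mechanically. This is a small sketch, assuming the default `oc get pods` column order (NAME, READY, STATUS); the function name is made up for illustration, and the last sample row uses an invented Error status to show a hit.

```shell
#!/bin/sh
# Hypothetical helper: reads `oc get pods` table output on stdin and prints
# the name and status of every version-* pod whose STATUS is not Completed.
non_completed_version_pods() {
  awk 'NR > 1 && $1 ~ /^version-/ && $3 != "Completed" { print $1, $3 }'
}

# Sample input; the Error row is invented for the demonstration.
sample='NAME READY STATUS RESTARTS AGE
cluster-version-operator-84575dbf5d-kw2sx 1/1 Running 0 14m
version--2tcdf-7424n 0/1 Completed 0 19m
version-4.10.5-9jv69-4kxs7 0/1 Error 0 2m'

result=$(printf '%s\n' "$sample" | non_completed_version_pods)
echo "$result"
```

Against a live cluster the same filter could be fed from `oc get pods -n openshift-cluster-version`; empty output would match the "no crashing version pod" result reported here.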


no messages in version pods log as expected
╰─ for pod in `oc get pods -n openshift-cluster-version -ojsonpath='{.items[1:].metadata.name}'`; do echo -n "${pod} logs: "; oc logs pod/$pod; echo; done 
version--2tcdf-7424n logs: 
version--4cpjv-55zkm logs: 
version--5b9rs-pwrm8 logs: 
version--7gdnx-kzv5b logs: 
version--88bt2-mpnmk logs: 
version--88g8j-wd66v logs: 
version--9f8wc-t5cbm logs: 
version--9tg8n-89v4t logs: 
version--9vqgf-p4chf logs: 
version--b56kx-dwlpj logs: 
...
...
...


no 'failed to prune update payload directory' in cvo logs, as expected
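
The linked pull request title mentions pruning previous payload downloads. The CVO itself is written in Go, so the following is only a shell illustration of the general idea, under the assumption that everything except the current payload directory can be discarded. The function name and directory names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: remove every entry under root ($1) except the
# directory named keep ($2). Not the CVO's implementation.
prune_payloads() {
  root=$1
  keep=$2
  for entry in "$root"/*; do
    if [ ! -e "$entry" ]; then
      continue                      # glob matched nothing
    fi
    if [ "$(basename "$entry")" = "$keep" ]; then
      continue                      # leave the current payload alone
    fi
    rm -rf -- "$entry"
  done
}

# Demonstration in a temp directory with made-up payload names.
demo=$(mktemp -d)
mkdir -p "$demo/old-payload/manifests" "$demo/current-payload/manifests"
prune_payloads "$demo" current-payload
remaining=$(ls "$demo")
echo "$remaining"
rm -rf "$demo"
```

Clearing stale directories before a new download would avoid the leftover target that mv nests into.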

cvo status normal

no manifests/manifests directory, as expected
╰─ for node in $(oc get nodes -l 'node-role.kubernetes.io/master' -ojsonpath='{.items[:].metadata.name}');do echo -n "${node}: "; oc debug node/$node -- /bin/bash -c 'ls /host/etc/cvo/updatepayloads/*/manifests/manifests -lAR';done 2>/dev/null
evakhoni-2109-svjgd-master-0.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
evakhoni-2109-svjgd-master-1.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
evakhoni-2109-svjgd-master-2.c.openshift-qe.internal: ls: cannot access '/host/etc/cvo/updatepayloads/*/manifests/manifests': No such file or directory
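
The same check can be phrased so that it prints only when a nested directory exists, instead of relying on the ls error. A minimal sketch, run here against a temp directory with made-up payload names; on a node the root would be the updatepayloads directory used in the command above.

```shell
#!/bin/sh
# Hypothetical helper: print any <root>/*/manifests/manifests directories.
# Prints nothing when none exist.
find_nested_manifests() {
  root=$1
  for dir in "$root"/*/manifests/manifests; do
    if [ -d "$dir" ]; then
      echo "$dir"
    fi
  done
}

# Demonstration: one clean payload and one with the nested directory.
demo=$(mktemp -d)
mkdir -p "$demo/good/manifests" "$demo/bad/manifests/manifests"
nested=$(find_nested_manifests "$demo")
echo "$nested"
rm -rf "$demo"
```

Empty output corresponds to the "No such file or directory" lines reported for all three masters.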

Comment 4 Evgeni Vakhonin 2022-05-11 05:47:38 UTC
Note: the issue can still sometimes be reproduced when upgrading from an unfixed build to a fixed build, which is expected according to the developers.
I left a note about it in the original bug, bz2070805#c20.
Since the fixed-to-fixed upgrade is verified above, I am moving this bug to Verified.

Comment 7 errata-xmlrpc 2022-05-18 11:51:03 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.14 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:2178

Comment 8 W. Trevor King 2022-07-26 10:12:50 UTC
Skimming some of the earlier comments here, I see some mentions of --force.  That's a big hammer:

  $ oc adm upgrade --help | grep force
   The cluster may report that the upgrade should not be performed due to a content verification error or update precondition failures such as operators blocking upgrades. Do not upgrade to images that are not appropriately signed without understanding the risks of upgrading your cluster to untrusted code. If you must override this protection use the --force flag.
        --force=false: Forcefully upgrade the cluster even when upgrade release image validation fails and the cluster is reporting errors.

Sometimes you need that hammer, e.g. when verifying bugs by updating to unsigned CI release builds.  But for folks moving between signed releases, it's best to avoid --force if at all possible.  If you're being bitten by this issue, please see the notes and recommended recovery steps in [1].

[1]: https://access.redhat.com/solutions/6965075

