+++ This bug was initially created as a clone of Bug #1861971 +++

Description of problem:

This issue occurs only on OCP 3.7 (I verified that the CAM upgrade works on OCP 3.11). The restic and velero pods always report an Init:ImagePullBackOff error because they do not pick up the latest images after the latest operator.yml is applied on OCP 3.7.

Version-Release number of selected component (if applicable): CAM 1.2.4

How reproducible: always

Steps to Reproduce:

1. To test the CAM upgrade, set up a Quay registry as a replacement for the stage registry.
2. Mirror CAM 1.2.3 to the Quay registry.
3. Install CAM 1.2.3 on OCP 3.7 and OCP 4.5 from the Quay registry.

OCP 4.5:

$ oc get csv -n openshift-migration
NAME                  DISPLAY                                  VERSION   REPLACES   PHASE
cam-operator.v1.2.3   Cluster Application Migration Operator   1.2.3                Succeeded

4. Execute a migration; it completes successfully.

$ oc get migmigration -n openshift-migration 5791b9b0-d20c-11ea-9ffe-f1003b8734ce -o yaml
......
  name: 5791b9b0-d20c-11ea-9ffe-f1003b8734ce
  namespace: openshift-migration
  ownerReferences:
  - apiVersion: migration.openshift.io/v1alpha1
    kind: MigPlan
    name: test1
    uid: 21b710e8-1151-4a1a-8913-8e45a015a687
  resourceVersion: "68219"
  selfLink: /apis/migration.openshift.io/v1alpha1/namespaces/openshift-migration/migmigrations/5791b9b0-d20c-11ea-9ffe-f1003b8734ce
  uid: 9254d163-9a2b-47a3-997e-be0317a7c8c4
spec:
  migPlanRef:
    name: test1
    namespace: openshift-migration
  stage: false
status:
  conditions:
  - category: Advisory
    durable: true
    lastTransitionTime: "2020-07-30T02:32:41Z"
    message: The migration has completed successfully.
    reason: Completed
    status: "True"
    type: Succeeded
  itenerary: Final
  observedDigest: cef11161d78ec93695f08cffaa75b06081667e259cd823ce8c76bce9269082ed
  phase: Completed
  startTimestamp: "2020-07-30T02:28:24Z"

5. Mirror the latest CAM to the Quay registry.
6. CAM is automatically updated to the latest version (1.2.4) on OCP 4.5.

$ oc get csv -n openshift-migration
NAME                  DISPLAY                                  VERSION   REPLACES              PHASE
cam-operator.v1.2.4   Cluster Application Migration Operator   1.2.4     cam-operator.v1.2.3   Succeeded

$ oc get pod -n openshift-migration
NAME                                                          READY   STATUS      RESTARTS   AGE
migration-controller-5fc8cf748d-ghwb8                         2/2     Running     0          67m
migration-operator-8657c8878d-9qgdz                           2/2     Running     0          69m
migration-ui-7b47c9c9d6-4nkv7                                 1/1     Running     0          171m
registry-21b710e8-1151-4a1a-8913-8e45a015a687-t5tn7-1-cdjp7   1/1     Running     0          97m
registry-21b710e8-1151-4a1a-8913-8e45a015a687-t5tn7-1-deploy  0/1     Completed   0          97m
restic-cpzxq                                                  1/1     Running     0          67m
restic-njxt2                                                  1/1     Running     0          66m
restic-vcqsw                                                  1/1     Running     0          67m
velero-676884c78c-xfrg4                                       1/1     Running     0          67m

7. Download the latest operator.yml and update the image paths defined in it:

$ podman cp $(podman create quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-rhel7-operator:v1.2):/operator.yml ./
$ sed -i 's/rhcam-1-2/admin/g' operator.yml
$ sed -i 's/registry.redhat.io/quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/g' operator.yml

8. Apply the operator.yml to OCP 3.7:

$ oc replace -f operator.yml

Actual results:
Restic and Velero pods failed at "Init:ImagePullBackOff":

$ oc get pod -n openshift-migration --watch
NAME                                                          READY   STATUS              RESTARTS   AGE
migration-operator-1643177695-9rn89                           2/2     Running             0          1h
migration-operator-345024850-xdxpv                            0/2     ContainerCreating   0          <invalid>
registry-21b710e8-1151-4a1a-8913-8e45a015a687-cpg4f-1-9p2t6   1/1     Running             0          34m
restic-7mkvp                                                  1/1     Running             0          32m
restic-9ck2b                                                  1/1     Running             0          32m
restic-kgqj9                                                  1/1     Running             0          32m
restic-wh7dx                                                  1/1     Running             0          32m
velero-2797337289-8tjwd                                       1/1     Running             0          1h
migration-operator-345024850-xdxpv    2/2   Running                 0   <invalid>
migration-operator-1643177695-9rn89   2/2   Terminating             0   1h
migration-operator-1643177695-9rn89   0/2   Terminating             0   1h
migration-operator-1643177695-9rn89   0/2   Terminating             0   1h
migration-operator-1643177695-9rn89   0/2   Terminating             0   1h
migration-operator-1643177695-9rn89   0/2   Terminating             0   1h
velero-3550465265-4jcsd               0/1   Pending                 0   <invalid>
velero-3550465265-4jcsd               0/1   Pending                 0   <invalid>
restic-wh7dx                          1/1   Terminating             0   34m
velero-3550465265-4jcsd               0/1   Init:0/5                0   <invalid>
restic-wh7dx                          0/1   Terminating             0   34m
velero-3550465265-4jcsd               0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ImagePullBackOff   0   <invalid>
restic-wh7dx                          0/1   Terminating             0   34m
restic-wh7dx                          0/1   Terminating             0   34m
restic-vzcgn                          0/1   Pending                 0   <invalid>
restic-vzcgn                          0/1   Init:0/1                0   <invalid>
restic-vzcgn                          0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ErrImagePull       0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ErrImagePull       0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ErrImagePull       0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ErrImagePull       0   <invalid>
velero-3550465265-4jcsd               0/1   Init:ImagePullBackOff   0   <invalid>
restic-vzcgn                          0/1   Init:ErrImagePull       0   <invalid>
restic-vzcgn                          0/1   Init:ImagePullBackOff   0   <invalid>

$ oc describe pod velero-3550465265-4jcsd -n openshift-migration
......
Events:
  FirstSeen   LastSeen    Count   From                                   SubObjectPath                                   Type      Reason                  Message
  ---------   --------    -----   ----                                   -------------                                   --------  ------                  -------
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "certs"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "scratch"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "host-pods"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "velero-token-nn78h"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "cloud-credentials"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "gcp-cloud-credentials"
  45m         45m         1       kubelet, ip-172-18-0-75.ec2.internal                                                   Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "azure-cloud-credentials"
  45m         45m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{setup-certificate-secret}   Normal    Pulling                 pulling image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-velero-rhel8@sha256:1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281"
  45m         45m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{setup-certificate-secret}   Warning   Failed                  Failed to pull image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-velero-rhel8@sha256:1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281": rpc error: code = 2 desc = manifest unknown: manifest unknown
  45m         45m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{setup-certificate-secret}   Warning   Failed                  Error: ErrImagePull
  45m         35m         51      kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{setup-certificate-secret}   Normal    BackOff                 Back-off pulling image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-velero-rhel8@sha256:1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281"
  45m         <invalid>   264     kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{setup-certificate-secret}   Warning   Failed                  Error: ImagePullBackOff

$ oc describe pod velero-3550465265-4jcsd -n openshift-migration
......
Events:
  FirstSeen   LastSeen    Count   From                                   SubObjectPath                         Type      Reason                  Message
  ---------   --------    -----   ----                                   -------------                         --------  ------                  -------
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "cloud-credentials"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "scratch"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "certs"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "plugins"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "velero-token-nn78h"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "gcp-cloud-credentials"
  46m         46m         1       kubelet, ip-172-18-0-75.ec2.internal                                         Normal    SuccessfulMountVolume   MountVolume.SetUp succeeded for volume "azure-cloud-credentials"
  46m         46m         1       default-scheduler                                                            Normal    Scheduled               Successfully assigned velero-3550465265-4jcsd to ip-172-18-0-75.ec2.internal
  46m         46m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{velero-plugin}    Normal    Pulling                 pulling image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-plugin-rhel8@sha256:d9e2c4a9db9a88c68d3f6b18927c7f00d50a172a9a721ea6add0855e4db1fda0"
  46m         46m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{velero-plugin}    Warning   Failed                  Failed to pull image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-plugin-rhel8@sha256:d9e2c4a9db9a88c68d3f6b18927c7f00d50a172a9a721ea6add0855e4db1fda0": rpc error: code = 2 desc = manifest unknown: manifest unknown
  46m         46m         2       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{velero-plugin}    Warning   Failed                  Error: ErrImagePull
  46m         45m         6       kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{velero-plugin}    Normal    BackOff                 Back-off pulling image "quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-plugin-rhel8@sha256:d9e2c4a9db9a88c68d3f6b18927c7f00d50a172a9a721ea6add0855e4db1fda0"
  46m         <invalid>   265     kubelet, ip-172-18-0-75.ec2.internal   spec.initContainers{velero-plugin}    Warning   Failed                  Error: ImagePullBackOff

CAM 1.2.4 images:

TASK [operator-mirror : Get images in /tmp/ansible.9UTiOY_cam-operator/manifests/cam-operator/v1.2.4/konveyor-operator.v1.2.4.clusterserviceversion.yaml] ************************************************************************************
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-controller-rhel8@sha256:4c58451f338eeb20e9bade9e5c61fd3ca64b469de96af77487e334dd8c9fc0e6)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-hook-runner-rhel7@sha256:86a048f0ee9726b4331d10190dc5851330b66c0326d94652ac07f33a501ae323)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-plugin-rhel8@sha256:40fee0819d750149b282b58019f4a118e296a754414fceaa4a1162deebee4898)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-registry-rhel8@sha256:37536b4487d3668a7105737695a0651e6be64720bc72a69da74153a8443ac9e1)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-rhel7-operator@sha256:a8d31fdb96e9d5e3fe42e928d0862141b7e39780e52121a995aeeb34270dd894)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-ui-rhel8@sha256:6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-aws-rhel8@sha256:bfda4f3c7f95993b5f9dace49856b124505e72bd87d42a50918f4194b7e6d7f0)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:fa6c5c8dc38b8965dd9eedb9c2a86dc9a8441cb280392961a1b8b42379648014)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:c8b0fb034244ef9598703ec9534ecfb5c97cff42157d2571eab382bdb1aeb5a2)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-restic-restore-helper-rhel8@sha256:356e8d9dede186325e3e4f8700cbde7121b6c4dc35c0099b8337c6cfb83049d8)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-rhel8@sha256:461ea0c165ed525d4276056f6aab879dcf011facb00e94acc88ae6e9f33f1637)

CAM 1.2.3 images:

TASK [operator-mirror : Get images in /tmp/ansible.9UTiOY_cam-operator/manifests/cam-operator/v1.2.3/konveyor-operator.v1.2.3.clusterserviceversion.yaml] ************************************************************************************
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-controller-rhel8@sha256:f3de5a7b0e6eeee722da155622a9f20425696bd25f833519b7aec320a7b64659)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-hook-runner-rhel7@sha256:86a048f0ee9726b4331d10190dc5851330b66c0326d94652ac07f33a501ae323)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-plugin-rhel8@sha256:d9e2c4a9db9a88c68d3f6b18927c7f00d50a172a9a721ea6add0855e4db1fda0)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-registry-rhel8@sha256:ea6301a15277d448c8756881c7e2e712893ca8041c913476640f52da9e76cad9)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-rhel7-operator@sha256:cb509b4cf5566088a81cfbc17918aeae00fefd2bfcc4bef33cded372836e3d59)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-ui-rhel8@sha256:6abfaea8ac04e3b5bbf9648a3479b420b4baec35201033471020c9cae1fe1e11)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-aws-rhel8@sha256:22c58f575ce2f54bf995fced82f89ba173329d9b88409cf371122f9ae8cabda1)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-gcp-rhel8@sha256:37c0b170d168fcebb104e465621e4ce97515d82549cd37cb42be94e3e55a4271)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-plugin-for-microsoft-azure-rhel8@sha256:dd92ad748a84754e5d78287e29576a5b95448e929824e86e80c60857d0c7aff9)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-restic-restore-helper-rhel8@sha256:e9459138ec3531eefbefa181dae3fd93fe5cf210b2a0bd3bca7ba38fbec97f60)
ok: [localhost] => (item=registry.stage.redhat.io/rhcam-1-2/openshift-migration-velero-rhel8@sha256:1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281)

You can see that restic and velero are still using the old images.

Expected results:
The upgrade should complete successfully on OCP 3.7.

Additional info:

--- Additional comment from Jason Montleon on 2020-07-30 12:19:16 UTC ---

So far it looks like this is specific to just OCP 3.7.
There is no issue with 3.11, at least. Looking at the logs, it seems the operator tasks are not failing when trying to patch the velero deployment and restic daemonset, even though the image reference is not updated. This may be an issue with python-openshift that we need to take to them for investigation after doing some of our own, and I'm not sure we're going to get much help with 3.7 support. It has likely always been an issue and simply wasn't detected, because the old images remain on the production registry, so they wouldn't fail to pull even if the patch command fails to update the image tag.

For now the workaround is:

oc delete --ignore-not-found=true deployment migration-controller migration-ui velero && oc delete --ignore-not-found=true daemonset restic
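The failure mode described above can be illustrated locally: the failing pod still references the CAM 1.2.3 velero digest (visible in the pull events), while the 1.2.4 CSV lists a different digest. This is a minimal sketch of that comparison, not a cluster command; both digest values are copied from the output above.

```shell
# The image reference the failing velero pod is actually pulling (CAM 1.2.3 digest,
# taken from the "Failed to pull image" events above).
running_image='quay-enterprise-quay-enterprise.apps.cam-tgt-7410.qe.devcluster.openshift.com/admin/openshift-migration-velero-rhel8@sha256:1a33e327dd610f0eebaaeae5b3c9b4170ab5db572b01a170be35b9ce946c0281'

# The velero digest listed in the CAM 1.2.4 CSV (from the mirror task output above).
expected_digest='sha256:461ea0c165ed525d4276056f6aab879dcf011facb00e94acc88ae6e9f33f1637'

# Strip everything up to the last '@' to isolate the digest being pulled.
running_digest="${running_image##*@}"

if [ "$running_digest" = "$expected_digest" ]; then
  result='image up to date'
else
  result='stale image: operator patch did not take effect'
fi
echo "$result"
```

Because the stale 1.2.3 digest is no longer served by the mirror repository, the kubelet's pull fails with "manifest unknown", which surfaces as Init:ErrImagePull / Init:ImagePullBackOff.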
*** Bug 1862544 has been marked as a duplicate of this bug. ***
Looking further, it seems to affect just the initContainers.
This might be an openshift bug: https://github.com/kubernetes/kubernetes/issues/47264
https://github.com/kubernetes/kubernetes/issues/47264 "Correct. Pre-1.8, the information is duplicated in two places in the object, with the annotation taking precedence. 1.8+, only the field is honored. You should set the field for forward compatibility, and set the annotation if you want your changes to be effective against a pre-1.8 server." We may be able to template in the relevant information for 3.7 the same way we set v1beta1 / v1 for the Deployment depending on the version.
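The dual-write approach suggested above might look roughly like this for the velero deployment's pod template. This is a sketch, not the actual mig-operator template: the image placeholders and the exact container layout are illustrative, and the annotation value would need to carry the full init container specs. Pre-1.8 kubelets (OCP 3.7 is Kubernetes 1.7) take the init containers from the pod.beta.kubernetes.io/init-containers annotation, so a patch that only updates the initContainers field never changes the image they run:

```yaml
# Sketch: template both the field (honored on 1.8+) and the legacy
# annotation (honored pre-1.8). <new-plugin-image> is a placeholder.
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: velero
spec:
  template:
    metadata:
      annotations:
        # Pre-1.8 servers give this annotation precedence over the field,
        # so it must be updated too or the old init image keeps being pulled.
        pod.beta.kubernetes.io/init-containers: '[{"name":"velero-plugin","image":"<new-plugin-image>"}]'
    spec:
      initContainers:
      - name: velero-plugin
        image: <new-plugin-image>
      containers:
      - name: velero
        image: <new-velero-image>
```

The annotation could be emitted conditionally for 3.x clusters, the same way the Deployment apiVersion is already selected per version.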
https://github.com/konveyor/mig-operator/pull/457
verified with MTC 1.3.0
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Migration Toolkit for Containers (MTC) Tool image release advisory 1.3.0), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:4148