Description of problem (please be as detailed as possible and provide log snippets):

After migrating an ODF MS consumer cluster to the fusion agent, csi-rbdplugin is stuck in ImagePullBackOff on the consumer clusters.

-----------------
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m15s                  default-scheduler  Successfully assigned fusion-storage/csi-rbdplugin-6g5p2 to ip-10-0-20-39.ap-south-1.compute.internal by ip-10-0-19-159
  Normal   Pulled     4m15s                  kubelet            Container image "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:0f17fb00e45a9fd8019ea987f201cb1e10066e1fe58f2d88866752561691a0d0" already present on machine
  Normal   Created    4m15s                  kubelet            Created container csi-driver-registrar
  Normal   Started    4m15s                  kubelet            Started container csi-driver-registrar
  Warning  Failed     3m59s (x2 over 4m14s)  kubelet            Error: ErrImagePull
  Normal   Pulling    3m59s (x2 over 4m14s)  kubelet            Pulling image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
  Warning  Failed     3m58s (x2 over 4m13s)  kubelet            Failed to pull image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160": rpc error: code = Unknown desc = reading manifest sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160 in registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9: unknown: Not Found
  Warning  Failed     3m58s (x2 over 4m13s)  kubelet            Error: ErrImagePull
  Normal   BackOff    3m47s (x2 over 4m13s)  kubelet            Back-off pulling image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62"
  Warning  Failed     3m47s (x2 over 4m13s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m47s (x2 over 4m13s)  kubelet            Back-off pulling image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
  Warning  Failed     3m47s (x2 over 4m13s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    3m34s (x3 over 4m15s)  kubelet            Pulling image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62"
  Warning  Failed     3m33s (x3 over 4m14s)  kubelet            Failed to pull image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62": rpc error: code = Unknown desc = reading manifest sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62 in registry.redhat.io/odf4/cephcsi-rhel9: unknown: Not Found
-----------------

Version of all relevant components (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.36   True        False         6h46m   Cluster version is 4.11.36

Versions on consumer before migration:

========CSV in openshift-storage namespace ======
NAME                                     DISPLAY                      VERSION          REPLACES                                 PHASE
mcg-operator.v4.11.6                     NooBaa Operator              4.11.6           mcg-operator.v4.11.5                     Succeeded
observability-operator.v0.0.20           Observability Operator       0.0.20           observability-operator.v0.0.19           Succeeded
ocs-operator.v4.11.6                     OpenShift Container Storage  4.11.6           ocs-operator.v4.11.5                     Succeeded
ocs-osd-deployer.v2.0.12                 OCS OSD Deployer             2.0.12           ocs-osd-deployer.v2.0.11                 Installing
odf-csi-addons-operator.v4.11.6          CSI Addons                   4.11.6           odf-csi-addons-operator.v4.11.5          Succeeded
odf-operator.v4.11.6                     OpenShift Data Foundation    4.11.6           odf-operator.v4.11.5                     Succeeded
ose-prometheus-operator.4.10.0           Prometheus Operator          4.10.0           ose-prometheus-operator.4.8.0            Succeeded
route-monitor-operator.v0.1.494-a973226  Route Monitor Operator       0.1.494-a973226  route-monitor-operator.v0.1.493-a866e7c  Succeeded

Versions on consumer after migration:

$ oc get csv -n fusion-storage
NAME                                        DISPLAY                           VERSION            REPLACES                                 PHASE
observability-operator.v0.0.20              Observability Operator            0.0.20             observability-operator.v0.0.19           Succeeded
ocs-client-operator.v4.13.0-164.stable      OpenShift Data Foundation Client  4.13.0-164.stable                                           Succeeded
odf-csi-addons-operator.v4.13.0-164.stable  CSI Addons                        4.13.0-164.stable                                           Succeeded
route-monitor-operator.v0.1.494-a973226     Route Monitor Operator            0.1.494-a973226    route-monitor-operator.v0.1.493-a866e7c  Succeeded

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
2/2

Can this issue be reproduced from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create an Appliance mode provider with 2 consumers.
2. Create a fusion agent provider cluster.
3. Start the migration using the guide https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit?usp=sharing and the migrate.sh script https://github.com/rchikatw/odf-managed-service-migration/blob/main/migrate.sh
4. After migration, the cephfsplugin and rbdplugin pods are initially stuck in ImagePullBackOff. Apply the workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2186145
   => Apply the ImageContentSourcePolicy, then restart the csi-rbd and csi-cephfs deployments and daemonsets
   => csi-cephfsplugin turns to Running status

Actual results:
csi-rbdplugin pods are in ImagePullBackOff state

Expected results:
csi-rbdplugin pods should be in Running state

Additional info:

Pre-workaround pod status:

=======PODS ======
NAME                                                           READY  STATUS            RESTARTS  AGE    IP            NODE                                       NOMINATED NODE  READINESS GATES
csi-addons-controller-manager-999df6799-6cdw9                  2/2    Running           0         4m55s  10.129.2.43   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
csi-cephfsplugin-4vkqb                                         1/2    ImagePullBackOff  0         4m38s  10.0.21.182   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
csi-cephfsplugin-dj7ll                                         1/2    ImagePullBackOff  0         4m38s  10.0.18.141   ip-10-0-18-141.us-east-2.compute.internal  <none>          <none>
csi-cephfsplugin-pl9qg                                         1/2    ImagePullBackOff  0         4m38s  10.0.15.48    ip-10-0-15-48.us-east-2.compute.internal   <none>          <none>
csi-cephfsplugin-provisioner-68487c5749-4lhgz                  4/5    ImagePullBackOff  0         4m38s  10.129.2.45   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
csi-cephfsplugin-provisioner-68487c5749-9qcnx                  4/5    ImagePullBackOff  0         4m38s  10.131.0.181  ip-10-0-18-141.us-east-2.compute.internal  <none>          <none>
csi-rbdplugin-4wdc5                                            1/3    ImagePullBackOff  0         4m38s  10.0.18.141   ip-10-0-18-141.us-east-2.compute.internal  <none>          <none>
csi-rbdplugin-m5882                                            1/3    ImagePullBackOff  0         4m38s  10.0.21.182   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
csi-rbdplugin-provisioner-5d95c77584-nh7lv                     4/5    ImagePullBackOff  0         4m38s  10.129.2.46   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
csi-rbdplugin-provisioner-5d95c77584-vp84r                     4/5    ImagePullBackOff  0         4m38s  10.131.0.182  ip-10-0-18-141.us-east-2.compute.internal  <none>          <none>
csi-rbdplugin-q2dgj                                            1/3    ImagePullBackOff  0         4m38s  10.0.15.48    ip-10-0-15-48.us-east-2.compute.internal   <none>          <none>
ocs-client-operator-controller-manager-7c65db77b-mcd7l         2/2    Running           0         4m10s  10.129.2.47   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
storageclient-f4d201833df18bd9-status-reporter-28031459-qk97p  0/1    Completed         0         2m37s  10.129.2.49   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
storageclient-f4d201833df18bd9-status-reporter-28031460-c2dlh  0/1    Completed         0         97s    10.129.2.50   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
storageclient-f4d201833df18bd9-status-reporter-28031461-tp7gl  0/1    Completed         0         37s    10.129.2.52   ip-10-0-21-182.us-east-2.compute.internal  <none>          <none>
-------------

====Step 4 steps========================

--------ImageContentSourcePolicy yaml----------------
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: client-operator-icsp
spec:
  repositoryDigestMirrors:
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-provisioner
    source: registry.redhat.io/openshift4/ose-csi-external-provisioner
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-attacher
    source: registry.redhat.io/openshift4/ose-csi-external-attacher
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-attacher
    source: registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-resizer
    source: registry.redhat.io/openshift4/ose-csi-external-resizer
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-snapshotter
    source: registry.redhat.io/openshift4/ose-csi-external-snapshotter
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-snapshotter
    source: registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-node-driver-registrar
    source: registry.redhat.io/openshift4/ose-csi-node-driver-registrar
  - mirrors:
    - quay.io/rhceph-dev/odf4-cephcsi-rhel8
    source: registry.redhat.io/odf4/cephcsi-rhel8
  - mirrors:
    - quay.io/rhceph-dev/odf4-cephcsi-rhel9
    source: registry.redhat.io/odf4/cephcsi-rhel9
  - mirrors:
    - quay.io/rhceph-dev/odf4-csi-addons-sidecar-rhel8
    source: registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8
  - mirrors:
    - quay.io/rhceph-dev/odf4-csi-addons-sidecar-rhel9
    source: registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9
-----

$ oc get deployments
NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
csi-addons-controller-manager            1/1     1            1           14m
csi-cephfsplugin-provisioner             0/2     2            0           14m
csi-rbdplugin-provisioner                0/2     2            0           14m
ocs-client-operator-controller-manager   1/1     1            1           14m

sgatfane-mac:odf-managed-service-migration sgatfane$ oc rollout restart deployments csi-cephfsplugin-provisioner csi-rbdplugin-provisioner
deployment.apps/csi-cephfsplugin-provisioner restarted
deployment.apps/csi-rbdplugin-provisioner restarted

$ oc get daemonsets
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   3         3         0       3            0           <none>          14m
csi-rbdplugin      3         3         0       3            0           <none>          14m

$ oc rollout restart daemonsets csi-cephfsplugin csi-cephfsplugin
daemonset.apps/csi-cephfsplugin restarted
daemonset.apps/csi-cephfsplugin restarted

$ oc get daemonsets
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   3         3         3       3            3           <none>          141m
csi-rbdplugin      3         3         0       3            0           <none>          141m

Pod output after step 4 (after the workaround):

$ oc get pods -n fusion-storage
NAME                                                           READY  STATUS            RESTARTS  AGE
csi-addons-controller-manager-67bbcb795f-hl2nd                 2/2    Running           0         24m
csi-cephfsplugin-lctch                                         2/2    Running           0         8m23s
csi-cephfsplugin-provisioner-79dcf69484-4gmd9                  5/5    Running           0         8m48s
csi-cephfsplugin-provisioner-79dcf69484-8vhhh                  5/5    Running           0         9m14s
csi-cephfsplugin-wgcxd                                         2/2    Running           0         8m23s
csi-cephfsplugin-zxzr8                                         2/2    Running           0         8m17s
csi-rbdplugin-6n858                                            2/3    ImagePullBackOff  0         23m
csi-rbdplugin-lgndb                                            2/3    ImagePullBackOff  0         23m
csi-rbdplugin-provisioner-68d6dc9c47-fbd72                     5/5    Running           0         8m48s
csi-rbdplugin-provisioner-68d6dc9c47-lxzpk                     5/5    Running           0         9m13s
csi-rbdplugin-vc2bm                                            2/3    ImagePullBackOff  0         23m
ocs-client-operator-controller-manager-7c65db77b-q7g4s         2/2    Running           0         23m
storageclient-f4d201833df18bd9-status-reporter-28031506-rlfzl  0/1    Completed         0         2m39s
storageclient-f4d201833df18bd9-status-reporter-28031507-zqc7q  0/1    Completed         0         99s
storageclient-f4d201833df18bd9-status-reporter-28031508-tmdfj  0/1    Completed         0         39s
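
For reference, an ImageContentSourcePolicy with repositoryDigestMirrors only redirects pulls that reference an image by digest, which is how the failing images appear in the events in this report. The sketch below shows the rewrite for a single source/mirror pair taken from the policy used in step 4. `mirror_for` is a hypothetical helper name (it is not part of oc), and it assumes an exact repository match; the actual lookup is done by the container runtime on the node.

```shell
# Hypothetical helper, for illustration only.
# $1 = image reference as it appears in the pod spec
# $2 = "source" repository from the ICSP entry
# $3 = mirror repository from the same ICSP entry
mirror_for() {
  case "$1" in
    # Digest reference under the source repository: swap the repository, keep the digest.
    "$2"@sha256:*) printf '%s\n' "$3${1#"$2"}" ;;
    # Anything else (tag reference, different repository) is left unchanged.
    *)             printf '%s\n' "$1" ;;
  esac
}

mirror_for \
  "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62" \
  "registry.redhat.io/odf4/cephcsi-rhel9" \
  "quay.io/rhceph-dev/odf4-cephcsi-rhel9"
```

The printed reference is the pull spec that the mirror entry would point at for the cephcsi image, which can help when checking by hand whether the digest is actually present in the mirror repository.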
Verified the change in the repo pointed out in comment#5. However, as per comment#4, for the unreleased version I used the workaround, and it works. This was performed with a FaaS provider deployed with quay.io/resoni/managed-fusion-agent-index:4.13.0-168 and quay.io/dbindra/managed-fusion-agent:apr_27_catsrc (OCS v4.12.3-12, OCS client v4.12.3-12) on ROSA 4.12.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2023:3742