Bug 2187969 - [ODFMS-Migration ] [OCS Client Operator] csi-rbdplugin stuck in ImagePullBackOff on consumer clusters after Migration
Summary: [ODFMS-Migration ] [OCS Client Operator] csi-rbdplugin stuck in ImagePullBack...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: build
Version: 4.13
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.13.0
Assignee: Boris Ranto
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2023-04-19 09:48 UTC by suchita
Modified: 2023-08-09 16:37 UTC (History)

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-06-21 15:25:08 UTC
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2023:3742 0 None None None 2023-06-21 15:25:52 UTC

Internal Links: 2187976

Description suchita 2023-04-19 09:48:00 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

After migrating an ODFMS consumer cluster to the fusion agent, the csi-rbdplugin pods are stuck in ImagePullBackOff on the consumer clusters.
-----------------
Events:
  Type     Reason     Age                    From               Message
  ----     ------     ----                   ----               -------
  Normal   Scheduled  4m15s                  default-scheduler  Successfully assigned fusion-storage/csi-rbdplugin-6g5p2 to ip-10-0-20-39.ap-south-1.compute.internal by ip-10-0-19-159
  Normal   Pulled     4m15s                  kubelet            Container image "registry.redhat.io/openshift4/ose-csi-node-driver-registrar@sha256:0f17fb00e45a9fd8019ea987f201cb1e10066e1fe58f2d88866752561691a0d0" already present on machine
  Normal   Created    4m15s                  kubelet            Created container csi-driver-registrar
  Normal   Started    4m15s                  kubelet            Started container csi-driver-registrar
  Warning  Failed     3m59s (x2 over 4m14s)  kubelet            Error: ErrImagePull
  Normal   Pulling    3m59s (x2 over 4m14s)  kubelet            Pulling image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
  Warning  Failed     3m58s (x2 over 4m13s)  kubelet            Failed to pull image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160": rpc error: code = Unknown desc = reading manifest sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160 in registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9: unknown: Not Found
  Warning  Failed     3m58s (x2 over 4m13s)  kubelet            Error: ErrImagePull
  Normal   BackOff    3m47s (x2 over 4m13s)  kubelet            Back-off pulling image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62"
  Warning  Failed     3m47s (x2 over 4m13s)  kubelet            Error: ImagePullBackOff
  Normal   BackOff    3m47s (x2 over 4m13s)  kubelet            Back-off pulling image "registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9@sha256:2689ee3c9a945d3325d605a15ba4e39dad8cacbbb3cbb2afe518bfa73f637160"
  Warning  Failed     3m47s (x2 over 4m13s)  kubelet            Error: ImagePullBackOff
  Normal   Pulling    3m34s (x3 over 4m15s)  kubelet            Pulling image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62"
  Warning  Failed     3m33s (x3 over 4m14s)  kubelet            Failed to pull image "registry.redhat.io/odf4/cephcsi-rhel9@sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62": rpc error: code = Unknown desc = reading manifest sha256:afec2d2995c124a93cd30d1e42f789699d50c82a3f8700d1e1b531cb600dbd62 in registry.redhat.io/odf4/cephcsi-rhel9: unknown: Not Found
-----------------
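
The events above name two images whose manifests the registry reports as "Not Found". As a minimal sketch (the function name `failed_pull_images` is hypothetical, not part of any tool), the failing image references can be pulled out of the event text like this:

```shell
#!/usr/bin/env bash
# failed_pull_images: read pod event text on stdin and print every image
# that the kubelet reported as 'Failed to pull image "<ref>"', one per
# line, de-duplicated. The function name is illustrative only.
failed_pull_images() {
  sed -n 's/.*Failed to pull image "\([^"]*\)".*/\1/p' | sort -u
}
```

It could be fed from something like `oc describe pod csi-rbdplugin-6g5p2 -n fusion-storage | failed_pull_images`, using the pod and namespace shown in the events.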



Version of all relevant components (if applicable):

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.36   True        False         6h46m   Cluster version is 4.11.36
Versions on consumer before Migration:
 
========CSV in openshift-storage namespace ======
NAME                                      DISPLAY                       VERSION           REPLACES                                  PHASE
mcg-operator.v4.11.6                      NooBaa Operator               4.11.6            mcg-operator.v4.11.5                      Succeeded
observability-operator.v0.0.20            Observability Operator        0.0.20            observability-operator.v0.0.19            Succeeded
ocs-operator.v4.11.6                      OpenShift Container Storage   4.11.6            ocs-operator.v4.11.5                      Succeeded
ocs-osd-deployer.v2.0.12                  OCS OSD Deployer              2.0.12            ocs-osd-deployer.v2.0.11                  Installing
odf-csi-addons-operator.v4.11.6           CSI Addons                    4.11.6            odf-csi-addons-operator.v4.11.5           Succeeded
odf-operator.v4.11.6                      OpenShift Data Foundation     4.11.6            odf-operator.v4.11.5                      Succeeded
ose-prometheus-operator.4.10.0            Prometheus Operator           4.10.0            ose-prometheus-operator.4.8.0             Succeeded
route-monitor-operator.v0.1.494-a973226   Route Monitor Operator        0.1.494-a973226   route-monitor-operator.v0.1.493-a866e7c   Succeeded

Versions on consumer after Migration:
$ oc get csv -n fusion-storage
NAME                                         DISPLAY                            VERSION             REPLACES                                  PHASE
observability-operator.v0.0.20               Observability Operator             0.0.20              observability-operator.v0.0.19            Succeeded
ocs-client-operator.v4.13.0-164.stable       OpenShift Data Foundation Client   4.13.0-164.stable                                             Succeeded
odf-csi-addons-operator.v4.13.0-164.stable   CSI Addons                         4.13.0-164.stable                                             Succeeded
route-monitor-operator.v0.1.494-a973226      Route Monitor Operator             0.1.494-a973226     route-monitor-operator.v0.1.493-a866e7c   Succeeded






Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
no

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
2/2

Can this issue be reproduced from the UI?
no

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create an appliance-mode provider with 2 consumers.
2. Create a fusion agent provider cluster.
3. Start the migration using the guide https://docs.google.com/document/d/1Jdx8czlMjbumvilw8nZ6LtvWOMAx3H4TfwoVwiBs0nE/edit?usp=sharing and the migrate.sh script https://github.com/rchikatw/odf-managed-service-migration/blob/main/migrate.sh
4. After migration, the cephfs plugin and rbd plugin are initially stuck in ImagePullBackOff.
Then apply the workaround mentioned in https://bugzilla.redhat.com/show_bug.cgi?id=2186145
=> Apply the ImageContentSourcePolicy, then restart the csi-rbd and csi-cephfs deployments and daemonsets.
=> csi-cephfsplugin turns to Running status.
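
Step 4 can be sketched as a small shell function, assuming the workload names listed by `oc get deployments` and `oc get daemonsets` in this report. The function name, the `OC` override variable, and the ICSP file name are hypothetical; this is an illustration of the described steps, not the script that was actually run.

```shell
#!/usr/bin/env bash
# apply_icsp_and_restart NAMESPACE ICSP_FILE
# Applies the ImageContentSourcePolicy, then restarts both CSI provisioner
# deployments and both CSI plugin daemonsets named in step 4.
# Set OC=echo to print the commands instead of running them (dry run).
apply_icsp_and_restart() {
  local ns="$1" icsp_file="$2"
  local oc_bin="${OC:-oc}"   # hypothetical override, defaults to real oc
  local workload

  "$oc_bin" apply -f "$icsp_file" || return 1

  for workload in \
      deployment/csi-cephfsplugin-provisioner \
      deployment/csi-rbdplugin-provisioner \
      daemonset/csi-cephfsplugin \
      daemonset/csi-rbdplugin; do
    "$oc_bin" rollout restart "$workload" -n "$ns" || return 1
  done
}
```

A dry run such as `OC=echo apply_icsp_and_restart fusion-storage client-operator-icsp.yaml` prints the five commands, which makes it easy to confirm that each of the four workloads is named exactly once before anything touches the cluster.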


Actual results:
The csi-rbdplugin pods are in the ImagePullBackOff state.

Expected results:
The csi-rbdplugin pods should be in the Running state.

Additional info:
Pre-workaround pods status:
=======PODS ======
NAME                                                            READY   STATUS             RESTARTS   AGE     IP             NODE                                        NOMINATED NODE   READINESS GATES
csi-addons-controller-manager-999df6799-6cdw9                   2/2     Running            0          4m55s   10.129.2.43    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-4vkqb                                          1/2     ImagePullBackOff   0          4m38s   10.0.21.182    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-dj7ll                                          1/2     ImagePullBackOff   0          4m38s   10.0.18.141    ip-10-0-18-141.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-pl9qg                                          1/2     ImagePullBackOff   0          4m38s   10.0.15.48     ip-10-0-15-48.us-east-2.compute.internal    <none>           <none>
csi-cephfsplugin-provisioner-68487c5749-4lhgz                   4/5     ImagePullBackOff   0          4m38s   10.129.2.45    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
csi-cephfsplugin-provisioner-68487c5749-9qcnx                   4/5     ImagePullBackOff   0          4m38s   10.131.0.181   ip-10-0-18-141.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-4wdc5                                             1/3     ImagePullBackOff   0          4m38s   10.0.18.141    ip-10-0-18-141.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-m5882                                             1/3     ImagePullBackOff   0          4m38s   10.0.21.182    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-5d95c77584-nh7lv                      4/5     ImagePullBackOff   0          4m38s   10.129.2.46    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-provisioner-5d95c77584-vp84r                      4/5     ImagePullBackOff   0          4m38s   10.131.0.182   ip-10-0-18-141.us-east-2.compute.internal   <none>           <none>
csi-rbdplugin-q2dgj                                             1/3     ImagePullBackOff   0          4m38s   10.0.15.48     ip-10-0-15-48.us-east-2.compute.internal    <none>           <none>
ocs-client-operator-controller-manager-7c65db77b-mcd7l          2/2     Running            0          4m10s   10.129.2.47    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
storageclient-f4d201833df18bd9-status-reporter-28031459-qk97p   0/1     Completed          0          2m37s   10.129.2.49    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
storageclient-f4d201833df18bd9-status-reporter-28031460-c2dlh   0/1     Completed          0          97s     10.129.2.50    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
storageclient-f4d201833df18bd9-status-reporter-28031461-tp7gl   0/1     Completed          0          37s     10.129.2.52    ip-10-0-21-182.us-east-2.compute.internal   <none>           <none>
-------------
====Step 4 steps========================

--------ImageContentSourcePolicy yaml----------------
apiVersion: operator.openshift.io/v1alpha1
kind: ImageContentSourcePolicy
metadata:
  name: client-operator-icsp
spec:
  repositoryDigestMirrors:
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-provisioner
    source: registry.redhat.io/openshift4/ose-csi-external-provisioner
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-attacher
    source: registry.redhat.io/openshift4/ose-csi-external-attacher
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-attacher
    source: registry.redhat.io/openshift4/ose-csi-external-attacher-rhel8
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-resizer
    source: registry.redhat.io/openshift4/ose-csi-external-resizer
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-snapshotter
    source: registry.redhat.io/openshift4/ose-csi-external-snapshotter
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-external-snapshotter
    source: registry.redhat.io/openshift4/ose-csi-external-snapshotter-rhel8
  - mirrors:
    - quay.io/rhceph-dev/openshift-ose-csi-node-driver-registrar
    source: registry.redhat.io/openshift4/ose-csi-node-driver-registrar
  - mirrors:
    - quay.io/rhceph-dev/odf4-cephcsi-rhel8
    source: registry.redhat.io/odf4/cephcsi-rhel8
  - mirrors:
    - quay.io/rhceph-dev/odf4-cephcsi-rhel9
    source: registry.redhat.io/odf4/cephcsi-rhel9
  - mirrors:
    - quay.io/rhceph-dev/odf4-csi-addons-sidecar-rhel8
    source: registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel8
  - mirrors:
    - quay.io/rhceph-dev/odf4-csi-addons-sidecar-rhel9
    source: registry.redhat.io/odf4/odf-csi-addons-sidecar-rhel9
-----
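
Each `repositoryDigestMirrors` entry above pairs a `source` repository with a mirror, and the mapping only applies to images referenced by digest. The sketch below shows the rewrite for a single entry, assuming an exact repository match. The function name `mirror_ref` is hypothetical, and real mirror resolution is done by the container runtime, not by this code.

```shell
#!/usr/bin/env bash
# mirror_ref SOURCE MIRROR IMAGE
# If IMAGE is a digest reference in exactly the SOURCE repository, print
# the equivalent reference in MIRROR; otherwise print IMAGE unchanged.
mirror_ref() {
  local source="$1" mirror="$2" image="$3"
  case "$image" in
    "$source"@sha256:*)
      # Swap the repository prefix and keep the "@sha256:..." suffix.
      printf '%s\n' "${mirror}${image#"$source"}"
      ;;
    *)
      printf '%s\n' "$image"
      ;;
  esac
}
```

For example, calling it with `registry.redhat.io/odf4/cephcsi-rhel9`, `quay.io/rhceph-dev/odf4-cephcsi-rhel9`, and the cephcsi image from the events yields the mirror reference that would need to exist for the pull to succeed.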

$ oc get deployments
NAME                                     READY   UP-TO-DATE   AVAILABLE   AGE
csi-addons-controller-manager            1/1     1            1           14m
csi-cephfsplugin-provisioner             0/2     2            0           14m
csi-rbdplugin-provisioner                0/2     2            0           14m
ocs-client-operator-controller-manager   1/1     1            1           14m
$ oc rollout restart deployments csi-cephfsplugin-provisioner csi-rbdplugin-provisioner
deployment.apps/csi-cephfsplugin-provisioner restarted
deployment.apps/csi-rbdplugin-provisioner restarted
$ oc get daemonsets
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   3         3         0       3            0           <none>          14m
csi-rbdplugin      3         3         0       3            0           <none>          14m
$ oc rollout restart daemonsets csi-cephfsplugin csi-cephfsplugin
daemonset.apps/csi-cephfsplugin restarted
daemonset.apps/csi-cephfsplugin restarted

$ oc get daemonsets
NAME               DESIRED   CURRENT   READY   UP-TO-DATE   AVAILABLE   NODE SELECTOR   AGE
csi-cephfsplugin   3         3         3       3            3           <none>          141m
csi-rbdplugin      3         3         0       3            0           <none>          141m
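
The `oc get daemonsets` table above shows csi-rbdplugin with 0 of 3 pods ready. A minimal sketch for flagging such rows, assuming the default column order shown in this report (NAME, DESIRED, CURRENT, READY); the function name `unready_daemonsets` is hypothetical:

```shell
#!/usr/bin/env bash
# unready_daemonsets: read `oc get daemonsets` table output on stdin and
# print "<name> <ready>/<desired>" for every daemonset whose READY count
# differs from its DESIRED count. Skips the header row.
unready_daemonsets() {
  awk 'NR > 1 && $4 != $2 { print $1, $4 "/" $2 }'
}
```

It could be used as `oc get daemonsets -n fusion-storage | unready_daemonsets`, which prints nothing once every daemonset is fully ready.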

Pod output after step 4 (after the workaround):
$ oc get pods -n fusion-storage
NAME                                                            READY   STATUS             RESTARTS   AGE
csi-addons-controller-manager-67bbcb795f-hl2nd                  2/2     Running            0          24m
csi-cephfsplugin-lctch                                          2/2     Running            0          8m23s
csi-cephfsplugin-provisioner-79dcf69484-4gmd9                   5/5     Running            0          8m48s
csi-cephfsplugin-provisioner-79dcf69484-8vhhh                   5/5     Running            0          9m14s
csi-cephfsplugin-wgcxd                                          2/2     Running            0          8m23s
csi-cephfsplugin-zxzr8                                          2/2     Running            0          8m17s
csi-rbdplugin-6n858                                             2/3     ImagePullBackOff   0          23m
csi-rbdplugin-lgndb                                             2/3     ImagePullBackOff   0          23m
csi-rbdplugin-provisioner-68d6dc9c47-fbd72                      5/5     Running            0          8m48s
csi-rbdplugin-provisioner-68d6dc9c47-lxzpk                      5/5     Running            0          9m13s
csi-rbdplugin-vc2bm                                             2/3     ImagePullBackOff   0          23m
ocs-client-operator-controller-manager-7c65db77b-q7g4s          2/2     Running            0          23m
storageclient-f4d201833df18bd9-status-reporter-28031506-rlfzl   0/1     Completed          0          2m39s
storageclient-f4d201833df18bd9-status-reporter-28031507-zqc7q   0/1     Completed          0          99s
storageclient-f4d201833df18bd9-status-reporter-28031508-tmdfj   0/1     Completed          0          39s

Comment 8 suchita 2023-05-05 07:22:15 UTC
Verified the change in the repo pointed out in comment#5.
However, as per comment#4, for the unreleased version I used the workaround, and it works.
This was performed with a FaaS provider deployed with
quay.io/resoni/managed-fusion-agent-index:4.13.0-168 and quay.io/dbindra/managed-fusion-agent:apr_27_catsrc (OCS v4.12.3-12, OCS client v4.12.3-12)
ROSA 4.12

Comment 12 errata-xmlrpc 2023-06-21 15:25:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Data Foundation 4.13.0 enhancement and bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2023:3742

