Description of problem (please be as detailed as possible and provide log snippets):

I see these errors in the rook-ceph-operator log:

2024-04-17T20:07:37.767093153Z 2024-04-17 20:07:37.767052 E | ceph-csi: failed to reconcile failed to update CSI driver options for cluster "ocs-storagecluster-cephcluster": failed to fetch current csi config map: configmaps "rook-ceph-csi-config" not found
2024-04-17T20:07:37.909745677Z 2024-04-17 20:07:37.909698 E | ceph-nodedaemon-controller: ceph version not found for image "registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:cda4d8682b12f13ce90211cad773100c32584b6bcea33a6cb69a66d9aece86f5" used by cluster "ocs-storagecluster-cephcluster" in namespace "openshift-storage". attempt to determine ceph version for the current cluster image timed out
[the identical ceph-nodedaemon-controller error repeats eight more times, timestamps 20:07:37.910 through 20:07:37.917]
2024-04-17T20:07:38.019554664Z 2024-04-17 20:07:38.019513 I | clusterdisruption-controller: deleted all legacy node drain canary pods
2024-04-17T20:07:38.565867552Z 2024-04-17 20:07:38.565819 I | ceph-spec: parsing mon endpoints: a=172.30.35.166:3300,b=172.30.26.128:3300,c=172.30.125.176:3300
2024-04-17T20:07:38.943432810Z 2024-04-17 20:07:38.943366 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "openshift-storage/ocs-storagecluster-cephblockpool". failed to fetch ceph version from cephcluster "ocs-storagecluster-cephcluster": attempt to determine ceph version for the current cluster image timed out
2024-04-17T20:07:38.965356091Z 2024-04-17 20:07:38.965320 I | ceph-spec: parsing mon endpoints: a=172.30.35.166:3300,b=172.30.26.128:3300,c=172.30.125.176:3300
2024-04-17T20:07:38.965435379Z 2024-04-17 20:07:38.965419 I | ceph-fs-subvolumegroup-controller: creating ceph filesystem subvolume group ocs-storagecluster-cephfilesystem-csi in namespace openshift-storage
2024-04-17T20:07:38.965435379Z 2024-04-17 20:07:38.965429 I | cephclient: creating cephfs "ocs-storagecluster-cephfilesystem" subvolume group "csi"
2024-04-17T20:07:39.150090567Z 2024-04-17 20:07:39.150044 I | op-k8sutil: batch job ceph-file-controller-detect-version deleted
2024-04-17T20:07:39.164704664Z 2024-04-17 20:07:39.164675 I | ceph-spec: parsing mon endpoints: a=172.30.35.166:3300,b=172.30.26.128:3300,c=172.30.125.176:3300
2024-04-17T20:07:39.751737909Z 2024-04-17 20:07:39.751694 I | cephclient: successfully created subvolume group "csi" in filesystem "ocs-storagecluster-cephfilesystem"
2024-04-17T20:07:39.765506554Z 2024-04-17 20:07:39.765477 E | ceph-csi: failed to reconcile failed to update CSI driver options for cluster "ocs-storagecluster-cephcluster": failed to fetch current csi config map: configmaps "rook-ceph-csi-config" not found

Version of all relevant components (if applicable):
ODF: 4.16.0-78
OCP: 4.16.0-0.nightly-2024-04-16-195622

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Not sure

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install ODF and OCP 4.15
2. Upgrade to 4.16

Actual results:
The reconcile and version-detection errors above; the CSI configuration is never updated after the upgrade.

Expected results:
Everything up and running after the upgrade.

Additional info:
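For anyone triaging the same symptoms, a minimal check sequence (a sketch using standard oc commands; the configmap and deployment names are taken from the logs above, the namespace from the error messages):

   # Confirm whether the configmap the operator is looking for exists
   $ oc get configmap rook-ceph-csi-config -n openshift-storage

   # Tail the operator log for the reconcile/version-detection errors quoted above
   $ oc logs deploy/rook-ceph-operator -n openshift-storage --tail=100 | grep -E 'ceph-csi|ceph version not found'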
Bug reproduced manually:

1. Deploy OCP 4.16
2. Install the ODF 4.15.1 operator [GA'ed]
3. Check ceph status [HEALTH_OK]
4. Upgrade ODF:

   a. Disable the default source redhat-operators:
      $ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
      operatorhub.config.openshift.io/cluster patched

   b. Change the channel in the odf-operator subscription [stable-4.15 -> stable-4.16]:
      $ oc edit subscription odf-operator -n openshift-storage

   c. Create the catalog source:
      $ oc create -f CatalogSource.yaml
      catalogsource.operators.coreos.com/redhat-operators created
      $ oc edit CatalogSource -n openshift-marketplace redhat-operators

      oviner~/multus$ cat ~/CatalogSource.yaml
      ---
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: redhat-operators
        namespace: openshift-marketplace
        labels:
          ocs-operator-internal: "true"
      spec:
        displayName: Openshift Container Storage
        icon:
          base64data: ""
          mediatype: ""
        image: quay.io/rhceph-dev/ocs-registry:latest-stable-4.16
        publisher: Red Hat
        sourceType: grpc
        priority: 100
        # If the registry image still has the same tag (latest-stable-4.6, or for stage testing)
        # we need this updateStrategy, otherwise we will not see newly pushed content.
        updateStrategy:
          registryPoll:
            interval: 15m

   d. Enable ICSP:
      $ podman run --entrypoint cat quay.io/rhceph-dev/ocs-registry:latest-stable-4.16 /icsp.yaml | oc apply -f -

   e. Check the rook-ceph operator CSV [stuck in Installing state]:
      $ oc get csv -A
      NAMESPACE           NAME                                   DISPLAY     VERSION            REPLACES   PHASE
      openshift-storage   rook-ceph-operator.v4.16.0-77.stable   Rook-Ceph   4.16.0-77.stable              Installing

   f. The rook-ceph operator pod is in CrashLoopBackOff (see the triage sketch below):
      $ oc get pods rook-ceph-operator-6d548fdc94-sbpwp
      NAME                                  READY   STATUS             RESTARTS       AGE
      rook-ceph-operator-6d548fdc94-sbpwp   0/1     CrashLoopBackOff   23 (44s ago)   98m

For more info: https://docs.google.com/document/d/13cUS2b6TUl-_2iCeM9iMoR57W1NXWuPEXPFmHPkijpo/edit
Must-gather link: http://rhsqe-repo.lab.eng.blr.redhat.com/OCS/ocs-qe-bugs/bz-2275886/
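When the operator pod sits in CrashLoopBackOff as in step f, the crash reason is usually in the previous container's log; a minimal triage sketch using standard oc commands (pod name copied from the output above):

   # Log of the last crashed run of the operator container
   $ oc logs rook-ceph-operator-6d548fdc94-sbpwp -n openshift-storage --previous

   # Restart count and last termination reason/exit code
   $ oc describe pod rook-ceph-operator-6d548fdc94-sbpwp -n openshift-storage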
After working with Madhu's private image, the rook-ceph-operator pod is running; however, some CSI pods moved to CrashLoopBackOff because the private image was built with an old cephcsi. The operator image was swapped in by editing the CSV:

$ oc edit csv rook-ceph-operator.v4.16.0-79.stable -n openshift-storage
  image: quay.io/madhupr001/rook:v1
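To see which CSI pods regressed, a label filter narrows the pod list (a sketch; the app labels are the ones Rook conventionally sets on its CSI plugin and provisioner pods, assumed unchanged here):

   # List only the cephfs/rbd CSI pods and look for CrashLoopBackOff
   $ oc get pods -n openshift-storage -l 'app in (csi-cephfsplugin,csi-rbdplugin,csi-cephfsplugin-provisioner,csi-rbdplugin-provisioner)'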
Bug fixed [tested on quay.io/rhceph-dev/ocs-registry:4.16.0-82]:

1. Deploy OCP 4.16 [4.16.0-0.nightly-2024-04-18-141003]

2. Install GA'ed ODF 4.15.1:
   $ oc get csv -A
   NAMESPACE                              NAME                                    DISPLAY                       VERSION          REPLACES                                PHASE
   openshift-operator-lifecycle-manager   packageserver                           Package Server                0.0.1-snapshot                                           Succeeded
   openshift-storage                      mcg-operator.v4.15.1-rhodf              NooBaa Operator               4.15.1-rhodf     mcg-operator.v4.15.0-rhodf              Succeeded
   openshift-storage                      ocs-operator.v4.15.1-rhodf              OpenShift Container Storage   4.15.1-rhodf     ocs-operator.v4.15.0-rhodf              Succeeded
   openshift-storage                      odf-csi-addons-operator.v4.15.1-rhodf   CSI Addons                    4.15.1-rhodf     odf-csi-addons-operator.v4.15.0-rhodf   Succeeded
   openshift-storage                      odf-operator.v4.15.1-rhodf              OpenShift Data Foundation     4.15.1-rhodf     odf-operator.v4.15.0-rhodf              Succeeded

3. Create storagecluster

4. Check storagecluster status, pod status, and ceph status:
   $ oc get storagecluster
   NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
   ocs-storagecluster   8m18s   Ready              2024-04-21T12:11:47Z   4.15.1

   sh-5.1$ ceph -s
     cluster:
       id:     a2271230-7eb4-4459-91aa-911aa8a41dca
       health: HEALTH_OK

5. Upgrade ODF 4.15.1 -> ODF 4.16.0:

   a. Disable the default source redhat-operators:
      $ oc patch operatorhub.config.openshift.io/cluster -p='{"spec":{"sources":[{"disabled":true,"name":"redhat-operators"}]}}' --type=merge
      operatorhub.config.openshift.io/cluster patched

   b. Change the channel in the odf-operator subscription [stable-4.15 -> stable-4.16] (a non-interactive alternative is sketched at the end of this comment):
      $ oc edit subscription odf-operator -n openshift-storage

   c. Create the catalog source:
      oviner~$ cat CatalogSource.yaml
      ---
      apiVersion: operators.coreos.com/v1alpha1
      kind: CatalogSource
      metadata:
        name: redhat-operators
        namespace: openshift-marketplace
        labels:
          ocs-operator-internal: "true"
      spec:
        displayName: Openshift Container Storage
        icon:
          base64data: ""
          mediatype: ""
        image: quay.io/rhceph-dev/ocs-registry:4.16.0-82
        publisher: Red Hat
        sourceType: grpc
        priority: 100
        # If the registry image still has the same tag (latest-stable-4.6, or for stage testing)
        # we need this updateStrategy, otherwise we will not see newly pushed content.
        updateStrategy:
          registryPoll:
            interval: 15m

      oviner~$ oc create -f CatalogSource.yaml
      catalogsource.operators.coreos.com/redhat-operators created

   d. Enable ICSP:
      oviner~$ podman run --entrypoint cat quay.io/rhceph-dev/ocs-registry:4.16.0-82 /icsp.yaml | oc apply -f -
      Trying to pull quay.io/rhceph-dev/ocs-registry:4.16.0-82...
      Getting image source signatures
      Copying blob 34dd843a0b94 done   |
      imagecontentsourcepolicy.operator.openshift.io/df-repo-v4.16.0-82 created

6. Check csv:
   oviner~$ oc get csv -A
   NAMESPACE                              NAME                                        DISPLAY                            VERSION            REPLACES                                PHASE
   openshift-operator-lifecycle-manager   packageserver                               Package Server                     0.0.1-snapshot                                             Succeeded
   openshift-storage                      mcg-operator.v4.16.0-82.stable              NooBaa Operator                    4.16.0-82.stable   mcg-operator.v4.15.2-rhodf              Succeeded
   openshift-storage                      ocs-client-operator.v4.16.0-82.stable       OpenShift Data Foundation Client   4.16.0-82.stable                                           Succeeded
   openshift-storage                      ocs-operator.v4.16.0-82.stable              OpenShift Container Storage        4.16.0-82.stable   ocs-operator.v4.15.2-rhodf              Succeeded
   openshift-storage                      odf-csi-addons-operator.v4.16.0-82.stable   CSI Addons                         4.16.0-82.stable   odf-csi-addons-operator.v4.15.2-rhodf   Succeeded
   openshift-storage                      odf-operator.v4.16.0-82.stable              OpenShift Data Foundation          4.16.0-82.stable   odf-operator.v4.15.1-rhodf              Succeeded
   openshift-storage                      odf-prometheus-operator.v4.16.0-82.stable   Prometheus Operator                4.16.0-82.stable                                           Succeeded
   openshift-storage                      rook-ceph-operator.v4.16.0-82.stable        Rook-Ceph                          4.16.0-82.stable                                           Succeeded

7. Check pod status:
   $ oc get pods
   NAME                                                              READY   STATUS      RESTARTS      AGE
   console-7c4f6fbf7b-2wnf7                                          1/1     Running     0             14m
   csi-addons-controller-manager-5645b9d78d-rxl7p                    2/2     Running     0             10m
   csi-cephfsplugin-9szq7                                            2/2     Running     0             9m30s
   csi-cephfsplugin-lv7ks                                            2/2     Running     0             9m30s
   csi-cephfsplugin-provisioner-587f9758d5-9l9bh                     6/6     Running     0             9m30s
   csi-cephfsplugin-provisioner-587f9758d5-cw2cz                     6/6     Running     0             9m30s
   csi-cephfsplugin-smfbd                                            2/2     Running     0             9m30s
   csi-rbdplugin-725d5                                               3/3     Running     0             9m30s
   csi-rbdplugin-c7snt                                               3/3     Running     0             9m30s
   csi-rbdplugin-provisioner-5b6d758598-s4h7x                        6/6     Running     0             9m30s
   csi-rbdplugin-provisioner-5b6d758598-t8d64                        6/6     Running     0             9m30s
   csi-rbdplugin-z46fn                                               3/3     Running     0             9m30s
   noobaa-core-0                                                     1/1     Running     0             9m35s
   noobaa-db-pg-0                                                    1/1     Running     0             7m57s
   noobaa-endpoint-6d85d6c867-hsp9c                                  1/1     Running     0             10m
   noobaa-operator-68cf54d6bd-fkjk9                                  1/1     Running     0             10m
   ocs-client-operator-console-7c4f6fbf7b-ltl9h                      1/1     Running     0             14m
   ocs-client-operator-controller-manager-77cc58d696-cngkc           2/2     Running     0             14m
   ocs-metrics-exporter-759867f995-mg8cd                             1/1     Running     0             10m
   ocs-operator-65c8959bd6-dgckd                                     1/1     Running     0             10m
   odf-console-89c85549-6qzc7                                        1/1     Running     0             14m
   odf-operator-controller-manager-7457bc49b4-mjgj2                  2/2     Running     1 (11m ago)   14m
   rook-ceph-crashcollector-compute-0-5c9796dc6b-n9r4x               1/1     Running     0             6m41s
   rook-ceph-crashcollector-compute-1-7f84b499cf-g6c7r               1/1     Running     0             8m59s
   rook-ceph-crashcollector-compute-2-67f57795b8-4zttb               1/1     Running     0             7m28s
   rook-ceph-exporter-compute-0-6977c68fdb-54m5x                     1/1     Running     0             6m38s
   rook-ceph-exporter-compute-1-b884fd99f-46vz2                      1/1     Running     0             8m56s
   rook-ceph-exporter-compute-2-69c66bfbd4-rjbnn                     1/1     Running     0             7m25s
   rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-5cb6bc65b8w7l   2/2     Running     0             4m13s
   rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-6d45659c62qvd   2/2     Running     0             3m50s
   rook-ceph-mgr-a-5ddf85b57c-vw7p9                                  3/3     Running     0             5m11s
   rook-ceph-mgr-b-6d4ff8c4d6-xqlq9                                  3/3     Running     0             4m46s
   rook-ceph-mon-a-84c996b649-w9pk5                                  2/2     Running     0             8m59s
   rook-ceph-mon-b-5fc77b4bb9-dfdfd                                  2/2     Running     0             7m28s
   rook-ceph-mon-c-7758b7f549-vv2mk                                  2/2     Running     0             5m53s
   rook-ceph-operator-5d9c549687-klrvt                               1/1     Running     0             10m
   rook-ceph-osd-0-6b44f59964-2zdsp                                  2/2     Running     0             4m23s
   rook-ceph-osd-1-655896987-d9x94                                   2/2     Running     0             3m59s
   rook-ceph-osd-2-68f9d46844-d4prl                                  2/2     Running     0             3m35s
   rook-ceph-osd-prepare-0baa69d3403d1c2216d095eea4c2adcd-jvdvn      0/1     Completed   0             36m
   rook-ceph-osd-prepare-97aaf5506140f91891a468f818d3e57c-gvhtd      0/1     Completed   0             36m
   rook-ceph-osd-prepare-bb0571daa76f557be646c66568031c71-k7kvj      0/1     Completed   0             36m
   rook-ceph-rgw-ocs-storagecluster-cephobjectstore-a-7657bf7vp9kj   2/2     Running     0             6m41s
   rook-ceph-tools-59d6dcbd66-mvbqf                                  1/1     Running     0             10m
   ux-backend-server-5c67fc645-k9vm8                                 2/2     Running     0             10m

8. Check storagecluster:
   oviner~$ oc get storageclusters.ocs.openshift.io
   NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
   ocs-storagecluster   40m   Ready              2024-04-21T12:11:47Z   4.16.0

9. Check ceph status:
   sh-5.1$ ceph -s
     cluster:
       id:     a2271230-7eb4-4459-91aa-911aa8a41dca
       health: HEALTH_OK

For more info: https://docs.google.com/document/d/1HRmQwJ9Hz-lvKogNUFXKFtizXwM2dR5z8B2DRtOOopo/edit
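A note on step 5b: the channel change can also be applied non-interactively with a merge patch instead of oc edit (a sketch; same subscription name and namespace as shown above):

   $ oc patch subscription odf-operator -n openshift-storage --type merge -p '{"spec":{"channel":"stable-4.16"}}'

   # Then watch the CSVs until every PHASE reports Succeeded, as in step 6
   $ watch -n 10 'oc get csv -n openshift-storage'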
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591