Please refer to https://github.com/rook/rook/issues/13942 for more details.
Fixed by https://github.com/rook/rook/pull/13966 (implemented in 4.16). Needs a backport to 4.15.

Since we aren't aiming to backport all the content that surrounds this fix, the testing is as below:

1. This BZ affects only provider-client deployments, specifically those where the client is also running in the same cluster as the provider. For everything else, regression testing alone is enough.
2. For an affected deployment with this fix:
   a. Restart the ocs-client-operator-controller-manager-* pod in the client operator namespace.
   b. Wait for and verify the presence of "openshift-storage" prefixed resources in "oc get csidriver".
   c. Add "ROOK_CSI_DISABLE_DRIVER: true" to the "rook-ceph-operator-config" cm in the "openshift-storage" ns.
   d. Restart the "rook-ceph-operator" pod; "oc get csidriver" should still list the "openshift-storage" prefixed resources.
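The check in steps 2b and 2d could be scripted roughly as follows. This is only a sketch and not part of the fix: the helper names are hypothetical, and the prefix "openshift-storage." (with a trailing dot) is an assumption based on the driver names expected in this deployment.

```python
import json
import subprocess

# Assumed prefix for the client-managed drivers (hypothetical constant).
PREFIX = "openshift-storage."


def prefixed_csidrivers(csidriver_list: dict, prefix: str = PREFIX) -> list:
    """Return the sorted names of CSIDriver objects that start with prefix."""
    names = [item["metadata"]["name"] for item in csidriver_list.get("items", [])]
    return sorted(name for name in names if name.startswith(prefix))


def fetch_csidrivers() -> dict:
    """Read all CSIDriver objects from the cluster as parsed JSON.

    Requires a logged-in `oc` on PATH; not called at import time.
    """
    result = subprocess.run(
        ["oc", "get", "csidriver", "-o", "json"],
        check=True,
        capture_output=True,
        text=True,
    )
    return json.loads(result.stdout)
```

Calling prefixed_csidrivers(fetch_csidrivers()) after step 2a and again after step 2d should return the same non-empty list if the fix behaves as described.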
Parth, when can we have the PR ready?
Tested in version:

% oc get csv -n openshift-storage
NAME                                    DISPLAY                       VERSION        REPLACES                                PHASE
mcg-operator.v4.15.3-rhodf              NooBaa Operator               4.15.3-rhodf   mcg-operator.v4.15.2-rhodf              Succeeded
ocs-operator.v4.15.3-rhodf              OpenShift Container Storage   4.15.3-rhodf   ocs-operator.v4.15.2-rhodf              Succeeded
odf-csi-addons-operator.v4.15.3-rhodf   CSI Addons                    4.15.3-rhodf   odf-csi-addons-operator.v4.15.2-rhodf   Succeeded
odf-operator.v4.15.3-rhodf              OpenShift Data Foundation     4.15.3-rhodf   odf-operator.v4.15.2-rhodf              Succeeded

% oc get csv odf-operator.v4.15.3-rhodf -n openshift-storage -o yaml | grep full_version
full_version: 4.15.3-4

% oc get csv -n openshift-storage-client
NAME                                    DISPLAY                            VERSION        REPLACES                                PHASE
ocs-client-operator.v4.15.3-rhodf       OpenShift Data Foundation Client   4.15.3-rhodf   ocs-client-operator.v4.14.6-rhodf       Succeeded
odf-csi-addons-operator.v4.15.3-rhodf   CSI Addons                         4.15.3-rhodf   odf-csi-addons-operator.v4.14.6-rhodf   Succeeded

Tested in BM like vmware setup.

Discussed with Leela and Parth. The new parameter ROOK_CSI_DISABLE_DRIVER: "true" should be added manually in 4.15. This is added manually in the configmap rook-ceph-operator-config for testing.

% oc get cm rook-ceph-operator-config -o yaml | yq -r .data
ROOK_CSI_DISABLE_DRIVER: "true"
ROOK_CSI_ENABLE_CEPHFS: "false"
ROOK_CSI_ENABLE_RBD: "false"

% oc get csidriver
NAME                                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
openshift-storage.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent   43s
openshift-storage.rbd.csi.ceph.com      true             false            false             <unset>         false               Persistent   43s

% oc get po -l app=rook-ceph-operator
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-66c5f95bd9-9gd79   1/1     Running   0          33m

Restart rook-ceph-operator.
% oc rollout restart deploy/rook-ceph-operator
deployment.apps/rook-ceph-operator restarted

% oc get po -l app=rook-ceph-operator
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-8557fd5754-gv8vp   1/1     Running   0          21s

csidrivers were not deleted.

% oc get csidriver
NAME                                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
openshift-storage.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent   6m58s
openshift-storage.rbd.csi.ceph.com      true             false            false             <unset>         false               Persistent   6m58s

Test by deleting the pod rook-ceph-operator.

% oc delete pod rook-ceph-operator-8557fd5754-gv8vp
pod "rook-ceph-operator-8557fd5754-gv8vp" deleted

% oc get po -l app=rook-ceph-operator
NAME                                  READY   STATUS    RESTARTS   AGE
rook-ceph-operator-8557fd5754-rfwnr   1/1     Running   0          66s

csidrivers are not deleted.

% oc get csidriver
NAME                                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
openshift-storage.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent   12m
openshift-storage.rbd.csi.ceph.com      true             false            false             <unset>         false               Persistent   12m

Created PVCs and pods and ran I/O. This is working as expected.

As an additional step, checked the impact of deleting the "ocs-client-operator-controller-manager" pod on csidriver.

% oc delete pod ocs-client-operator-controller-manager-5cbf779845-7mx9h -n openshift-storage-client
pod "ocs-client-operator-controller-manager-5cbf779845-7mx9h" deleted

% oc get pods -n openshift-storage-client | grep ocs-client-operator-controller-manager
ocs-client-operator-controller-manager-5cbf779845-jqq6c   2/2   Running   0   18s

csidrivers re-created (AGE is 5s).
% oc get csidriver
NAME                                    ATTACHREQUIRED   PODINFOONMOUNT   STORAGECAPACITY   TOKENREQUESTS   REQUIRESREPUBLISH   MODES        AGE
openshift-storage.cephfs.csi.ceph.com   true             false            false             <unset>         false               Persistent   5s
openshift-storage.rbd.csi.ceph.com      true             false            false             <unset>         false               Persistent   5s

Hi Parth,
Is the recreation of the csidrivers expected when the ocs-client-operator-controller-manager pod is deleted? This happens even when the csidrivers are already present.
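A possible way to tell "kept" from "recreated" without reading the AGE column is to compare metadata.uid before and after a restart, since a deleted and recreated object gets a new uid. The sketch below assumes the input is the parsed output of "oc get csidriver -o json"; the function names and status labels are hypothetical.

```python
def uid_snapshot(csidriver_list: dict) -> dict:
    """Map each CSIDriver name to its metadata.uid."""
    return {
        item["metadata"]["name"]: item["metadata"]["uid"]
        for item in csidriver_list.get("items", [])
    }


def classify_changes(before: dict, after: dict) -> dict:
    """Compare two uid snapshots and label each driver.

    Labels: "unchanged" (same uid), "recreated" (uid differs),
    "deleted" (only in before), "created" (only in after).
    """
    result = {}
    for name in sorted(set(before) | set(after)):
        if name not in after:
            result[name] = "deleted"
        elif name not in before:
            result[name] = "created"
        elif before[name] != after[name]:
            result[name] = "recreated"
        else:
            result[name] = "unchanged"
    return result
```

Taking one snapshot before deleting a pod and one after, the rook-ceph-operator restart above would be expected to report "unchanged" for both drivers, while the ocs-client-operator-controller-manager restart would report "recreated".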
Jilju, they only get recreated when the spec differs. Maybe it's better to post the spec before and after restarting ocs-client-op? Nevertheless, this isn't directly related to the issue that this BZ fixes, so I suggest noting the observations and raising a new bug if required. Even if they are getting recreated with the same content, it will not create any issue. Thanks.
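For the before/after comparison suggested here, something like the following could be used on two saved copies of "oc get csidriver <name> -o json". It is a sketch only: the function name is hypothetical and it compares top-level spec fields, which is an assumption about what "the spec differs" means in the client operator.

```python
def spec_diff(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for top-level spec fields that differ.

    A field missing on one side is reported with None for that side.
    """
    old = before.get("spec", {})
    new = after.get("spec", {})
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

An empty result would mean the driver was recreated with identical top-level spec content, which is the harmless case described above; a non-empty result shows which fields changed.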
Based on comments #13 and #15, moving this BZ to Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenShift Data Foundation 4.15.3 Bug Fix Update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2024:3806