Description of problem (please be as detailed as possible and provide log snippets):

In a provider cluster with 4 storageconsumers (one of which is the internal client), there are 6 subvolumegroups present. There is only one storageclassclaim for sharedfilesystem on each client.

From the provider cluster:

$ oc get storageconsumers -A
NAMESPACE           NAME                                                   AGE
openshift-storage   storageconsumer-7e7dfdd4-b5ce-43f5-b7fd-8da1705d98de   17h
openshift-storage   storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce   17h
openshift-storage   storageconsumer-94605340-6888-48ec-8723-839d99fb2b2f   4d18h
openshift-storage   storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf   17h

$ oc -n openshift-storage rsh rook-ceph-tools-57fd4d4d68-6qnls ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem
[
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-f4c1e396"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-9e0c9f00"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-18656957"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-94605340-6888-48ec-8723-839d99fb2b2f-9c95cd7e"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-7e7dfdd4-b5ce-43f5-b7fd-8da1705d98de-8fd98bbc"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-46376034"
    }
]

$ oc get storageclassrequests
NAME                                                   STORAGETYPE        PHASE
storageclassrequest-0352064f51b00d0d75834cf583184cf8   sharedfilesystem   Ready
storageclassrequest-18ceea741c46ce1c4d2304f056207b23   blockpool          Ready
storageclassrequest-4bc9aaab70bc9ac351c20f8a20fddf89   blockpool          Ready
storageclassrequest-57f47e6a140d1417657d11a636047e4f   sharedfilesystem   Ready
storageclassrequest-605635651329fdd54cd62d0032467121   sharedfilesystem   Ready
storageclassrequest-a04249eb146fe57e464080ca30fb3880   blockpool          Ready
storageclassrequest-c1dce6043ef967192d9245d87b2960e1   blockpool          Ready
storageclassrequest-c93fadac1eb99a0ba37b7f7dc7fded06   sharedfilesystem   Ready

This shows duplicate subvolumegroups for storageconsumers storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce and storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf.

Internal client on the provider cluster:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   39d46d35-a721-48a5-8cb4-bd9af22420e9

Client cluster 1:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   9505e71c-c588-4894-b7b5-b3bd6d5b4f4b

Client cluster 2:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   e7ab22c8-1fc2-42c5-a004-deb77769486a

Client cluster 3:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclients -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   cf205732-e1e3-4ce4-a4d9-6c3be03949ad

Must-gather logs from the provider cluster, collected using quay.io/rhceph-dev/ocs-must-gather:4.14-fusion-hci:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-cloud-pv-cl/ibm-cloud-pv-cl_20240112T084405/logs/bug_2258801/

=====================================================================

Version of all relevant components (if applicable):

Provider cluster:

$ oc get csv
NAME                                           DISPLAY                       VERSION               REPLACES                                PHASE
mcg-operator.v4.14.4-5.fusion-hci              NooBaa Operator               4.14.4-5.fusion-hci   mcg-operator.v4.14.3-rhodf              Succeeded
metallb-operator.v4.14.0-202311302149          MetalLB Operator              4.14.0-202311302149                                           Succeeded
ocs-operator.v4.14.4-5.fusion-hci              OpenShift Container Storage   4.14.4-5.fusion-hci   ocs-operator.v4.14.3-rhodf              Succeeded
odf-csi-addons-operator.v4.14.4-5.fusion-hci   CSI Addons                    4.14.4-5.fusion-hci   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded
odf-operator.v4.14.4-5.fusion-hci              OpenShift Data Foundation     4.14.4-5.fusion-hci   odf-operator.v4.14.3-rhodf              Succeeded

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.7    True        False         5d23h   Cluster version is 4.14.7

Client clusters:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.9    True        False         17h     Cluster version is 4.14.9

$ oc get csv -n openshift-storage-client
NAME                                           DISPLAY                            VERSION               REPLACES                                PHASE
ocs-client-operator.v4.14.4-4.fusion-hci       OpenShift Data Foundation Client   4.14.4-4.fusion-hci   ocs-client-operator.v4.14.3-rhodf       Succeeded
odf-csi-addons-operator.v4.14.4-4.fusion-hci   CSI Addons                         4.14.4-4.fusion-hci   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded

======================================================================

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Reporting the first noticed instance.

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create a provider-client setup with 3 clients connected, excluding the internal client.
2. Verify the list of subvolumegroups using the command "ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem".

====================================================================

Actual results:
The number of subvolumegroups is more than required.

Expected results:
Subvolumegroups should not be duplicated.

Additional info:
The cephFilesystemSubVolumeGroup name is generated from the storageConsumerName plus a UUID [1]. The name is stored in the status sub-resource of the storageClassRequest CR [2]. On the next reconciliation, we look up the name of the resource in the status section [3]. If the status update fails during the current reconciliation, the next reconciliation won't find the name generated previously and will generate a new one, creating a second subvolumegroup. From the logs, we can see that the status update failed:

```
2024-01-17T11:52:34.839475541Z {"level":"info","ts":"2024-01-17T11:52:34Z","msg":"Failed to update StorageClassRequest status.","controller":"storageclassrequest","controllerGroup":"ocs.openshift.io","controllerKind":"StorageClassRequest","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","reconcileID":"58f7708c-8403-442f-95d0-47ba3b192cd7","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"}}
2024-01-17T11:52:34.839533073Z {"level":"error","ts":"2024-01-17T11:52:34Z","msg":"Reconciler error","controller":"storageclassrequest","controllerGroup":"ocs.openshift.io","controllerKind":"StorageClassRequest","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","reconcileID":"58f7708c-8403-442f-95d0-47ba3b192cd7","error":"Operation cannot be fulfilled on storageclassrequests.ocs.openshift.io \"storageclassrequest-0352064f51b00d0d75834cf583184cf8\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
```

[1]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L248-L251
[2.1]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L454 (set the status on the instance)
[2.2]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L120 (update the CR status)
[3]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L239-L251
We are still looking for an RCA.
Verified in version:

Provider:

$ oc get csv
NAME                                         DISPLAY                       VERSION               REPLACES                                     PHASE
mcg-operator.v4.15.0-139.stable              NooBaa Operator               4.15.0-139.stable     mcg-operator.v4.15.0-136.stable              Succeeded
metallb-operator.v4.14.0-202401151553        MetalLB Operator              4.14.0-202401151553                                                Succeeded
ocs-operator.v4.15.0-139.stable              OpenShift Container Storage   4.15.0-139.stable     ocs-operator.v4.15.0-136.stable              Succeeded
odf-csi-addons-operator.v4.15.0-139.stable   CSI Addons                    4.15.0-139.stable     odf-csi-addons-operator.v4.15.0-136.stable   Succeeded
odf-operator.v4.15.0-139.stable              OpenShift Data Foundation     4.15.0-139.stable     odf-operator.v4.15.0-136.stable              Succeeded

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-25-051548   True        False         14d     Cluster version is 4.15.0-0.nightly-2024-01-25-051548

Client on hosted cluster:

$ oc get csv
NAME                                         DISPLAY                            VERSION             REPLACES   PHASE
ocs-client-operator.v4.15.0-136.stable       OpenShift Data Foundation Client   4.15.0-136.stable              Succeeded
odf-csi-addons-operator.v4.15.0-136.stable   CSI Addons                         4.15.0-136.stable              Succeeded

(The version of the client is not a dependency, as confirmed by Leela.)

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.11   True        False         4d      Cluster version is 4.14.11

Steps done: Created clients and cephfs storageclassclaims multiple times. Cephfs storageclassclaim creation and deletion was performed more than 15 times.

Result: No duplicate subvolumegroup is created. Verified using the commands "ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem" and "oc get cephfilesystemsubvolumegroups".

Note: The subvolumegroup name format changed from "cephfilesystemsubvolumegroup-storageconsumer-71eea242-935a-49f5-be03-084d2843a95e-3af5665d" to "cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288".
$ oc get cephfilesystemsubvolumegroups cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-02-12T16:08:35Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 1
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-ssd
    ocs.openshift.io/storageconsumer-name: storageconsumer-103a0498-0f0e-47e8-bacd-2e3e468dc1e4
    ocs.openshift.io/storageprofile-spec: 676d735b95e2732afffc15162bb2c51d
  name: cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    kind: StorageClassRequest
    name: storageclassrequest-a0a855ef3c7462e16c8163b44dcaf864
    uid: f296b008-14d5-4e05-8aa6-d220e7cce906
  resourceVersion: "36958363"
  uid: 9f88aa60-7689-44e5-be24-1c7ad820139a
spec:
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: 55489fe972e7a8d63cbfeaec82520ab2
  observedGeneration: 1
  phase: Ready

Tested on IBM Cloud BM platform.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383