Bug 2258937 - [Provider-Client] Additional subvolumegroups created on provider cluster
Summary: [Provider-Client] Additional subvolumegroups created on provider cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.14
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Rewant
QA Contact: Jilju Joy
URL:
Whiteboard: isf-provider
Depends On:
Blocks:
Reported: 2024-01-18 04:32 UTC by Jilju Joy
Modified: 2024-03-19 15:31 UTC (History)

Fixed In Version: 4.15.0-139
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:31:38 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage ocs-operator pull 2437 0 None open storageclassrequest: wait for cache to update before next reconcile 2024-02-02 03:29:14 UTC
Github red-hat-storage ocs-operator pull 2450 0 None open Bug 2258937: [release-4.15] storageClassRequest: move away from status as source of truth 2024-02-08 04:08:25 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:31:41 UTC

Description Jilju Joy 2024-01-18 04:32:27 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
In a provider cluster with 4 storageconsumers (one is the internal client), there are 6 subvolumegroups present. There is only one storageclassclaim present for sharedfilesystem on each client.

From the provider cluster:

$ oc get storageconsumers -A
NAMESPACE           NAME                                                   AGE
openshift-storage   storageconsumer-7e7dfdd4-b5ce-43f5-b7fd-8da1705d98de   17h
openshift-storage   storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce   17h
openshift-storage   storageconsumer-94605340-6888-48ec-8723-839d99fb2b2f   4d18h
openshift-storage   storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf   17h


$ oc -n openshift-storage rsh rook-ceph-tools-57fd4d4d68-6qnls ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem
[
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-f4c1e396"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-9e0c9f00"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-18656957"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-94605340-6888-48ec-8723-839d99fb2b2f-9c95cd7e"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-7e7dfdd4-b5ce-43f5-b7fd-8da1705d98de-8fd98bbc"
    },
    {
        "name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-46376034"
    }
]


$ oc get storageclassrequests 
NAME                                                   STORAGETYPE        PHASE
storageclassrequest-0352064f51b00d0d75834cf583184cf8   sharedfilesystem   Ready
storageclassrequest-18ceea741c46ce1c4d2304f056207b23   blockpool          Ready
storageclassrequest-4bc9aaab70bc9ac351c20f8a20fddf89   blockpool          Ready
storageclassrequest-57f47e6a140d1417657d11a636047e4f   sharedfilesystem   Ready
storageclassrequest-605635651329fdd54cd62d0032467121   sharedfilesystem   Ready
storageclassrequest-a04249eb146fe57e464080ca30fb3880   blockpool          Ready
storageclassrequest-c1dce6043ef967192d9245d87b2960e1   blockpool          Ready
storageclassrequest-c93fadac1eb99a0ba37b7f7dc7fded06   sharedfilesystem   Ready

This shows duplicate subvolumegroups for the storageconsumers storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce and storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf.
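
The duplicate check above can also be done mechanically. The following is a minimal sketch, not part of the product tooling: it parses the JSON printed by "ceph fs subvolumegroup ls" (copied from the listing above) and groups the entries by storageconsumer. The name layout in NAME_RE and the helper names groups_per_consumer and duplicated_consumers are assumptions made for illustration, inferred only from the names shown in this report.

```python
import json
import re
from collections import defaultdict

# JSON as printed by `ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem`,
# copied from the listing in this report.
SUBVOLUMEGROUP_LS = """
[
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-f4c1e396"},
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-9e0c9f00"},
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-b35bd6ec-6975-4370-92a8-92f1a4a97ddf-18656957"},
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-94605340-6888-48ec-8723-839d99fb2b2f-9c95cd7e"},
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-7e7dfdd4-b5ce-43f5-b7fd-8da1705d98de-8fd98bbc"},
  {"name": "cephfilesystemsubvolumegroup-storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce-46376034"}
]
"""

# Assumed layout (inferred from the names above, not from the operator code):
#   cephfilesystemsubvolumegroup-<storageconsumer name>-<8 hex characters>
NAME_RE = re.compile(
    r"^cephfilesystemsubvolumegroup-(storageconsumer-[0-9a-f-]{36})-([0-9a-f]{8})$"
)


def groups_per_consumer(ls_json):
    """Map each storageconsumer name to the subvolumegroup names that embed it."""
    per_consumer = defaultdict(list)
    for entry in json.loads(ls_json):
        match = NAME_RE.match(entry["name"])
        if match is None:
            # Skip names that do not follow the assumed layout.
            continue
        per_consumer[match.group(1)].append(entry["name"])
    return dict(per_consumer)


def duplicated_consumers(ls_json):
    """Return the sorted storageconsumer names that own more than one subvolumegroup."""
    groups = groups_per_consumer(ls_json)
    return sorted(name for name, owned in groups.items() if len(owned) > 1)


if __name__ == "__main__":
    for consumer in duplicated_consumers(SUBVOLUMEGROUP_LS):
        print(consumer)
```

On a live cluster the same functions could be fed the output of the rsh command shown above instead of the embedded string.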


Internal client on the provider cluster:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   39d46d35-a721-48a5-8cb4-bd9af22420e9


Client cluster 1:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   9505e71c-c588-4894-b7b5-b3bd6d5b4f4b

Client cluster 2:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclient -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   e7ab22c8-1fc2-42c5-a004-deb77769486a


Client cluster 3:

$ oc get storageclassclaims
NAME                          STORAGETYPE        STORAGEPROFILE   STORAGECLIENTNAME   STORAGECLIENTNAMESPACE     PHASE
ocs-storagecluster-ceph-rbd   blockpool                           storage-client      openshift-storage-client   Ready
ocs-storagecluster-cephfs     sharedfilesystem                    storage-client      openshift-storage-client   Ready

$ oc get storageclients -A
NAMESPACE                  NAME             PHASE       CONSUMER
openshift-storage-client   storage-client   Connected   cf205732-e1e3-4ce4-a4d9-6c3be03949ad


Must-gather logs from provider cluster collected using quay.io/rhceph-dev/ocs-must-gather:4.14-fusion-hci
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/ibm-cloud-pv-cl/ibm-cloud-pv-cl_20240112T084405/logs/bug_2258801/

=====================================================================
Version of all relevant components (if applicable):

Provider cluster:
$ oc get csv
NAME                                           DISPLAY                       VERSION               REPLACES                                PHASE
mcg-operator.v4.14.4-5.fusion-hci              NooBaa Operator               4.14.4-5.fusion-hci   mcg-operator.v4.14.3-rhodf              Succeeded
metallb-operator.v4.14.0-202311302149          MetalLB Operator              4.14.0-202311302149                                           Succeeded
ocs-operator.v4.14.4-5.fusion-hci              OpenShift Container Storage   4.14.4-5.fusion-hci   ocs-operator.v4.14.3-rhodf              Succeeded
odf-csi-addons-operator.v4.14.4-5.fusion-hci   CSI Addons                    4.14.4-5.fusion-hci   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded
odf-operator.v4.14.4-5.fusion-hci              OpenShift Data Foundation     4.14.4-5.fusion-hci   odf-operator.v4.14.3-rhodf              Succeeded



$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.7    True        False         5d23h   Cluster version is 4.14.7


Client clusters:

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.9    True        False         17h     Cluster version is 4.14.9

$ oc get csv -n openshift-storage-client
NAME                                           DISPLAY                            VERSION               REPLACES                                PHASE
ocs-client-operator.v4.14.4-4.fusion-hci       OpenShift Data Foundation Client   4.14.4-4.fusion-hci   ocs-client-operator.v4.14.3-rhodf       Succeeded
odf-csi-addons-operator.v4.14.4-4.fusion-hci   CSI Addons                         4.14.4-4.fusion-hci   odf-csi-addons-operator.v4.14.3-rhodf   Succeeded


======================================================================
Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Reporting the first noticed instance.

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create a provider-client setup with 3 clients connected, excluding the internal client.
2. Verify the list of subvolumegroups using the command
ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem

====================================================================

Actual results:
The number of subvolumegroups is more than required: 6 subvolumegroups are present for 4 storageconsumers.

Expected results:
No duplicate subvolumegroups should be created for a storageconsumer.

Additional info:

Comment 3 Rewant 2024-01-25 08:26:03 UTC
The cephFilesystemSubVolumeGroup name is generated based on storageConsumerName and a UUID[1]. The name is stored in the status sub-resource of the storageClassRequest CR[2]. For the next reconciliation, we look into the status section for the name of the resource[3].


During the current reconciliation, if the status update fails, the status will not contain the name generated in that reconciliation when the next reconciliation runs. A new name with a different UUID would then be generated, which would explain the additional subvolumegroups.
From the logs, we see that the status update failed.
```
2024-01-17T11:52:34.839475541Z {"level":"info","ts":"2024-01-17T11:52:34Z","msg":"Failed to update StorageClassRequest status.","controller":"storageclassrequest","controllerGroup":"ocs.openshift.io","controllerKind":"StorageClassRequest","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","reconcileID":"58f7708c-8403-442f-95d0-47ba3b192cd7","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"}}
2024-01-17T11:52:34.839533073Z {"level":"error","ts":"2024-01-17T11:52:34Z","msg":"Reconciler error","controller":"storageclassrequest","controllerGroup":"ocs.openshift.io","controllerKind":"StorageClassRequest","StorageClassRequest":{"name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","namespace":"openshift-storage"},"namespace":"openshift-storage","name":"storageclassrequest-0352064f51b00d0d75834cf583184cf8","reconcileID":"58f7708c-8403-442f-95d0-47ba3b192cd7","error":"Operation cannot be fulfilled on storageclassrequests.ocs.openshift.io \"storageclassrequest-0352064f51b00d0d75834cf583184cf8\": the object has been modified; please apply your changes to the latest version and try again","stacktrace":"sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:324\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:265\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:226"}
```

[1]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L248-L251
[2.1]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L454 (Set the status on the instance)
[2.2]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L120 (update the CR status)
[3]: https://github.com/red-hat-storage/ocs-operator/blob/main/controllers/storageclassrequest/storageclassrequest_controller.go#L239-L251
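
The failure mode described in this comment can be illustrated with a small simulation. This is a hedged sketch in Python, not the operator's Go code: FakeCluster and reconcile are hypothetical stand-ins, and the 8-character random suffix is an assumption that only mimics the names seen in this report.

```python
import uuid


class FakeCluster:
    """Hypothetical in-memory stand-in for the provider cluster state."""

    def __init__(self):
        self.subvolumegroups = set()  # names of subvolumegroups that exist
        self.status_name = ""         # name recorded in the StorageClassRequest status


def reconcile(cluster, consumer_name, status_update_succeeds):
    """One simplified reconcile pass that treats the status as the source of truth."""
    name = cluster.status_name
    if not name:
        # Nothing recorded in the status: generate a new name with a random suffix.
        suffix = uuid.uuid4().hex[:8]
        name = f"cephfilesystemsubvolumegroup-{consumer_name}-{suffix}"
    # Create the subvolumegroup if it does not exist yet.
    cluster.subvolumegroups.add(name)
    # The status update may be rejected, for example with the
    # "the object has been modified" conflict shown in the log above.
    if status_update_succeeds:
        cluster.status_name = name
    return name


if __name__ == "__main__":
    cluster = FakeCluster()
    consumer = "storageconsumer-85380cff-c984-400c-a21a-24ff61fcb3ce"
    reconcile(cluster, consumer, status_update_succeeds=False)  # name is lost
    reconcile(cluster, consumer, status_update_succeeds=True)   # second name generated
    for group in sorted(cluster.subvolumegroups):
        print(group)
```

In this model a single failed status update is enough to leave two subvolumegroups behind for one storageconsumer, because the first generated name is never recorded.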

Comment 6 Rewant 2024-01-29 10:54:09 UTC
We are still looking for an RCA.

Comment 8 Jilju Joy 2024-02-13 07:30:55 UTC
Verified in version:

Provider:
$ oc get csv
NAME                                         DISPLAY                       VERSION               REPLACES                                     PHASE
mcg-operator.v4.15.0-139.stable              NooBaa Operator               4.15.0-139.stable     mcg-operator.v4.15.0-136.stable              Succeeded
metallb-operator.v4.14.0-202401151553        MetalLB Operator              4.14.0-202401151553                                                Succeeded
ocs-operator.v4.15.0-139.stable              OpenShift Container Storage   4.15.0-139.stable     ocs-operator.v4.15.0-136.stable              Succeeded
odf-csi-addons-operator.v4.15.0-139.stable   CSI Addons                    4.15.0-139.stable     odf-csi-addons-operator.v4.15.0-136.stable   Succeeded
odf-operator.v4.15.0-139.stable              OpenShift Data Foundation     4.15.0-139.stable     odf-operator.v4.15.0-136.stable              Succeeded

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-01-25-051548   True        False         14d     Cluster version is 4.15.0-0.nightly-2024-01-25-051548


Client on hosted cluster:
$ oc get csv
NAME                                         DISPLAY                            VERSION             REPLACES   PHASE
ocs-client-operator.v4.15.0-136.stable       OpenShift Data Foundation Client   4.15.0-136.stable              Succeeded
odf-csi-addons-operator.v4.15.0-136.stable   CSI Addons                         4.15.0-136.stable              Succeeded
(The version of the client is not a dependency as confirmed by Leela)

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.11   True        False         4d      Cluster version is 4.14.11


Steps done:
Created clients and cephfs storageclassclaims multiple times.
Cephfs storageclassclaim creation and deletion were performed more than 15 times.

Result:
No duplicate subvolumegroup is created. Verified using the commands "ceph fs subvolumegroup ls ocs-storagecluster-cephfilesystem" and "oc get cephfilesystemsubvolumegroups".


Note: The name format of subvolumegroup is changed to "cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288" from "cephfilesystemsubvolumegroup-storageconsumer-71eea242-935a-49f5-be03-084d2843a95e-3af5665d".

$ oc get cephfilesystemsubvolumegroups cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288 -o yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystemSubVolumeGroup
metadata:
  creationTimestamp: "2024-02-12T16:08:35Z"
  finalizers:
  - cephfilesystemsubvolumegroup.ceph.rook.io
  generation: 1
  labels:
    cephfilesystem.datapool.name: ocs-storagecluster-cephfilesystem-ssd
    ocs.openshift.io/storageconsumer-name: storageconsumer-103a0498-0f0e-47e8-bacd-2e3e468dc1e4
    ocs.openshift.io/storageprofile-spec: 676d735b95e2732afffc15162bb2c51d
  name: cephfilesystemsubvolumegroup-a2f3fd317a034bbe5eeebb581732f288
  namespace: openshift-storage
  ownerReferences:
  - apiVersion: ocs.openshift.io/v1alpha1
    kind: StorageClassRequest
    name: storageclassrequest-a0a855ef3c7462e16c8163b44dcaf864
    uid: f296b008-14d5-4e05-8aa6-d220e7cce906
  resourceVersion: "36958363"
  uid: 9f88aa60-7689-44e5-be24-1c7ad820139a
spec:
  filesystemName: ocs-storagecluster-cephfilesystem
  pinning: {}
status:
  info:
    clusterID: 55489fe972e7a8d63cbfeaec82520ab2
  observedGeneration: 1
  phase: Ready
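
The new name format ends in 32 hexadecimal characters, which is consistent with a name derived from a hash rather than from a random suffix. The sketch below shows why a name computed purely from stable inputs cannot produce duplicates across reconciles. It is an illustration under assumptions: this report does not state which inputs or which hash function the operator uses, so the function subvolumegroup_name, its two parameters, and the choice of MD5 are hypothetical and are not expected to reproduce the name shown above.

```python
import hashlib


def subvolumegroup_name(consumer_name, request_name):
    """Derive a stable name from stable inputs.

    Assumption: the inputs and the hash function are illustrative only;
    the operator's actual inputs are not stated in this report.
    """
    digest = hashlib.md5(f"{consumer_name}/{request_name}".encode("utf-8")).hexdigest()
    return f"cephfilesystemsubvolumegroup-{digest}"


if __name__ == "__main__":
    consumer = "storageconsumer-103a0498-0f0e-47e8-bacd-2e3e468dc1e4"
    request = "storageclassrequest-a0a855ef3c7462e16c8163b44dcaf864"
    # Two independent computations yield the same name, so a retried
    # reconcile finds the existing resource instead of creating a new one.
    print(subvolumegroup_name(consumer, request) == subvolumegroup_name(consumer, request))
```

With this approach nothing needs to be read back from the status to find the resource, so a failed status update no longer changes which name the next reconcile uses.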


Tested on IBM Cloud BM platform.

Comment 11 errata-xmlrpc 2024-03-19 15:31:38 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

