Bug 2259209 - [GSS] Rook-Ceph operator deployment check/fail if 2 StorageClassDeviceSets are deployed but name: is not unique
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ODF 4.16.0
Assignee: Travis Nielsen
QA Contact: Nagendra Reddy
URL:
Whiteboard:
Depends On:
Blocks: 2260844
 
Reported: 2024-01-19 14:52 UTC by khover
Modified: 2024-07-30 02:53 UTC
CC List: 9 users

Fixed In Version: 4.16.0-86
Doc Type: Bug Fix
Doc Text:
.Rook-Ceph operator deployment fails when storage class device sets are deployed with duplicate names
Previously, when StorageClassDeviceSets were added to the StorageCluster CR with duplicate names, the OSDs failed, leaving Rook confused about the OSD configuration. With this fix, if duplicate device set names are found in the CR, Rook refuses to reconcile the OSDs until the names are fixed, and an error about failing to reconcile the OSDs appears in the rook operator log.
Clone Of:
Environment:
Last Closed: 2024-07-17 13:12:19 UTC
Embargoed:




Links
Github rook/rook pull 14002 (open): osd: Prevent osd reconcile when device set names duplicated (last updated 2024-04-01 22:41:24 UTC)
Red Hat Product Errata RHSA-2024:4591 (last updated 2024-07-17 13:12:32 UTC)

Description khover 2024-01-19 14:52:56 UTC
Description of problem (please be detailed as possible and provide log snippets):

It is possible to deploy a cluster with two storage classes under StorageClassDeviceSets, each with a unique ID. However, if the name: value is not unique, this creates a device set indexing issue.


Version of all relevant components (if applicable):

tested in 4.12 


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

Yes, this creates an indexing issue where Rook does not know which device set it has or has not completed tasks on during reconcile.

Is there any workaround available to the best of your knowledge?

If the name: is changed for one of the two storage classes, a new set of OSDs is created with the unique ID, allowing Rook to then upgrade images and replace OSDs.
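For illustration, a minimal sketch of applying that workaround from the CLI. The StorageCluster name, namespace, and list index are assumptions based on the CRs shown later in this bug; adjust them for the actual cluster:

# Hedged sketch: rename the second (index 1) storageDeviceSets entry so that
# name: becomes unique again. Assumes the default StorageCluster name/namespace.
oc -n openshift-storage patch storagecluster ocs-storagecluster --type=json \
  -p '[{"op": "replace", "path": "/spec/storageDeviceSets/1/name", "value": "ocs-deviceset-2"}]'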

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

4

Is this issue reproducible?

Yes via yaml deploy or Helm chart.

Can this issue be reproduced from the UI?

Not tested 

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1.
2.
3.


Actual results:

Rook deploys OSDs on install with a device set naming scheme that blocks future actions (upgrades, etc.).


Expected results:

Rook fails at deployment when checking dependencies, and errors out because name: is not unique.


Additional info:

Comment 3 khover 2024-01-19 15:05:26 UTC
Testing done in lab env.

Storagecluster CR

   storageDeviceSets:
    - config: {}
      count: 1
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
          storageClassName: datastore-1
          volumeMode: Block
        status: {}
      name: ocs-deviceset
      placement: {}
      portable: true
      preparePlacement: {}
      replica: 3
      resources: {}
    - config: {}
      count: 1
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 20Gi
          storageClassName: datastore-2
          volumeMode: Block
        status: {}
      name: ocs-deviceset
      placement: {}
      portable: true
      preparePlacement: {}
      replica: 3

sh-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                         STATUS  REWEIGHT  PRI-AFF
 -1         0.11691  root default                                                               
 -6         0.11691      region us-east-2                                                       
 -5         0.11691          zone us-east-2b                                                    
-14         0.03897              rack rack0                                                     
-13         0.01949                  host ocs-deviceset-2-data-06f7tx                           
  3    ssd  0.01949                      osd.3                             up   1.00000  1.00000
-17         0.01949                  host ocs-deviceset-2-data-0m7bnw                           
  2    ssd  0.01949                      osd.2                             up   1.00000  1.00000
-20         0.03897              rack rack1                                                     
-19         0.01949                  host ocs-deviceset-1-data-05rmsp                           
  5    ssd  0.01949                      osd.5                             up   1.00000  1.00000
-23         0.01949                  host ocs-deviceset-1-data-0lpzb2                           
  4    ssd  0.01949                      osd.4                             up   1.00000  1.00000
 -4         0.03897              rack rack2                                                     
-11         0.01949                  host ocs-deviceset-0-data-02nrqq                           
  0    ssd  0.01949                      osd.0                             up   1.00000  1.00000
 -3         0.01949                  host ocs-deviceset-0-data-09tncp                           
  1    ssd  0.01949                      osd.1                             up   1.00000  1.00000

Post upgrade 4.12 -> 4.13

sh-5.1$ ceph versions
{
    "mon": {
        "ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)": 3
    },
    "mgr": {
        "ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)": 1
    },
    "osd": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3,
        "ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)": 3
    },
    "mds": {
        "ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)": 2
    },
    "overall": {
        "ceph version 16.2.10-208.el8cp (791f73fbb4bbca2ffe53a2ea0f8706dbffadcc0b) pacific (stable)": 3,
        "ceph version 17.2.6-170.el9cp (59bbeb8815ec3aeb3c8bba1e1866f8f6729eb840) quincy (stable)": 9


Rook operator logs:

2024-01-13 19:17:06.736036 E | op-osd: failed to update OSD 5: failed to generate config for OSD 5 on PVC "ocs-deviceset-1-data-05rmsp": failed to find valid VolumeSource for PVC "ocs-deviceset-1-data-05rmsp"
2024-01-13 19:17:06.769643 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: 4 failures encountered while running osds on nodes in namespace "openshift-storage".
failed to update OSD 0: failed to generate config for OSD 0 on PVC "ocs-deviceset-0-data-02nrqq": failed to find valid VolumeSource for PVC "ocs-deviceset-0-data-02nrqq"
failed to update OSD 2: failed to generate config for OSD 2 on PVC "ocs-deviceset-2-data-0m7bnw": failed to find valid VolumeSource for PVC "ocs-deviceset-2-data-0m7bnw"
failed to update OSD 3: failed to generate config for OSD 3 on PVC "ocs-deviceset-2-data-0cv4sj": failed to find valid VolumeSource for PVC "ocs-deviceset-2-data-0cv4sj"
failed to update OSD 5: failed to generate config for OSD 5 on PVC "ocs-deviceset-1-data-05rmsp": failed to find valid VolumeSource for PVC "ocs-deviceset-1-data-05rmsp"

With odf-backend-storage-1 and odf-backend-storage-2 both having the device set name ocs-deviceset, Rook has no way to index device sets between storage classes.

This is the root cause of the following behavior:

1) Only 3 OSDs getting updated instead of 6

2) op-osd: failed to update OSD 5: failed to generate config for OSD 5 on PVC "ocs-deviceset-1-data-05rmsp": failed to find valid VolumeSource for PVC "ocs-deviceset-1-data-05rmsp"

3) When replacing OSDs, the Rook operator reports: no nodes are defined for configuring OSDs on raw devices

Comment 4 khover 2024-02-14 17:53:38 UTC
Hello All,

I opened this almost a month ago; is there anything additional needed?

Comment 5 Santosh Pillai 2024-02-15 04:51:08 UTC
Rook currently supports only unique names for the storageClassDeviceSets. (https://github.com/rook/rook/blob/master/design/ceph/storage-class-device-set.md)

Is the requirement to support the same name across multiple StorageClassDeviceSets, or to prevent cluster creation when names are not unique?

Since this is an RFE, a JIRA would be required so that the PM can prioritize it.

Comment 6 khover 2024-02-19 13:43:23 UTC
The requirement is to prevent cluster creation when names are not unique.

Jira link is here.

https://issues.redhat.com/browse/OCSQECL-3133

Comment 9 Travis Nielsen 2024-04-24 13:13:04 UTC
Removed RFE from the title since it's just a very simple error check for this corner case.
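For illustration, a rough CLI approximation of that check (this is not the operator's actual Go code from rook/rook pull 14002; the StorageCluster name and namespace are the defaults seen in this bug):

# Hedged sketch: print any duplicated storageDeviceSets names from the
# StorageCluster CR; empty output means the names are unique.
oc -n openshift-storage get storagecluster ocs-storagecluster \
  -o jsonpath='{.spec.storageDeviceSets[*].name}' | tr ' ' '\n' | sort | uniq -d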

Comment 14 Nagendra Reddy 2024-05-10 09:20:40 UTC
@Travis,

Verified fix with below versions. 
odf 4.16.0-96.stable
ocp 4.16.0-0.nightly-2024-05-07-025557

The fix looks to be working as expected. I could see that when duplicate device set names are found in the CR, Rook refuses to reconcile the OSDs until the names are fixed. But even after fixing it (by providing unique names), the OSDs didn't come up properly. Please look into the observation below and confirm the behaviour.

With a single StorageClassDeviceSet

sh-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                     STATUS  REWEIGHT  PRI-AFF
 -1         1.50000  root default
 -5         1.50000      region us-south
-10         0.50000          zone us-south-1
 -9         0.50000              host ocs-deviceset-2-data-0nhk4c
  1    ssd  0.50000                  osd.1                             up   1.00000  1.00000
-14         0.50000          zone us-south-2
-13         0.50000              host ocs-deviceset-0-data-0p5tmv
  2    ssd  0.50000                  osd.2                             up   1.00000  1.00000
 -4         0.50000          zone us-south-3
 -3         0.50000              host ocs-deviceset-1-data-0248h2
  0    ssd  0.50000                  osd.0                             up   1.00000  1.00000
sh-5.1$ ceph osd status
ID  HOST                                   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  nagreddy-m9-ibc-8wp9s-worker-3-lfbqq  8930M   503G      0        0       0        0   exists,up
 1  nagreddy-m9-ibc-8wp9s-worker-1-czkzk  8926M   503G      0     4096       0        0   exists,up
 2  nagreddy-m9-ibc-8wp9s-worker-2-cxfw9  8926M   503G      0     2457       2      106   exists,up

======================================================================================================================================================================

Two storageDeviceSets with a duplicate name and different storageClasses.

  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}



2024-05-10 07:49:48.235633 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: device set "ocs-deviceset-0" name is duplicated, OSDs cannot be configured

sh-5.1$ ceph osd status
ID  HOST   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0           0      0       0        0       0        0   exists,up
 1           0      0       0        0       0        0   exists,up
 2           0      0       0        0       0        0   exists,up


===============================================================================================================================================================================================
Two storageDeviceSets with unique names and different storageClasses

 storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ocs-storagecluster-ceph-rbd
        volumeMode: Block
      status: {}
    name: ocs-deviceset-2
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}



2024-05-10 07:55:00.462217 E | clusterdisruption-controller: failed to update configMap "rook-ceph-pdbstatemap" in cluster "openshift-storage/ocs-storagecluster-cephcluster": Operation cannot be fulfilled on configmaps "rook-ceph-pdbstatemap": the object has been modified; please apply your changes to the latest version and try again

2024-05-10 07:55:02.577743 E | ceph-spec: failed to update cluster condition to {Type:Ready Status:True Reason:ClusterCreated Message:Cluster created successfully LastHeartbeatTime:2024-05-10 07:55:02.538013049 +0000 UTC m=+65592.624616581 LastTransitionTime:2024-05-09 13:46:54 +0000 UTC}. failed to update object "openshift-storage/ocs-storagecluster-cephcluster" status: Operation cannot be fulfilled on cephclusters.ceph.rook.io "ocs-storagecluster-cephcluster": the object has been modified; please apply your changes to the latest version and try again


rook-ceph-osd-0-686dd5bb7d-swm66                                  2/2     Running     0              18h
rook-ceph-osd-1-cf4b967b-v578n                                    2/2     Running     0              18h
rook-ceph-osd-2-6d8ddb6967-cp92c                                  2/2     Running     0              18h
rook-ceph-osd-3-59b8df9b95-mzkrf                                  1/2     Running     0              4m45s
rook-ceph-osd-4-5b44997949-zp7b8                                  1/2     Running     0              4m45s
rook-ceph-osd-5-8bb995cc5-c75mg                                   2/2     Running     0              4m45s




ID  HOST                                   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  nagreddy-m9-ibc-8wp9s-worker-3-lfbqq  9382M   502G      0        0       0        0   exists,up
 1  nagreddy-m9-ibc-8wp9s-worker-1-czkzk  9394M   502G      0        0       0        0   exists,up
 2  nagreddy-m9-ibc-8wp9s-worker-2-cxfw9  9394M   502G      0        0       2      106   exists,up
 3                                           0      0       0        0       0        0   exists,new
 4                                           0      0       0        0       0        0   exists,new
 5  nagreddy-m9-ibc-8wp9s-worker-2-cxfw9  27.9M   511G      0        0       0        0   exists,up



sh-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                       STATUS  REWEIGHT  PRI-AFF
 -1         2.50000  root default
 -5         2.50000      region us-south
-10         0.50000          zone us-south-1
 -9         0.50000              host ocs-deviceset-2-data-0nhk4c
  1    ssd  0.50000                  osd.1                               up   1.00000  1.00000
-14         1.00000          zone us-south-2
-13         0.50000              host ocs-deviceset-0-data-0p5tmv
  2    ssd  0.50000                  osd.2                               up   1.00000  1.00000
-17         0.50000              host ocs-deviceset-2-2-data-0bvm4m
  5    ssd  0.50000                  osd.5                               up   1.00000  1.00000
 -4         1.00000          zone us-south-3
 -3         0.50000              host ocs-deviceset-1-data-0248h2
  0    ssd  0.50000                  osd.0                               up   1.00000  1.00000
-19         0.50000              host ocs-deviceset-2-1-data-02qqmr
  3    ssd  0.50000                  osd.3                             down   1.00000  1.00000
  4    ssd        0  osd.4                                             down   1.00000  1.00000


===================================================================================================================

Same storageClassName and different storageDeviceSet names

  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset-3
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}


2024-05-10 08:04:46.739093 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "openshift-storage/ocs-storagecluster-cephblockpool". failed to create pool "ocs-storagecluster-cephblockpool".: failed to create pool "ocs-storagecluster-cephblockpool".: failed to initialize pool "ocs-storagecluster-cephblockpool" for RBD use. : signal: interrupt

2024-05-10 08:05:04.314141 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "openshift-storage/ocs-storagecluster-cephblockpool". failed to create pool "ocs-storagecluster-cephblockpool".: failed to create pool "ocs-storagecluster-cephblockpool".: failed to initialize pool "ocs-storagecluster-cephblockpool" for RBD use. : signal: interrupt

2024-05-10 08:08:26.308672 E | ceph-block-pool-controller: failed to reconcile CephBlockPool "openshift-storage/ocs-storagecluster-cephblockpool". failed to create pool "ocs-storagecluster-cephblockpool".: failed to create pool "ocs-storagecluster-cephblockpool".: failed to initialize pool "ocs-storagecluster-cephblockpool" for RBD use. : signal: interrupt

oc rsh rook-ceph-tools-74fcb56dbc-h2kjf
sh-5.1$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                       STATUS  REWEIGHT  PRI-AFF
 -1         2.50000  root default
 -5         2.50000      region us-south
-10         0.50000          zone us-south-1
 -9         0.50000              host ocs-deviceset-2-data-0nhk4c
  1    ssd  0.50000                  osd.1                               up   1.00000  1.00000
-14         1.00000          zone us-south-2
-13         0.50000              host ocs-deviceset-0-data-0p5tmv
  2    ssd  0.50000                  osd.2                               up   1.00000  1.00000
-17         0.50000              host ocs-deviceset-2-2-data-0bvm4m
  5    ssd  0.50000                  osd.5                               up   1.00000  1.00000
 -4         1.00000          zone us-south-3
 -3         0.50000              host ocs-deviceset-1-data-0248h2
  0    ssd  0.50000                  osd.0                               up   1.00000  1.00000
-19         0.50000              host ocs-deviceset-2-1-data-02qqmr
  3    ssd  0.50000                  osd.3                             down         0  1.00000
  4    ssd        0  osd.4                                             down         0  1.00000

sh-5.1$ ceph osd status
ID  HOST                                   USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  nagreddy-m9-ibc-8wp9s-worker-3-lfbqq  9383M   502G      0        0       0        0   exists,up
 1  nagreddy-m9-ibc-8wp9s-worker-1-czkzk  9395M   502G      0        0       0        0   exists,up
 2  nagreddy-m9-ibc-8wp9s-worker-2-cxfw9  9395M   502G      0        0       0        0   exists,up
 3                                           0      0       0        0       0        0   autoout,exists,new
 4                                           0      0       0        0       0        0   autoout,exists,new
 5  nagreddy-m9-ibc-8wp9s-worker-2-cxfw9  28.5M   511G      0        0       0        0   exists,up


@Travis, 

Please let me know if there is any regression issue here in bringing the OSDs back to a normal state after the duplicate names are replaced with unique ones.

Comment 16 Travis Nielsen 2024-05-10 23:03:24 UTC
The CephCluster is only expecting to create 3 OSDs. There are 3 storageClassDeviceSets defined in the CephCluster CR, each with a count of 1.
We see 3 OSDs are successfully running in the cluster. It's the additional OSDs that are causing errors after the incorrect device set name was reverted.


The rook operator log shows these errors:

2024-05-10T09:23:15.822202502Z 2024-05-10 09:23:15.822114 E | op-osd: failed to update OSD 3: failed to generate config for OSD 3 on PVC "ocs-deviceset-2-1-data-02qqmr": failed to find valid VolumeSource for PVC "ocs-deviceset-2-1-data-02qqmr"
2024-05-10T09:23:17.512048217Z 2024-05-10 09:23:17.511905 I | op-osd: updating OSD 4 on PVC "ocs-deviceset-2-0-data-0cwgmn"
2024-05-10T09:23:17.557246989Z 2024-05-10 09:23:17.557176 E | op-osd: failed to update OSD 4: failed to generate config for OSD 4 on PVC "ocs-deviceset-2-0-data-0cwgmn": failed to find valid VolumeSource for PVC "ocs-deviceset-2-0-data-0cwgmn"
2024-05-10T09:23:19.523639830Z 2024-05-10 09:23:19.523577 I | op-osd: updating OSD 5 on PVC "ocs-deviceset-2-2-data-0bvm4m"
2024-05-10T09:23:19.570317301Z 2024-05-10 09:23:19.570249 E | op-osd: failed to update OSD 5: failed to generate config for OSD 5 on PVC "ocs-deviceset-2-2-data-0bvm4m": failed to find valid VolumeSource for PVC "ocs-deviceset-2-2-data-0bvm4m"

The extra OSD PVCs need to be deleted manually in this scenario; Rook never deletes OSDs automatically.
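For illustration, a sketch of locating those leftover PVCs for manual cleanup. The ceph.rook.io/DeviceSet label key is an assumption about Rook's usual OSD PVC labels; confirm with oc get pvc --show-labels before deleting anything:

# Hedged sketch: list OSD PVCs that belong to the removed device set so they
# can be cleaned up by hand (label key assumed, verify on the live cluster).
oc -n openshift-storage get pvc -l ceph.rook.io/DeviceSet=ocs-deviceset-2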

Did you add a different device set name "ocs-deviceset-2", and later remove that device set? I don't see that device set anymore in the cephcluster CR. The removal of a device set will leave OSD artifacts like this. So I expect this is a different scenario from just adding a duplicated device set name and then removing the duplicate: a different device set, removed from the CR, is what caused the problem.

Comment 17 Nagendra Reddy 2024-05-14 07:41:19 UTC
verified with odf 4.16.0-96.stable

The fix is working as expected.

When a duplicate name is given, the reconcile fails until it is fixed.

2024-05-14 07:13:04.154364 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: device set "ocs-deviceset-0" name is duplicated, OSDs cannot be configured


storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}

------------------------------

Working fine with unique names. Added 3 OSDs successfully.

 storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 100Gi
        storageClassName: ibmc-vpc-block-10iops-tier
        volumeMode: Block
      status: {}
    name: ocs-deviceset-1
    placement: {}
    portable: true
    preparePlacement: {}
    replica: 3
    resources: {}

ID  HOST                               USED  AVAIL  WR OPS  WR DATA  RD OPS  RD DATA  STATE
 0  tdesala-m13-kgxnz-worker-3-gbq6f  14.2G  85.7G      0        0       0        0   exists,up
 1  tdesala-m13-kgxnz-worker-2-ltm7h  13.7G  86.2G      0      118k      1       16   exists,up
 2  tdesala-m13-kgxnz-worker-1-rjxwv  15.4G  84.5G      0     99.1k      0        0   exists,up
 3  tdesala-m13-kgxnz-worker-1-rjxwv  8523M  91.6G      0     10.3k      0        0   exists,up
 4  tdesala-m13-kgxnz-worker-3-gbq6f  9678M  90.5G      0     4096       0        0   exists,up
 5  tdesala-m13-kgxnz-worker-2-ltm7h  10.1G  89.8G      0     4096       1       90   exists,up



@Travis,

I gave duplicate device set names ("ocs-deviceset"). Why is the rook-operator showing "ocs-deviceset-0" in the logs below? I expected it to show the device set name as "ocs-deviceset". What do you think?

2024-05-14 07:13:04.154364 E | ceph-cluster-controller: failed to reconcile CephCluster "openshift-storage/ocs-storagecluster-cephcluster". failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to create cluster: failed to start ceph osds: device set "ocs-deviceset-0" name is duplicated, OSDs cannot be configured

Comment 18 Travis Nielsen 2024-05-14 16:49:06 UTC
Please check the names of the storageClassDeviceSets in the CephCluster CR, which will match the issue that the Rook operator is reporting. The OCS operator generates those device set names based on the device set names in the storageDeviceSets from the StorageCluster CR, so Rook sees the name with the -0 suffix.
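For illustration, the generated names can be read straight from the CephCluster CR (a sketch; spec.storage.storageClassDeviceSets is the field the Rook operator reconciles):

# Hedged sketch: each StorageCluster storageDeviceSets entry with replica: 3 is
# expanded by the OCS operator into three CephCluster storageClassDeviceSets,
# e.g. name: ocs-deviceset -> ocs-deviceset-0, ocs-deviceset-1, ocs-deviceset-2,
# which is why the operator error reports "ocs-deviceset-0".
oc -n openshift-storage get cephcluster ocs-storagecluster-cephcluster \
  -o jsonpath='{.spec.storage.storageClassDeviceSets[*].name}'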

Comment 19 Nagendra Reddy 2024-05-15 06:24:03 UTC
Based on the test results in comment 17, confirming that the fix is working as expected. Hence, marking this BZ as Verified.

Comment 21 errata-xmlrpc 2024-07-17 13:12:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.16.0 security, enhancement & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:4591


