Bug 2260050 - Enabling Replica-1 from UI is not working on LSO backed ODF on IBM Power cluster
Summary: Enabling Replica-1 from UI is not working on LSO backed ODF on IBM Power cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: management-console
Version: 4.15
Hardware: ppc64le
OS: Linux
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Bipul Adhikari
QA Contact: Aaruni Aggarwal
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2024-01-24 09:12 UTC by Aaruni Aggarwal
Modified: 2024-03-19 15:32 UTC
CC List: 6 users

Fixed In Version: 4.15.0-136
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:32:14 UTC
Embargoed:


Attachments


Links
Github red-hat-storage/odf-console pull 1213 (open): [release-4.15-compatibility] Bug 2260050: Remove support for Single Replica pool from wizard flow (last updated 2024-02-09 02:59:07 UTC)
Github red-hat-storage/odf-console pull 1214 (open): [release-4.15] Bug 2260050: Remove support for Single Replica pool from wizard flow (last updated 2024-02-09 03:43:12 UTC)
Red Hat Product Errata RHSA-2024:1383 (last updated 2024-03-19 15:32:18 UTC)

Description Aaruni Aggarwal 2024-01-24 09:12:58 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

Enabling Replica-1 from the UI is not working on LSO-backed ODF on IBM Power (ppc64le), even though the setup has 2 disks per worker node.


Version of all relevant components (if applicable):

OCP: 4.15.0
ODF: 4.15.0-123

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create an OCP 4.15 cluster with 3 worker nodes. Install the ODF 4.15 operator.
2. Install the LSO 4.15 operator.
3. Create a LocalVolume with 2 disks per worker node. This creates 6 PVs (see the sketch after these steps).
4. Create a StorageSystem from the UI, enabling the replica-1 pool in the wizard.
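
For reference, a minimal LocalVolume sketch for step 3 (the name and device paths are placeholders, not taken from the actual test cluster; the real CR may differ):

apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: localdisks            # hypothetical name
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values:
        - worker-0
        - worker-1
        - worker-2
  storageClassDevices:
  - storageClassName: localblock
    volumeMode: Block
    devicePaths:              # 2 disks per worker node -> 6 PVs in total
    - /dev/vdb                # placeholder device path
    - /dev/vdc                # placeholder device path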


Actual results:
The replica-1 pool does not work because all 6 PVs are consumed by the regular OSDs, leaving none for the replica-1 OSDs.

Expected results:
The replica-1 pool should work.

Additional info:

Comment 2 Aaruni Aggarwal 2024-01-24 09:19:11 UTC
CSV:

[root@rdr-replicaui-bastion-0 ~]# oc get csv -A
NAMESPACE                              NAME                                          DISPLAY                       VERSION               REPLACES   PHASE
openshift-local-storage                local-storage-operator.v4.15.0-202311280332   Local Storage                 4.15.0-202311280332              Succeeded
openshift-operator-lifecycle-manager   packageserver                                 Package Server                0.0.1-snapshot                   Succeeded
openshift-storage                      mcg-operator.v4.15.0-123.stable               NooBaa Operator               4.15.0-123.stable                Succeeded
openshift-storage                      ocs-operator.v4.15.0-123.stable               OpenShift Container Storage   4.15.0-123.stable                Succeeded
openshift-storage                      odf-csi-addons-operator.v4.15.0-123.stable    CSI Addons                    4.15.0-123.stable                Succeeded
openshift-storage                      odf-operator.v4.15.0-123.stable               OpenShift Data Foundation     4.15.0-123.stable                Succeeded
[root@rdr-replicaui-bastion-0 ~]#
 

pods: 

[root@rdr-replicaui-bastion-0 ~]# oc get pods -n openshift-storage
NAME                                                           READY   STATUS      RESTARTS        AGE
csi-addons-controller-manager-7485d8fdbf-vsp52                 2/2     Running     0               17m
csi-cephfsplugin-provisioner-9dd5ff5b-cvwfc                    6/6     Running     0               4m47s
csi-cephfsplugin-provisioner-9dd5ff5b-tm7l6                    6/6     Running     2 (4m9s ago)    4m47s
csi-cephfsplugin-rz7t8                                         2/2     Running     1 (4m10s ago)   4m47s
csi-cephfsplugin-s85c8                                         2/2     Running     0               4m47s
csi-cephfsplugin-t47l9                                         2/2     Running     1 (4m15s ago)   4m47s
csi-rbdplugin-gwswd                                            3/3     Running     0               4m47s
csi-rbdplugin-gzq9h                                            3/3     Running     1 (4m10s ago)   4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-9jpjk                     6/6     Running     0               4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-n9qm9                     6/6     Running     1 (4m14s ago)   4m47s
csi-rbdplugin-tnzsc                                            3/3     Running     1 (4m16s ago)   4m47s
noobaa-operator-77bc79475b-56rl2                               2/2     Running     0               17m
ocs-operator-5c5657798d-5fp5t                                  1/1     Running     0               17m
odf-console-9848c5b76-lpz54                                    1/1     Running     0               17m
odf-operator-controller-manager-55b9cbb9c5-dgz98               2/2     Running     0               17m
rook-ceph-crashcollector-worker-0-88878b9c4-dvcfp              1/1     Running     0               3m30s
rook-ceph-crashcollector-worker-1-657c67f5df-v7qv6             1/1     Running     0               3m6s
rook-ceph-crashcollector-worker-2-75b7c79bd8-p84mp             1/1     Running     0               3m9s
rook-ceph-exporter-worker-0-dd97f7854-j8w86                    1/1     Running     0               3m30s
rook-ceph-exporter-worker-1-599f867bd5-xggzk                   1/1     Running     0               3m2s
rook-ceph-exporter-worker-2-57d7ff9d4-gpbnn                    1/1     Running     0               3m5s
rook-ceph-mgr-a-74bd484c59-b68db                               3/3     Running     0               3m47s
rook-ceph-mgr-b-657494fdb8-xvgvn                               3/3     Running     0               3m46s
rook-ceph-mon-a-76dbb96546-q2hjp                               2/2     Running     0               4m35s
rook-ceph-mon-b-59f78db56d-fk6zc                               2/2     Running     0               4m11s
rook-ceph-mon-c-54468d5b57-9v4jt                               2/2     Running     0               4m
rook-ceph-operator-c4c68496c-5fq2z                             1/1     Running     0               4m56s
rook-ceph-osd-0-6b6997966f-dqnbb                               2/2     Running     0               3m11s
rook-ceph-osd-1-5c8ccdf584-pm5sk                               2/2     Running     0               3m9s
rook-ceph-osd-2-5db9b85d84-mbqkx                               2/2     Running     0               3m6s
rook-ceph-osd-3-64cc4bb945-g9mv5                               2/2     Running     0               3m8s
rook-ceph-osd-4-f748bdf75-d2xtr                                2/2     Running     0               3m9s
rook-ceph-osd-5-b656b9858-wbbqp                                2/2     Running     0               3m5s
rook-ceph-osd-prepare-21050acb4621a3bbc5c998ff7aabb7c2-x827m   0/1     Completed   0               3m22s
rook-ceph-osd-prepare-27e0bfaeb580bf80299e468d03a8cb6b-lk4qp   0/1     Completed   0               3m22s
rook-ceph-osd-prepare-545eae7d33c702fcc4c20a8b19db653c-xjv9q   0/1     Completed   0               3m21s
rook-ceph-osd-prepare-6dd8b0badddf0e4c48db945e1732dc1b-vdhf8   0/1     Completed   0               3m23s
rook-ceph-osd-prepare-9c2d51e01979a1a5e091282f8750ad43-68zn6   0/1     Completed   0               3m23s
rook-ceph-osd-prepare-d1fdf319c891150c92a6f87261ce8ea4-xfxmg   0/1     Completed   0               3m20s
rook-ceph-osd-prepare-worker-0-data-0vdr8z-b7g5w               0/1     Pending     0               3m19s
rook-ceph-osd-prepare-worker-1-data-0wtd6f-qj752               0/1     Pending     0               3m18s
rook-ceph-osd-prepare-worker-2-data-09cmf4-tlqlm               0/1     Pending     0               3m17s
ux-backend-server-5f557fccd7-l4vxh                             2/2     Running     0               17m

PVC:

[root@rdr-replicaui-bastion-0 ~]# oc get pvc -n openshift-storage
NAME                                     STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ocs-deviceset-localblock-0-data-0mr7tw   Bound     local-pv-83296199   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-1l6js7   Bound     local-pv-e7f2664    500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-24pms2   Bound     local-pv-caa979f9   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-3mmkcl   Bound     local-pv-682f849f   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-4mfsnz   Bound     local-pv-64f835e    500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-56fc65   Bound     local-pv-dede79a3   500Gi      RWO            localblock     3m31s
worker-0-data-0vdr8z                     Pending                                                 localblock     3m31s
worker-1-data-0wtd6f                     Pending                                                 localblock     3m31s
worker-2-data-09cmf4                     Pending                                                 localblock     3m30s

Storagecluster: 

[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   5m18s   Progressing              2024-01-23T19:52:06Z   4.15.0

[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-23T19:52:06Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 2
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: 8d7e0409-5ff1-41b4-a966-488d05d31cde
    resourceVersion: "185561"
    uid: e5fb31da-1b4e-46c1-9178-2c9c6274efa5
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools:
        defaultStorageClass: true
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools:
        enable: true
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 6
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-23T19:53:58Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: 'Error while reconciling: some StorageClasses were skipped while waiting
        for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd,ocs-storagecluster-ceph-non-resilient-rbd]'
      reason: ReconcileFailed
      status: "False"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: Unknown
      type: Upgradeable
    currentMonCount: 3
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
      noobaaCore:
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
      noobaaDB:
        desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Progressing
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "185525"
      uid: 34feb16f-f548-4630-9836-52666cc7abf1
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""

Storageclass: 

[root@rdr-replicaui-bastion-0 ~]# oc get sc
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                    kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  10m
ocs-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  5m34s

Comment 3 Malay Kumar parida 2024-01-24 09:47:06 UTC
When creating the StorageCluster with LSO-backed PVs, the UI sets the StorageCluster spec so that it consumes all available PVs. In this case the cluster had 6 available PVs, so the UI set the count to 6 in the storageDeviceSets spec, leaving no PVs free. When replica-1 is also enabled, the replica-1 OSDs have no PVs to bind to.

A solution would be for the UI, when the Enable replica-1 option is ticked, to leave at least 1 PV per node for the replica-1 OSDs to consume.
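
For illustration, a sketch of how the wizard-generated storageDeviceSets could look if one PV per node were reserved for replica-1 on this 6-PV / 3-node setup (counts are illustrative only; this is not what the current UI produces):

storageDeviceSets:
- name: ocs-deviceset-localblock
  count: 3                    # only 3 of the 6 PVs go to the regular OSDs
  replica: 1
  dataPVCTemplate:
    spec:
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          storage: "1"
      storageClassName: localblock
      volumeMode: Block
# the remaining 3 PVs (one per worker) would stay free for the
# cephNonResilientPools (replica-1) OSDs to claim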

Comment 4 Bipul Adhikari 2024-01-30 10:48:38 UTC
This can lead to very tricky scenarios: what if there is only 1 disk per node? What do we do in such cases?

Comment 5 Malay Kumar parida 2024-02-05 04:24:11 UTC
I suggested this here: https://issues.redhat.com/browse/RHSTOR-4696?focusedId=24052963&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-24052963.
Travis considers it a viable solution; we are just awaiting Eran's confirmation.

Comment 8 Bipul Adhikari 2024-02-09 03:49:42 UTC
Per discussions with the various stakeholders involved, we have decided to remove UI support for this and make it a CLI feature. We will revisit this issue in the 4.16 timeline, possibly adding it as a day-2 operation from the Block pool creation page.
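
For reference, a minimal sketch of the CLI path, assuming replica-1 stays driven by the cephNonResilientPools flag visible in the StorageCluster dump above (field names taken from that dump; whether this is the officially supported procedure is not confirmed in this bug):

# edit the StorageCluster, e.g. with:
#   oc edit storagecluster ocs-storagecluster -n openshift-storage
spec:
  managedResources:
    cephNonResilientPools:
      enable: true            # turns on the replica-1 (non-resilient) pools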

Comment 9 Aaruni Aggarwal 2024-02-22 09:38:10 UTC
The Replica-1 checkbox has been removed from the StorageSystem creation UI. Verified in ODF build v4.15.0-144.stable.

Attaching screenshot.

Comment 13 errata-xmlrpc 2024-03-19 15:32:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

