Description of problem (please be as detailed as possible and provide log snippets):

Enabling Replica-1 from the UI is not working on LSO-backed ODF on IBM Power (ppc64le), even though the setup has 2 disks per worker node.

Version of all relevant components (if applicable):
OCP: 4.15.0
ODF: 4.15.0-123

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible? Yes

Can this issue be reproduced from the UI? Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an OCP 4.15 cluster with 3 worker nodes. Install the ODF 4.15 operator.
2. Install the LSO 4.15 operator.
3. Create a LocalVolume with 2 disks per worker node. This creates 6 PVs.
4. Create the StorageSystem from the UI, enabling the replica-1 pool in the UI itself.

Actual results:
The replica-1 pool does not work because all 6 PVs are consumed by the regular OSDs, leaving none for replica-1.

Expected results:
The replica-1 pool should work.

Additional info:
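For step 3 above, a minimal sketch of the LocalVolume CR that yields 2 PVs per worker node under the "localblock" storage class. The device paths (/dev/sdb, /dev/sdc) are placeholders, not taken from this cluster; substitute the actual block devices on the Power worker nodes.

```shell
# Sketch of step 3: LocalVolume picking up 2 disks per worker node.
# Device paths below are hypothetical; replace with the real devices.
cat <<'EOF' | oc apply -f -
apiVersion: local.storage.openshift.io/v1
kind: LocalVolume
metadata:
  name: localblock
  namespace: openshift-local-storage
spec:
  nodeSelector:
    nodeSelectorTerms:
    - matchExpressions:
      - key: kubernetes.io/hostname
        operator: In
        values: [worker-0, worker-1, worker-2]
  storageClassDevices:
  - storageClassName: localblock
    volumeMode: Block
    devicePaths:
    - /dev/sdb
    - /dev/sdc
EOF
```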
CSV:
[root@rdr-replicaui-bastion-0 ~]# oc get csv -A
NAMESPACE                              NAME                                          DISPLAY                       VERSION               REPLACES   PHASE
openshift-local-storage                local-storage-operator.v4.15.0-202311280332   Local Storage                 4.15.0-202311280332              Succeeded
openshift-operator-lifecycle-manager   packageserver                                 Package Server                0.0.1-snapshot                   Succeeded
openshift-storage                      mcg-operator.v4.15.0-123.stable               NooBaa Operator               4.15.0-123.stable                Succeeded
openshift-storage                      ocs-operator.v4.15.0-123.stable               OpenShift Container Storage   4.15.0-123.stable                Succeeded
openshift-storage                      odf-csi-addons-operator.v4.15.0-123.stable    CSI Addons                    4.15.0-123.stable                Succeeded
openshift-storage                      odf-operator.v4.15.0-123.stable               OpenShift Data Foundation     4.15.0-123.stable                Succeeded

pods:
[root@rdr-replicaui-bastion-0 ~]# oc get pods -n openshift-storage
NAME                                                           READY   STATUS      RESTARTS        AGE
csi-addons-controller-manager-7485d8fdbf-vsp52                 2/2     Running     0               17m
csi-cephfsplugin-provisioner-9dd5ff5b-cvwfc                    6/6     Running     0               4m47s
csi-cephfsplugin-provisioner-9dd5ff5b-tm7l6                    6/6     Running     2 (4m9s ago)    4m47s
csi-cephfsplugin-rz7t8                                         2/2     Running     1 (4m10s ago)   4m47s
csi-cephfsplugin-s85c8                                         2/2     Running     0               4m47s
csi-cephfsplugin-t47l9                                         2/2     Running     1 (4m15s ago)   4m47s
csi-rbdplugin-gwswd                                            3/3     Running     0               4m47s
csi-rbdplugin-gzq9h                                            3/3     Running     1 (4m10s ago)   4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-9jpjk                     6/6     Running     0               4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-n9qm9                     6/6     Running     1 (4m14s ago)   4m47s
csi-rbdplugin-tnzsc                                            3/3     Running     1 (4m16s ago)   4m47s
noobaa-operator-77bc79475b-56rl2                               2/2     Running     0               17m
ocs-operator-5c5657798d-5fp5t                                  1/1     Running     0               17m
odf-console-9848c5b76-lpz54                                    1/1     Running     0               17m
odf-operator-controller-manager-55b9cbb9c5-dgz98               2/2     Running     0               17m
rook-ceph-crashcollector-worker-0-88878b9c4-dvcfp              1/1     Running     0               3m30s
rook-ceph-crashcollector-worker-1-657c67f5df-v7qv6             1/1     Running     0               3m6s
rook-ceph-crashcollector-worker-2-75b7c79bd8-p84mp             1/1     Running     0               3m9s
rook-ceph-exporter-worker-0-dd97f7854-j8w86                    1/1     Running     0               3m30s
rook-ceph-exporter-worker-1-599f867bd5-xggzk                   1/1     Running     0               3m2s
rook-ceph-exporter-worker-2-57d7ff9d4-gpbnn                    1/1     Running     0               3m5s
rook-ceph-mgr-a-74bd484c59-b68db                               3/3     Running     0               3m47s
rook-ceph-mgr-b-657494fdb8-xvgvn                               3/3     Running     0               3m46s
rook-ceph-mon-a-76dbb96546-q2hjp                               2/2     Running     0               4m35s
rook-ceph-mon-b-59f78db56d-fk6zc                               2/2     Running     0               4m11s
rook-ceph-mon-c-54468d5b57-9v4jt                               2/2     Running     0               4m
rook-ceph-operator-c4c68496c-5fq2z                             1/1     Running     0               4m56s
rook-ceph-osd-0-6b6997966f-dqnbb                               2/2     Running     0               3m11s
rook-ceph-osd-1-5c8ccdf584-pm5sk                               2/2     Running     0               3m9s
rook-ceph-osd-2-5db9b85d84-mbqkx                               2/2     Running     0               3m6s
rook-ceph-osd-3-64cc4bb945-g9mv5                               2/2     Running     0               3m8s
rook-ceph-osd-4-f748bdf75-d2xtr                                2/2     Running     0               3m9s
rook-ceph-osd-5-b656b9858-wbbqp                                2/2     Running     0               3m5s
rook-ceph-osd-prepare-21050acb4621a3bbc5c998ff7aabb7c2-x827m   0/1     Completed   0               3m22s
rook-ceph-osd-prepare-27e0bfaeb580bf80299e468d03a8cb6b-lk4qp   0/1     Completed   0               3m22s
rook-ceph-osd-prepare-545eae7d33c702fcc4c20a8b19db653c-xjv9q   0/1     Completed   0               3m21s
rook-ceph-osd-prepare-6dd8b0badddf0e4c48db945e1732dc1b-vdhf8   0/1     Completed   0               3m23s
rook-ceph-osd-prepare-9c2d51e01979a1a5e091282f8750ad43-68zn6   0/1     Completed   0               3m23s
rook-ceph-osd-prepare-d1fdf319c891150c92a6f87261ce8ea4-xfxmg   0/1     Completed   0               3m20s
rook-ceph-osd-prepare-worker-0-data-0vdr8z-b7g5w               0/1     Pending     0               3m19s
rook-ceph-osd-prepare-worker-1-data-0wtd6f-qj752               0/1     Pending     0               3m18s
rook-ceph-osd-prepare-worker-2-data-09cmf4-tlqlm               0/1     Pending     0               3m17s
ux-backend-server-5f557fccd7-l4vxh                             2/2     Running     0               17m

PVC:
[root@rdr-replicaui-bastion-0 ~]# oc get pvc -n openshift-storage
NAME                                     STATUS    VOLUME              CAPACITY   ACCESS MODES   STORAGECLASS   AGE
ocs-deviceset-localblock-0-data-0mr7tw   Bound     local-pv-83296199   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-1l6js7   Bound     local-pv-e7f2664    500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-24pms2   Bound     local-pv-caa979f9   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-3mmkcl   Bound     local-pv-682f849f   500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-4mfsnz   Bound     local-pv-64f835e    500Gi      RWO            localblock     3m31s
ocs-deviceset-localblock-0-data-56fc65   Bound     local-pv-dede79a3   500Gi      RWO            localblock     3m31s
worker-0-data-0vdr8z                     Pending                                                 localblock     3m31s
worker-1-data-0wtd6f                     Pending                                                 localblock     3m31s
worker-2-data-09cmf4                     Pending                                                 localblock     3m30s

Storagecluster:
[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   5m18s   Progressing              2024-01-23T19:52:06Z   4.15.0
[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-23T19:52:06Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 2
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: 8d7e0409-5ff1-41b4-a966-488d05d31cde
    resourceVersion: "185561"
    uid: e5fb31da-1b4e-46c1-9178-2c9c6274efa5
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools:
        defaultStorageClass: true
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools:
        enable: true
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 6
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-23T19:53:58Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: 'Error while reconciling: some StorageClasses were skipped while waiting
        for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd,ocs-storagecluster-ceph-non-resilient-rbd]'
      reason: ReconcileFailed
      status: "False"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: Unknown
      type: Upgradeable
    currentMonCount: 3
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
      noobaaCore:
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
      noobaaDB:
        desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Progressing
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "185525"
      uid: 34feb16f-f548-4630-9836-52666cc7abf1
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""

Storageclass:
[root@rdr-replicaui-bastion-0 ~]# oc get sc
NAME                          PROVISIONER                             RECLAIMPOLICY   VOLUMEBINDINGMODE      ALLOWVOLUMEEXPANSION   AGE
localblock                    kubernetes.io/no-provisioner            Delete          WaitForFirstConsumer   false                  10m
ocs-storagecluster-ceph-rgw   openshift-storage.ceph.rook.io/bucket   Delete          Immediate              false                  5m34s
When creating the storagecluster from the UI using LSO-backed PVs, the UI sets the storagecluster spec so that it consumes all available PVs. In this case the cluster had 6 available PVs, and the UI set count: 6 in the storageDeviceSets spec, so no PVs remained available. When replica-1 is also enabled, the replica-1 OSDs get no PVs to bind to. A possible solution: when the "Enable replica-1" option is ticked, the UI should leave at least 1 PV per node for the replica-1 OSDs to consume.
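The reservation logic suggested above can be sketched as follows. This is illustrative pseudologic only, not the actual ocs-operator or console source; the function name and signature are hypothetical.

```python
def device_set_count(available_pvs: int, worker_nodes: int,
                     replica1_enabled: bool) -> int:
    """Return how many PVs the regular OSD device set may consume.

    When replica-1 is enabled, reserve one PV per worker node for the
    replica-1 (non-resilient) OSDs instead of handing every PV to the
    regular device set.
    """
    if not replica1_enabled:
        return available_pvs
    reserved = worker_nodes  # one PV per node for the replica-1 OSDs
    if available_pvs <= reserved:
        # Mirrors the tricky 1-disk-per-node case: nothing would be left
        # for the regular OSDs, so refuse rather than silently starve them.
        raise ValueError(
            f"replica-1 needs a spare PV per node: have {available_pvs} "
            f"PVs across {worker_nodes} nodes"
        )
    return available_pvs - reserved

# The cluster in this report: 6 PVs across 3 workers with replica-1 enabled.
# Under this logic the UI would set count: 3 instead of count: 6.
```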
This can lead to very tricky scenarios: what if there is only 1 disk per node? What do we do in such cases?
I suggested this here: https://issues.redhat.com/browse/RHSTOR-4696?focusedId=24052963&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-24052963. Travis considers it a viable solution; we are just awaiting Eran's confirmation.
As per the discussions with the various stakeholders involved, we have decided to remove UI support for this and make it a CLI feature. We will revisit this issue in the 4.16 timeline, possibly adding it as a day-2 operation from the block pool creation page.
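For reference, enabling replica-1 from the CLI amounts to setting the same field that already appears in the StorageCluster spec dumped earlier in this report. A minimal sketch, using the resource name and namespace from this cluster:

```shell
# Enable the non-resilient (replica-1) pools on an existing StorageCluster.
# This flips spec.managedResources.cephNonResilientPools.enable, the field
# visible in the StorageCluster YAML above.
oc patch storagecluster ocs-storagecluster -n openshift-storage --type merge \
  -p '{"spec":{"managedResources":{"cephNonResilientPools":{"enable":true}}}}'
```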
The Replica-1 checkbox has been removed from the Storagesystem UI. Verified in ODF build v4.15.0-144.stable. Attaching a screenshot.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383