Bug 2260050
| Summary: | Enabling Replica-1 from UI is not working on LSO backed ODF on IBM Power cluster | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Aaruni Aggarwal <aaaggarw> |
| Component: | management-console | Assignee: | Bipul Adhikari <badhikar> |
| Status: | CLOSED ERRATA | QA Contact: | Aaruni Aggarwal <aaaggarw> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.15 | CC: | kramdoss, mparida, muagarwa, nberry, nthomas, odf-bz-bot |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.15.0 | ||
| Hardware: | ppc64le | ||
| OS: | Linux | ||
| Whiteboard: | |||
| Fixed In Version: | 4.15.0-136 | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2024-03-19 15:32:14 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description (Aaruni Aggarwal, 2024-01-24 09:12:58 UTC)
CSV:

```
[root@rdr-replicaui-bastion-0 ~]# oc get csv -A
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
openshift-local-storage local-storage-operator.v4.15.0-202311280332 Local Storage 4.15.0-202311280332 Succeeded
openshift-operator-lifecycle-manager packageserver Package Server 0.0.1-snapshot Succeeded
openshift-storage mcg-operator.v4.15.0-123.stable NooBaa Operator 4.15.0-123.stable Succeeded
openshift-storage ocs-operator.v4.15.0-123.stable OpenShift Container Storage 4.15.0-123.stable Succeeded
openshift-storage odf-csi-addons-operator.v4.15.0-123.stable CSI Addons 4.15.0-123.stable Succeeded
openshift-storage odf-operator.v4.15.0-123.stable OpenShift Data Foundation 4.15.0-123.stable Succeeded
[root@rdr-replicaui-bastion-0 ~]#
```
Pods:

```
[root@rdr-replicaui-bastion-0 ~]# oc get pods -n openshift-storage
NAME READY STATUS RESTARTS AGE
csi-addons-controller-manager-7485d8fdbf-vsp52 2/2 Running 0 17m
csi-cephfsplugin-provisioner-9dd5ff5b-cvwfc 6/6 Running 0 4m47s
csi-cephfsplugin-provisioner-9dd5ff5b-tm7l6 6/6 Running 2 (4m9s ago) 4m47s
csi-cephfsplugin-rz7t8 2/2 Running 1 (4m10s ago) 4m47s
csi-cephfsplugin-s85c8 2/2 Running 0 4m47s
csi-cephfsplugin-t47l9 2/2 Running 1 (4m15s ago) 4m47s
csi-rbdplugin-gwswd 3/3 Running 0 4m47s
csi-rbdplugin-gzq9h 3/3 Running 1 (4m10s ago) 4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-9jpjk 6/6 Running 0 4m47s
csi-rbdplugin-provisioner-6dbfb56bbf-n9qm9 6/6 Running 1 (4m14s ago) 4m47s
csi-rbdplugin-tnzsc 3/3 Running 1 (4m16s ago) 4m47s
noobaa-operator-77bc79475b-56rl2 2/2 Running 0 17m
ocs-operator-5c5657798d-5fp5t 1/1 Running 0 17m
odf-console-9848c5b76-lpz54 1/1 Running 0 17m
odf-operator-controller-manager-55b9cbb9c5-dgz98 2/2 Running 0 17m
rook-ceph-crashcollector-worker-0-88878b9c4-dvcfp 1/1 Running 0 3m30s
rook-ceph-crashcollector-worker-1-657c67f5df-v7qv6 1/1 Running 0 3m6s
rook-ceph-crashcollector-worker-2-75b7c79bd8-p84mp 1/1 Running 0 3m9s
rook-ceph-exporter-worker-0-dd97f7854-j8w86 1/1 Running 0 3m30s
rook-ceph-exporter-worker-1-599f867bd5-xggzk 1/1 Running 0 3m2s
rook-ceph-exporter-worker-2-57d7ff9d4-gpbnn 1/1 Running 0 3m5s
rook-ceph-mgr-a-74bd484c59-b68db 3/3 Running 0 3m47s
rook-ceph-mgr-b-657494fdb8-xvgvn 3/3 Running 0 3m46s
rook-ceph-mon-a-76dbb96546-q2hjp 2/2 Running 0 4m35s
rook-ceph-mon-b-59f78db56d-fk6zc 2/2 Running 0 4m11s
rook-ceph-mon-c-54468d5b57-9v4jt 2/2 Running 0 4m
rook-ceph-operator-c4c68496c-5fq2z 1/1 Running 0 4m56s
rook-ceph-osd-0-6b6997966f-dqnbb 2/2 Running 0 3m11s
rook-ceph-osd-1-5c8ccdf584-pm5sk 2/2 Running 0 3m9s
rook-ceph-osd-2-5db9b85d84-mbqkx 2/2 Running 0 3m6s
rook-ceph-osd-3-64cc4bb945-g9mv5 2/2 Running 0 3m8s
rook-ceph-osd-4-f748bdf75-d2xtr 2/2 Running 0 3m9s
rook-ceph-osd-5-b656b9858-wbbqp 2/2 Running 0 3m5s
rook-ceph-osd-prepare-21050acb4621a3bbc5c998ff7aabb7c2-x827m 0/1 Completed 0 3m22s
rook-ceph-osd-prepare-27e0bfaeb580bf80299e468d03a8cb6b-lk4qp 0/1 Completed 0 3m22s
rook-ceph-osd-prepare-545eae7d33c702fcc4c20a8b19db653c-xjv9q 0/1 Completed 0 3m21s
rook-ceph-osd-prepare-6dd8b0badddf0e4c48db945e1732dc1b-vdhf8 0/1 Completed 0 3m23s
rook-ceph-osd-prepare-9c2d51e01979a1a5e091282f8750ad43-68zn6 0/1 Completed 0 3m23s
rook-ceph-osd-prepare-d1fdf319c891150c92a6f87261ce8ea4-xfxmg 0/1 Completed 0 3m20s
rook-ceph-osd-prepare-worker-0-data-0vdr8z-b7g5w 0/1 Pending 0 3m19s
rook-ceph-osd-prepare-worker-1-data-0wtd6f-qj752 0/1 Pending 0 3m18s
rook-ceph-osd-prepare-worker-2-data-09cmf4-tlqlm 0/1 Pending 0 3m17s
ux-backend-server-5f557fccd7-l4vxh 2/2 Running 0 17m
```
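Note the three rook-ceph-osd-prepare-worker-* pods stuck in Pending. A quick way to surface the scheduling reason (standard oc commands; the pod name is taken from the listing above):

```
# The Events section at the bottom of describe should show why the
# replica-1 osd-prepare pod cannot be scheduled (its PVC never binds)
oc describe pod rook-ceph-osd-prepare-worker-0-data-0vdr8z-b7g5w \
  -n openshift-storage

# Recent events in the namespace, newest last
oc get events -n openshift-storage --sort-by=.lastTimestamp | tail -n 20
```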
PVC:

```
[root@rdr-replicaui-bastion-0 ~]# oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
ocs-deviceset-localblock-0-data-0mr7tw Bound local-pv-83296199 500Gi RWO localblock 3m31s
ocs-deviceset-localblock-0-data-1l6js7 Bound local-pv-e7f2664 500Gi RWO localblock 3m31s
ocs-deviceset-localblock-0-data-24pms2 Bound local-pv-caa979f9 500Gi RWO localblock 3m31s
ocs-deviceset-localblock-0-data-3mmkcl Bound local-pv-682f849f 500Gi RWO localblock 3m31s
ocs-deviceset-localblock-0-data-4mfsnz Bound local-pv-64f835e 500Gi RWO localblock 3m31s
ocs-deviceset-localblock-0-data-56fc65 Bound local-pv-dede79a3 500Gi RWO localblock 3m31s
worker-0-data-0vdr8z Pending localblock 3m31s
worker-1-data-0wtd6f Pending localblock 3m31s
worker-2-data-09cmf4 Pending localblock 3m30s
```
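The three worker-*-data PVCs for the replica-1 OSDs stay Pending because all six localblock PVs are already bound to the ocs-deviceset PVCs. A sketch to confirm this (standard oc commands; the PVC name comes from the listing above):

```
# List the localblock PVs and their bind status; with count=6 in the
# device set, every PV should already show Bound, leaving none free
oc get pv -o wide | grep localblock

# The Pending PVC's events should show why binding is blocked
oc describe pvc worker-0-data-0vdr8z -n openshift-storage
```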
Storagecluster:

```
[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage
NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 5m18s Progressing 2024-01-23T19:52:06Z 4.15.0
[root@rdr-replicaui-bastion-0 ~]# oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2024-01-23T19:52:06Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 2
    name: ocs-storagecluster
    namespace: openshift-storage
    ownerReferences:
    - apiVersion: odf.openshift.io/v1alpha1
      kind: StorageSystem
      name: ocs-storagecluster-storagesystem
      uid: 8d7e0409-5ff1-41b4-a966-488d05d31cde
    resourceVersion: "185561"
    uid: e5fb31da-1b4e-46c1-9178-2c9c6274efa5
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools:
        defaultStorageClass: true
      cephCluster: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephNonResilientPools:
        enable: true
      cephObjectStoreUsers: {}
      cephObjectStores: {}
      cephRBDMirror:
        daemonCount: 1
      cephToolbox: {}
    mirroring: {}
    monDataDirHostPath: /var/lib/rook
    network:
      connections:
        encryption: {}
      multiClusterService: {}
    nodeTopologies: {}
    resourceProfile: balanced
    storageDeviceSets:
    - config: {}
      count: 6
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
  status:
    conditions:
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Version check successful
      reason: VersionMatched
      status: "False"
      type: VersionMismatch
    - lastHeartbeatTime: "2024-01-23T19:53:58Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: 'Error while reconciling: some StorageClasses were skipped while waiting
        for pre-requisites to be met: [ocs-storagecluster-cephfs,ocs-storagecluster-ceph-rbd,ocs-storagecluster-ceph-non-resilient-rbd]'
      reason: ReconcileFailed
      status: "False"
      type: ReconcileComplete
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Available
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "True"
      type: Progressing
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2024-01-23T19:52:07Z"
      lastTransitionTime: "2024-01-23T19:52:07Z"
      message: Initializing StorageCluster
      reason: Init
      status: Unknown
      type: Upgradeable
    currentMonCount: 3
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - worker-0
    - worker-1
    - worker-2
    images:
      ceph:
        actualImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
        desiredImage: registry.redhat.io/rhceph/rhceph-6-rhel9@sha256:9049ccf79a0e009682e30677f493b27263c2d9401958005de733a19506705775
      noobaaCore:
        desiredImage: registry.redhat.io/odf4/mcg-core-rhel9@sha256:41c509b225b92cdf088bda5a0fe538a8b2106a09713277158b71d2a5b9ae694f
      noobaaDB:
        desiredImage: registry.redhat.io/rhel9/postgresql-15@sha256:12afe2b0205a4aa24623f04d318d21f91393e4c70cf03a5f6720339e06d78293
    kmsServerConnection: {}
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - worker-0
        - worker-1
        - worker-2
    phase: Progressing
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "185525"
      uid: 34feb16f-f548-4630-9836-52666cc7abf1
    version: 4.15.0
kind: List
metadata:
  resourceVersion: ""
```
Storageclass:

```
[root@rdr-replicaui-bastion-0 ~]# oc get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
localblock kubernetes.io/no-provisioner Delete WaitForFirstConsumer false 10m
ocs-storagecluster-ceph-rgw openshift-storage.ceph.rook.io/bucket Delete Immediate false 5m34s
```
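Only the RGW StorageClass was created; the cephfs, ceph-rbd, and ceph-non-resilient-rbd classes named in the ReconcileFailed condition are still missing. A quick way to inspect that condition (standard oc jsonpath; assumes the single StorageCluster shown above):

```
# Print the ReconcileComplete condition message, which lists the
# StorageClasses still waiting on prerequisites
oc get storagecluster ocs-storagecluster -n openshift-storage \
  -o jsonpath='{.status.conditions[?(@.type=="ReconcileComplete")].message}'
```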
While creating the StorageCluster from LSO-backed PVs, the UI sets the StorageCluster spec such that it consumes all available PVs. In this case the cluster had 6 available PVs, and the UI set the count to 6 in the storageDeviceSets spec, so no PVs remained available. When replica-1 is also enabled, the replica-1 OSDs get no PVs to bind to. A solution would be that when the "Enable replica-1" option is ticked, the UI leaves at least 1 PV per node for the replica-1 OSDs to consume. This can lead to very tricky scenarios, though: what if there is only 1 disk per node? What do we do in such cases? I suggested this here: https://issues.redhat.com/browse/RHSTOR-4696?focusedId=24052963&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-24052963. Travis says it is a viable solution; just awaiting Eran's confirmation.

As per the discussions with the various stakeholders involved, we have decided to remove UI support for this and make it a CLI feature. We will revisit this issue in the 4.16 timeline, possibly adding it as a day-2 operation from the Block pool creation page.

The Replica-1 checkbox has been removed from the StorageSystem UI. Verified in ODF build v4.15.0-144.stable. Attaching screenshot.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383