Description of problem (please be as detailed as possible and provide log snippets):
A replica 1 cluster is being installed when using the UI.

Version of all relevant components (if applicable):

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
Not sure.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproducible?
Yes, always in a 2-zone OCP cluster.

Can this issue reproduce from the UI?
Yes, installation via the UI reproduces it 100% of the time.

If this is a regression, please provide more details to justify this:
I think so, but in fact I am not sure; I have not tried a 2-zone deployment before. I believe that with 2 zones the failure domain should fall back to racks instead of deploying replica 1.

Steps to Reproduce:
1. Create an OCP 4.9 two-zone cluster
2. Install the ODF 4.9 operator
3. Create a Storage System

Actual results:
The Storage System/cluster is created with replica 1.

Expected results:
The Storage System/cluster should be created with replica 3.

Additional info:
A few things I missed adding in the description, adding them here:

Versions:
OCP version: 4.9.0-0.nightly-2021-09-01-193941
OCS version: 4.9.0-120.ci
$ oc get pvc -n openshift-storage
NAME                              STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                 Pending                                                                        ocs-storagecluster-ceph-rbd   115m
ocs-deviceset-gp2-0-data-057qxb   Bound     pvc-57ae009f-00c6-49c6-b1f7-7225b4eea84f   512Gi      RWO            gp2                           117m
rook-ceph-mon-a                   Bound     pvc-df1edead-13ce-42d3-97d2-44d94a7b424c   50Gi       RWO            gp2                           119m
rook-ceph-mon-b                   Bound     pvc-dea4df9c-1b1c-4e5a-afcb-4af857d064e2   50Gi       RWO            gp2                           119m
rook-ceph-mon-c                   Bound     pvc-a74b12f6-b7c9-419e-9328-8c7b62e3e828   50Gi       RWO            gp2                           119m

I see replica 1 being set in the storagecluster yaml:
=======================================
  storageDeviceSets:
  - config: {}
    count: 1
    dataPVCTemplate:
      metadata: {}
      spec:
        accessModes:
        - ReadWriteOnce
        resources:
          requests:
            storage: 512Gi
        storageClassName: gp2
        volumeMode: Block
      status: {}
    name: ocs-deviceset-gp2
    placement: {}
    preparePlacement: {}
    replica: 1
    resources: {}
  version: 4.9.0
=======================================
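For a quick check without dumping the whole CR, the relevant fields can be read directly with jsonpath (a sketch, assuming the default StorageCluster name ocs-storagecluster, which is the name shown later in this bug):

$ oc get storagecluster ocs-storagecluster -n openshift-storage \
    -o jsonpath='{.spec.storageDeviceSets[0].replica} {.spec.storageDeviceSets[0].count}{"\n"}'

On the cluster above this would print "1 1".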
*** Bug 2000711 has been marked as a duplicate of this bug. ***
There is a workaround (W/A): change the storagecluster count from 1 to 3. Moving the severity back to high.
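For reference, a sketch of how that workaround could be applied with oc patch (assumes the default StorageCluster name from this report; the path targets the first deviceset shown above):

$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json \
    -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 3}]'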
From the storagecluster.yaml:

spec:
  arbiter: {}
  encryption:
    enable: true
    kms: {}
  externalStorage: {}
  flexibleScaling: true   <--- this is why the replica is set to 1
  managedResources:
  ...
  failureDomain: host
  failureDomainKey: kubernetes.io/hostname
  failureDomainValues:
  - ip-10-0-137-89.us-west-1.compute.internal
  - ip-10-0-144-4.us-west-1.compute.internal
  - ip-10-0-253-129.us-west-1.compute.internal

FlexibleScaling should only be enabled for internal-attached clusters, in which case it will set the count to the number of OSDs and the replica to 1 (this replica is unrelated to the data replication factor of the pool). If this was not an internal-attached cluster, the storagecluster should have used racks.
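As a side note, the pool's actual replication factor can be checked independently of the deviceset replica, e.g. from the rook-ceph toolbox (a sketch; it assumes the toolbox deployment is enabled and the default block pool name ocs-storagecluster-cephblockpool, neither of which is shown in this report):

$ oc rsh -n openshift-storage deploy/rook-ceph-tools \
    ceph osd pool get ocs-storagecluster-cephblockpool size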
(In reply to N Balachandran from comment #10)
> From the storagecluster.yaml:
>
> spec:
>   arbiter: {}
>   encryption:
>     enable: true
>     kms: {}
>   externalStorage: {}
>   flexibleScaling: true   <--- this is why the replica is set to 1
>   managedResources:
>   ...
>   failureDomain: host
>   failureDomainKey: kubernetes.io/hostname
>   failureDomainValues:
>   - ip-10-0-137-89.us-west-1.compute.internal
>   - ip-10-0-144-4.us-west-1.compute.internal
>   - ip-10-0-253-129.us-west-1.compute.internal

Interesting; we need to find out why this is being set.

> FlexibleScaling should only be enabled for internal-attached clusters, in
> which case it will set the count to the number of OSDs and the replica to 1
> (this replica is unrelated to the data replication factor of the pool).

Agreed.

> If this was not an internal-attached cluster, the storagecluster should
> have used racks.

Certainly, this was not internal-attached or using local devices. The option used was "Use an existing storage class"; I guess this is the same as Internal mode.
Reproducible with an OCP 4.9 cluster (using cluster-bot) and the ODF operator. Since the StorageCluster CR is created by the console, moving this to the console component.
Summary of issues:

1. The UI should not enable flexibleScaling for Internal mode StorageClusters. Only Internal-Attached StorageClusters should be configured to enable flexibleScaling, and only when the storage nodes are in fewer than 3 zones.

2. 3 nodes were selected, but StorageCluster.spec.storageDeviceSets[0].count was set to 1. If flexibleScaling is enabled, the count should be set to the number of OSDs, since StorageCluster.spec.storageDeviceSets[0].replica is set to 1. (See the sketch below.)
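For illustration, a minimal sketch of the two expected spec shapes described above, assuming the deviceset name used in this report; unrelated fields are omitted:

# Internal mode (no flexibleScaling): one deviceset replica per failure domain
spec:
  storageDeviceSets:
  - name: ocs-deviceset-gp2
    count: 1      # devices per replica
    replica: 3    # one per failure domain (zone or rack)

# Internal-Attached mode with nodes in fewer than 3 zones
spec:
  flexibleScaling: true
  storageDeviceSets:
  - name: ocs-deviceset-gp2
    count: 3      # should equal the number of OSDs (one per selected node)
    replica: 1    # unrelated to the pool's data replication factor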
Verified in version:

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-21-014208   True        False         78m     Cluster version is 4.10.0-0.nightly-2021-10-21-014208

$ oc get csv odf-operator.v4.9.0 -o yaml | grep full_version
    full_version: 4.9.0-195.ci

Tested on AWS in internal mode, using 2 availability zones. ODF installation and storage system creation were done from the GUI. The replica value is 3, as expected.

$ oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      storagesystem.odf.openshift.io/watched-by: ocs-storagecluster-storagesystem
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2021-10-21T14:05:27Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 3
    name: ocs-storagecluster
    namespace: openshift-storage
    resourceVersion: "60478"
    uid: b3a83935-79cf-4ef8-bc1c-0aa023051c21
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephDashboard: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 1
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 512Gi
          storageClassName: gp2
          volumeMode: Block
        status: {}
      name: ocs-deviceset-gp2
      placement: {}
      portable: true
      preparePlacement: {}
      replica: 3
      resources: {}
    version: 4.9.0
  status:
    conditions:
    - lastHeartbeatTime: "2021-10-21T14:19:01Z"
      lastTransitionTime: "2021-10-21T14:10:32Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2021-10-21T14:19:01Z"
      lastTransitionTime: "2021-10-21T14:11:50Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-10-21T14:19:01Z"
      lastTransitionTime: "2021-10-21T14:11:50Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-10-21T14:19:01Z"
      lastTransitionTime: "2021-10-21T14:05:28Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2021-10-21T14:19:01Z"
      lastTransitionTime: "2021-10-21T14:11:50Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    failureDomain: rack
    failureDomainKey: topology.rook.io/rack
    failureDomainValues:
    - rack0
    - rack1
    - rack2
    images:
      ceph:
        actualImage: quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb
      noobaaCore:
        actualImage: quay.io/rhceph-dev/mcg-core@sha256:f60e2a6a87c1e49be237740d16f74f95578d24213f6a3b85bba4185313278672
        desiredImage: quay.io/rhceph-dev/mcg-core@sha256:f60e2a6a87c1e49be237740d16f74f95578d24213f6a3b85bba4185313278672
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:1b91c9946f4351bd3688bc538d498e6738cd8a5285af998be6e8dfe218dca6fa
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:1b91c9946f4351bd3688bc538d498e6738cd8a5285af998be6e8dfe218dca6fa
    nodeTopologies:
      labels:
        failure-domain.beta.kubernetes.io/region:
        - us-east-2
        failure-domain.beta.kubernetes.io/zone:
        - us-east-2a
        - us-east-2b
        kubernetes.io/hostname:
        - ip-10-0-200-3.us-east-2.compute.internal
        - ip-10-0-142-161.us-east-2.compute.internal
        - ip-10-0-181-8.us-east-2.compute.internal
        topology.rook.io/rack:
        - rack0
        - rack1
        - rack2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "60289"
      uid: 36284fff-5b83-4f5a-b475-0e4b5baffbc6
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "60477"
      uid: 83c8145b-6d22-424e-9754-94d76d92a9e7
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc get pvc
NAME                              STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS                  AGE
db-noobaa-db-pg-0                 Bound    pvc-50a57630-1f3b-449b-9aaf-7c18698b2196   50Gi       RWO            ocs-storagecluster-ceph-rbd   9m7s
ocs-deviceset-gp2-0-data-0ww4v6   Bound    pvc-8e3544b2-2e41-42c8-914d-941abd2b96d5   512Gi      RWO            gp2                           10m
ocs-deviceset-gp2-1-data-0xswnv   Bound    pvc-fc5d3a5b-7263-47a0-9069-af9c2d59f412   512Gi      RWO            gp2                           10m
ocs-deviceset-gp2-2-data-0pdtkp   Bound    pvc-49de13cc-b26b-4f61-96d6-3cf995505009   512Gi      RWO            gp2                           10m
rook-ceph-mon-a                   Bound    pvc-0398b6e1-cba3-43c4-8e44-0da317dace61   50Gi       RWO            gp2                           13m
rook-ceph-mon-b                   Bound    pvc-59ba210a-0977-4247-9514-e9925b8c0eb1   50Gi       RWO            gp2                           13m
rook-ceph-mon-c                   Bound    pvc-23d83615-c55f-4bfe-971a-5e9208aad590   50Gi       RWO            gp2                           13m

$ oc get pods -o wide -l app=rook-ceph-osd
NAME                               READY   STATUS    RESTARTS   AGE   IP            NODE                                         NOMINATED NODE   READINESS GATES
rook-ceph-osd-0-7fd67cb559-g62rp   2/2     Running   0          10m   10.129.2.20   ip-10-0-200-3.us-east-2.compute.internal     <none>           <none>
rook-ceph-osd-1-7f96cb594f-jvbvp   2/2     Running   0          10m   10.131.0.60   ip-10-0-142-161.us-east-2.compute.internal   <none>           <none>
rook-ceph-osd-2-69c6755569-9jrlv   2/2     Running   0          10m   10.128.2.25   ip-10-0-181-8.us-east-2.compute.internal     <none>           <none>

$ oc get nodes -o wide --show-labels
NAME   STATUS   ROLES   AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE   KERNEL-VERSION   CONTAINER-RUNTIME   LABELS
ip-10-0-138-73.us-east-2.compute.internal   Ready   master   104m   v1.22.1+d767194   10.0.138.73   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-138-73.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-142-161.us-east-2.compute.internal   Ready   worker   97m   v1.22.1+d767194   10.0.142.161   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-142-161.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack1
ip-10-0-181-8.us-east-2.compute.internal   Ready   worker   97m   v1.22.1+d767194   10.0.181.8   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-181-8.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack2
ip-10-0-190-213.us-east-2.compute.internal   Ready   master   104m   v1.22.1+d767194   10.0.190.213   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-190-213.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-198-192.us-east-2.compute.internal   Ready   master   104m   v1.22.1+d767194   10.0.198.192   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-198-192.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b
ip-10-0-200-3.us-east-2.compute.internal   Ready   worker   96m   v1.22.1+d767194   10.0.200.3   <none>   Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa)   4.18.0-305.19.1.el8_4.x86_64   cri-o://1.22.0   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-200-3.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b,topology.rook.io/rack=rack0

$ ceph osd tree
ID   CLASS  WEIGHT   TYPE NAME                                          STATUS  REWEIGHT  PRI-AFF
 -1         1.50000  root default
 -6         1.50000      region us-east-2
 -5         1.00000          zone us-east-2a
 -4         0.50000              rack rack1
 -3         0.50000                  host ocs-deviceset-gp2-2-data-0pdtkp
  1    ssd  0.50000                      osd.1                          up     1.00000   1.00000
-18         0.50000              rack rack2
-17         0.50000                  host ocs-deviceset-gp2-1-data-0xswnv
  2    ssd  0.50000                      osd.2                          up     1.00000   1.00000
-13         0.50000          zone us-east-2b
-12         0.50000              rack rack0
-11         0.50000                  host ocs-deviceset-gp2-0-data-0ww4v6
  0    ssd  0.50000                      osd.0                          up     1.00000   1.00000
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:0056
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days