Bug 2000573
| Summary: | Incorrect StorageCluster CR created and ODF cluster getting installed with 2 Zone OCP cluster | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Bipin Kunal <bkunal> |
| Component: | Console Storage Plugin | Assignee: | Afreen <afrahman> |
| Status: | CLOSED ERRATA | QA Contact: | Jilju Joy <jijoy> |
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 4.9 | CC: | afrahman, aos-bugs, assingh, jefbrown, madam, muagarwa, musoni, nberry, nibalach, nthomas, ocs-bugs, sostapov, srozen |
| Target Milestone: | --- | Keywords: | Regression |
| Target Release: | 4.10.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-03-10 16:07:01 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 2001983 | ||
Description
Bipin Kunal
2021-09-02 12:10:51 UTC
A few things I missed in the description, adding them here.

Version:
OCP version: 4.9.0-0.nightly-2021-09-01-193941
OCS version: 4.9.0-120.ci

$ oc get pvc -n openshift-storage
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
db-noobaa-db-pg-0 Pending ocs-storagecluster-ceph-rbd 115m
ocs-deviceset-gp2-0-data-057qxb Bound pvc-57ae009f-00c6-49c6-b1f7-7225b4eea84f 512Gi RWO gp2 117m
rook-ceph-mon-a Bound pvc-df1edead-13ce-42d3-97d2-44d94a7b424c 50Gi RWO gp2 119m
rook-ceph-mon-b Bound pvc-dea4df9c-1b1c-4e5a-afcb-4af857d064e2 50Gi RWO gp2 119m
rook-ceph-mon-c Bound pvc-a74b12f6-b7c9-419e-9328-8c7b62e3e828 50Gi RWO gp2 119m
I see replica: 1 being set in the storagecluster YAML:
=======================================
storageDeviceSets:
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 512Gi
storageClassName: gp2
volumeMode: Block
status: {}
name: ocs-deviceset-gp2
placement: {}
preparePlacement: {}
replica: 1
resources: {}
version: 4.9.0
=======================================
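The relevant fields can also be pulled without dumping the full CR; a quick check, assuming the default StorageCluster name and namespace:

# prints flexibleScaling, replica and count (assumes the default CR name/namespace)
$ oc get storagecluster ocs-storagecluster -n openshift-storage \
    -o jsonpath='{.spec.flexibleScaling} {.spec.storageDeviceSets[0].replica} {.spec.storageDeviceSets[0].count}{"\n"}'

On the affected cluster this should print "true 1 1", matching the YAML above.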
*** Bug 2000711 has been marked as a duplicate of this bug. ***

There is a workaround: change the StorageCluster storageDeviceSets count from 1 to 3 (a sketch of such a patch appears after the YAML below). Moving back to high.

From the storagecluster.yaml:
spec:
arbiter: {}
encryption:
enable: true
kms: {}
externalStorage: {}
flexibleScaling: true < --- this is why the replica is set to 1
managedResources:
...
failureDomain: host
failureDomainKey: kubernetes.io/hostname
failureDomainValues:
- ip-10-0-137-89.us-west-1.compute.internal
- ip-10-0-144-4.us-west-1.compute.internal
- ip-10-0-253-129.us-west-1.compute.internal
FlexibleScaling should only be enabled for internal-attached clusters in which case it will set the count to the number of OSDs and the replica to 1 (this replica is unrelated to the data replication factor of the pool).
If this was not an internal-attached cluster, the storagecluster should have used racks.
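The workaround mentioned above (bumping the count from 1 to 3, so that count x replica again yields 3 OSDs) can be applied with a JSON patch; a rough sketch, assuming the default StorageCluster name and a single device set:

# workaround sketch: bump the first device set's count to 3 (assumes default CR name/namespace)
$ oc patch storagecluster ocs-storagecluster -n openshift-storage --type json \
    -p '[{"op": "replace", "path": "/spec/storageDeviceSets/0/count", "value": 3}]'

With flexibleScaling enabled the replica stays at 1, so the count alone determines the number of OSDs; a correctly created internal-mode cluster instead ends up with replica: 3 and count: 1, as seen in the verified YAML further down.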
(In reply to N Balachandran from comment #10)
> From the storagecluster.yaml:
>
> spec:
> arbiter: {}
> encryption:
> enable: true
> kms: {}
> externalStorage: {}
> flexibleScaling: true < --- this is why the replica is set to 1
> managedResources:
> ...
> failureDomain: host
> failureDomainKey: kubernetes.io/hostname
> failureDomainValues:
> - ip-10-0-137-89.us-west-1.compute.internal
> - ip-10-0-144-4.us-west-1.compute.internal
> - ip-10-0-253-129.us-west-1.compute.internal

Interesting, we need to find out why this is being set.

> FlexibleScaling should only be enabled for internal-attached clusters in
> which case it will set the count to the number of OSDs and the replica to 1
> (this replica is unrelated to the data replication factor of the pool).

Agreed.

> If this was not an internal-attached cluster, the storagecluster should have
> used racks.

Certainly, this was not internal-attached or using local devices. The option used was "Use an existing storage class"; I guess this is the same as Internal Mode.

Reproducible with an OCP 4.9 cluster (using clusterbot) and the odf operator. As the storagecluster CR is created by the console, moving it to the console component.

Summary of issues:
1. The UI should not enable flexibleScaling for Internal mode StorageClusters. Only Internal-Attached StorageClusters should have flexibleScaling enabled, and only when the storage nodes are spread across fewer than 3 zones.
2. 3 nodes were selected, but StorageCluster.spec.storageDeviceSets[0].count was set to 1. If flexibleScaling is enabled, the count should be set to the number of OSDs, since StorageCluster.spec.storageDeviceSets[0].replica is set to 1.

Verified in version:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.10.0-0.nightly-2021-10-21-014208 True False 78m Cluster version is 4.10.0-0.nightly-2021-10-21-014208
$ oc get csv odf-operator.v4.9.0 -o yaml | grep full_version
full_version: 4.9.0-195.ci
Tested on AWS, internal mode, using 2 availability zones. ODF installation and storage system creation were done from the GUI. The replica value is 3, as expected.
$ oc get storagecluster -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
annotations:
storagesystem.odf.openshift.io/watched-by: ocs-storagecluster-storagesystem
uninstall.ocs.openshift.io/cleanup-policy: delete
uninstall.ocs.openshift.io/mode: graceful
creationTimestamp: "2021-10-21T14:05:27Z"
finalizers:
- storagecluster.ocs.openshift.io
generation: 3
name: ocs-storagecluster
namespace: openshift-storage
resourceVersion: "60478"
uid: b3a83935-79cf-4ef8-bc1c-0aa023051c21
spec:
arbiter: {}
encryption:
kms: {}
externalStorage: {}
managedResources:
cephBlockPools: {}
cephConfig: {}
cephDashboard: {}
cephFilesystems: {}
cephObjectStoreUsers: {}
cephObjectStores: {}
nodeTopologies: {}
storageDeviceSets:
- config: {}
count: 1
dataPVCTemplate:
metadata: {}
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 512Gi
storageClassName: gp2
volumeMode: Block
status: {}
name: ocs-deviceset-gp2
placement: {}
portable: true
preparePlacement: {}
replica: 3
resources: {}
version: 4.9.0
status:
conditions:
- lastHeartbeatTime: "2021-10-21T14:19:01Z"
lastTransitionTime: "2021-10-21T14:10:32Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: ReconcileComplete
- lastHeartbeatTime: "2021-10-21T14:19:01Z"
lastTransitionTime: "2021-10-21T14:11:50Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: Available
- lastHeartbeatTime: "2021-10-21T14:19:01Z"
lastTransitionTime: "2021-10-21T14:11:50Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "False"
type: Progressing
- lastHeartbeatTime: "2021-10-21T14:19:01Z"
lastTransitionTime: "2021-10-21T14:05:28Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "False"
type: Degraded
- lastHeartbeatTime: "2021-10-21T14:19:01Z"
lastTransitionTime: "2021-10-21T14:11:50Z"
message: Reconcile completed successfully
reason: ReconcileCompleted
status: "True"
type: Upgradeable
failureDomain: rack
failureDomainKey: topology.rook.io/rack
failureDomainValues:
- rack0
- rack1
- rack2
images:
ceph:
actualImage: quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb
desiredImage: quay.io/rhceph-dev/rhceph@sha256:b5ff930b8b35b4ac002f0f34b4be112b3a433b5615f2ea65402a54a84b6edadb
noobaaCore:
actualImage: quay.io/rhceph-dev/mcg-core@sha256:f60e2a6a87c1e49be237740d16f74f95578d24213f6a3b85bba4185313278672
desiredImage: quay.io/rhceph-dev/mcg-core@sha256:f60e2a6a87c1e49be237740d16f74f95578d24213f6a3b85bba4185313278672
noobaaDB:
actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:1b91c9946f4351bd3688bc538d498e6738cd8a5285af998be6e8dfe218dca6fa
desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:1b91c9946f4351bd3688bc538d498e6738cd8a5285af998be6e8dfe218dca6fa
nodeTopologies:
labels:
failure-domain.beta.kubernetes.io/region:
- us-east-2
failure-domain.beta.kubernetes.io/zone:
- us-east-2a
- us-east-2b
kubernetes.io/hostname:
- ip-10-0-200-3.us-east-2.compute.internal
- ip-10-0-142-161.us-east-2.compute.internal
- ip-10-0-181-8.us-east-2.compute.internal
topology.rook.io/rack:
- rack0
- rack1
- rack2
phase: Ready
relatedObjects:
- apiVersion: ceph.rook.io/v1
kind: CephCluster
name: ocs-storagecluster-cephcluster
namespace: openshift-storage
resourceVersion: "60289"
uid: 36284fff-5b83-4f5a-b475-0e4b5baffbc6
- apiVersion: noobaa.io/v1alpha1
kind: NooBaa
name: noobaa
namespace: openshift-storage
resourceVersion: "60477"
uid: 83c8145b-6d22-424e-9754-94d76d92a9e7
kind: List
metadata:
resourceVersion: ""
selfLink: ""
$ oc get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
db-noobaa-db-pg-0 Bound pvc-50a57630-1f3b-449b-9aaf-7c18698b2196 50Gi RWO ocs-storagecluster-ceph-rbd 9m7s
ocs-deviceset-gp2-0-data-0ww4v6 Bound pvc-8e3544b2-2e41-42c8-914d-941abd2b96d5 512Gi RWO gp2 10m
ocs-deviceset-gp2-1-data-0xswnv Bound pvc-fc5d3a5b-7263-47a0-9069-af9c2d59f412 512Gi RWO gp2 10m
ocs-deviceset-gp2-2-data-0pdtkp Bound pvc-49de13cc-b26b-4f61-96d6-3cf995505009 512Gi RWO gp2 10m
rook-ceph-mon-a Bound pvc-0398b6e1-cba3-43c4-8e44-0da317dace61 50Gi RWO gp2 13m
rook-ceph-mon-b Bound pvc-59ba210a-0977-4247-9514-e9925b8c0eb1 50Gi RWO gp2 13m
rook-ceph-mon-c Bound pvc-23d83615-c55f-4bfe-971a-5e9208aad590 50Gi RWO gp2 13m
$ oc get pods -o wide -l app=rook-ceph-osd
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
rook-ceph-osd-0-7fd67cb559-g62rp 2/2 Running 0 10m 10.129.2.20 ip-10-0-200-3.us-east-2.compute.internal <none> <none>
rook-ceph-osd-1-7f96cb594f-jvbvp 2/2 Running 0 10m 10.131.0.60 ip-10-0-142-161.us-east-2.compute.internal <none> <none>
rook-ceph-osd-2-69c6755569-9jrlv 2/2 Running 0 10m 10.128.2.25 ip-10-0-181-8.us-east-2.compute.internal <none> <none>
$ oc get nodes -o wide --show-labels
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME LABELS
ip-10-0-138-73.us-east-2.compute.internal Ready master 104m v1.22.1+d767194 10.0.138.73 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-138-73.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-142-161.us-east-2.compute.internal Ready worker 97m v1.22.1+d767194 10.0.142.161 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-142-161.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack1
ip-10-0-181-8.us-east-2.compute.internal Ready worker 97m v1.22.1+d767194 10.0.181.8 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-181-8.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a,topology.rook.io/rack=rack2
ip-10-0-190-213.us-east-2.compute.internal Ready master 104m v1.22.1+d767194 10.0.190.213 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-190-213.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2a,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2a
ip-10-0-198-192.us-east-2.compute.internal Ready master 104m v1.22.1+d767194 10.0.198.192 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m4.xlarge,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-198-192.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.kubernetes.io/instance-type=m4.xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b
ip-10-0-200-3.us-east-2.compute.internal Ready worker 96m v1.22.1+d767194 10.0.200.3 <none> Red Hat Enterprise Linux CoreOS 410.84.202110191922-0 (Ootpa) 4.18.0-305.19.1.el8_4.x86_64 cri-o://1.22.0 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/instance-type=m5.4xlarge,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/region=us-east-2,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=ip-10-0-200-3.us-east-2.compute.internal,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.kubernetes.io/instance-type=m5.4xlarge,node.openshift.io/os_id=rhcos,topology.ebs.csi.aws.com/zone=us-east-2b,topology.kubernetes.io/region=us-east-2,topology.kubernetes.io/zone=us-east-2b,topology.rook.io/rack=rack0
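The zone and rack placement of the three storage nodes is easier to read with label columns; a compact restatement of the labels above:

# zone and rack labels of the ODF storage nodes (same data as the listing above)
$ oc get nodes -l cluster.ocs.openshift.io/openshift-storage \
    -L topology.kubernetes.io/zone -L topology.rook.io/rack

This should show rack1 and rack2 in us-east-2a and rack0 in us-east-2b, matching the ceph osd tree below.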
$ ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 1.50000 root default
-6 1.50000 region us-east-2
-5 1.00000 zone us-east-2a
-4 0.50000 rack rack1
-3 0.50000 host ocs-deviceset-gp2-2-data-0pdtkp
1 ssd 0.50000 osd.1 up 1.00000 1.00000
-18 0.50000 rack rack2
-17 0.50000 host ocs-deviceset-gp2-1-data-0xswnv
2 ssd 0.50000 osd.2 up 1.00000 1.00000
-13 0.50000 zone us-east-2b
-12 0.50000 rack rack0
-11 0.50000 host ocs-deviceset-gp2-0-data-0ww4v6
0 ssd 0.50000 osd.0 up 1.00000 1.00000
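As noted in comment 10, the storageDeviceSets replica is unrelated to the data replication factor of the pool; that factor can be checked from the rook-ceph toolbox, assuming the default block pool name:

# data replication factor of the RBD pool (assumes the default ODF block pool name)
$ ceph osd pool get ocs-storagecluster-cephblockpool size

On a default internal-mode deployment this should report size: 3.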
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days