Created attachment 1751239 [details]
UI screenshot for current implementation of flexible scaling in Internal mode

Description of problem:
==============================
After a long round of discussion in email threads, the following was decided: for Internal mode (3 AZ or 1 AZ), do not enable N+1 (flexible) scaling by default. However, in the current OCS 4.7 + OCP 4.7 deployment, this feature gets enabled by default even for clusters with zone=0 (e.g. VMware dynamic). Hence, we need to disable this feature in the UI for dynamic (Internal) mode.

Eran's summary
-----------------
1. When having 3 zones on any platform, we will keep the current OCS 4.6 behavior.
2. When having a single zone on any platform, we will differentiate between the two options:
   - Using LSO: we will automatically apply flexible scaling, using count=3, replica=1, and portable=false.
   - Using dynamic provisioning storage: we will keep the current OCS 4.6 behavior.

Nithya's summary
--------------------------------
Summing up the discussion:
1. Internal mode:
   a) No flexible scaling.
   b) count=1, replica=3.
   c) Use the current 4.6 behaviour of racks if zones < 3.
2. Internal Attached:
   a) Flexible scaling = true.
   b) count=3, replica=1.
   c) When expanding via the UI, expand in sets of 3 OSDs; increment count in this case.
   d) The customer can use the CLI to expand in increments of 1.

Version-Release number of selected component (if applicable):
===============================
OCP 4.7
OCS 4.7

How reproducible:
==================
Always

Steps to Reproduce:
===================
1. Install OCS 4.7 on a VMware setup where zone=0.
2. Flexible scaling is enabled by default.

Expected results:
==================
After the fix is reverted for Internal mode, flexible scaling will not be enabled for dynamic mode clusters even if zone=0.
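The two summaries above boil down to a small decision table. The sketch below (Python; `device_set_defaults` is a hypothetical helper, not actual ocs-operator code) encodes the agreed defaults, under the assumption that the 4.6 behavior means count=1, replica=3, portable=true with no flexible scaling:

```python
# Hypothetical helper encoding the agreed StorageCluster device-set defaults.
# Inputs: number of availability zones, and whether LSO (local storage) is used.

def device_set_defaults(zones: int, uses_lso: bool) -> dict:
    """Return the expected storageDeviceSets defaults per the summaries above."""
    if zones >= 3:
        # 3 zones on any platform: keep the OCS 4.6 behavior.
        return {"flexibleScaling": False, "count": 1, "replica": 3, "portable": True}
    if uses_lso:
        # Single zone + LSO (Internal Attached): flexible scaling on.
        return {"flexibleScaling": True, "count": 3, "replica": 1, "portable": False}
    # Single zone + dynamic provisioning (Internal mode): keep the 4.6
    # behavior, falling back to rack failure domains when zones < 3.
    return {"flexibleScaling": False, "count": 1, "replica": 3, "portable": True}

# The reported bug: dynamic (Internal) mode with zone=0 was getting
# flexible scaling enabled instead of the 4.6 behavior.
print(device_set_defaults(0, False))
```

The bottom branch is the one this bug is about: zone=0 with dynamic provisioning must not take the LSO path.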
Flexible scaling is not enabled for an Internal mode cluster.

1. Install an OCP cluster on VMware:
   Provider: VSphere
   OCP Version: 4.7.0-0.nightly-2021-02-05-053408
2. Install the Internal mode OCS 4.7 operator via the UI:
   OCS Version: ocs-operator.v4.7.0-251.ci
3. Check ceph status:
   sh-4.4# ceph health
   HEALTH_OK
4. Check failureDomain:
   $ oc get cephblockpool -n openshift-storage -o yaml | grep failureDomain
         f:failureDomain: {}
     failureDomain: rack
5. Get storagecluster:
   $ oc get storagecluster -n openshift-storage -o yaml | grep -i flex
   $ -> flexibleScaling does not exist

$ oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2021-02-05T14:00:40Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 2
    managedFields:
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          .: {}
          f:arbiter: {}
          f:encryption:
            .: {}
            f:kms: {}
          f:nodeTopologies: {}
      manager: Mozilla
      operation: Update
      time: "2021-02-05T14:00:40Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:uninstall.ocs.openshift.io/cleanup-policy: {}
            f:uninstall.ocs.openshift.io/mode: {}
          f:finalizers: {}
        f:spec:
          f:externalStorage: {}
          f:managedResources:
            .: {}
            f:cephBlockPools: {}
            f:cephFilesystems: {}
            f:cephObjectStoreUsers: {}
            f:cephObjectStores: {}
          f:storageDeviceSets: {}
          f:version: {}
        f:status:
          .: {}
          f:conditions: {}
          f:failureDomain: {}
          f:images:
            .: {}
            f:ceph:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaCore:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaDB:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
          f:nodeTopologies:
            .: {}
            f:labels:
              .: {}
              f:kubernetes.io/hostname: {}
              f:topology.rook.io/rack: {}
          f:phase: {}
          f:relatedObjects: {}
      manager: ocs-operator
      operation: Update
      time: "2021-02-05T14:03:43Z"
    name: ocs-storagecluster
    namespace: openshift-storage
    resourceVersion: "70736"
    selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
    uid: 0155585e-c490-433e-be03-b9b731c85604
  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    managedResources:
      cephBlockPools: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 1
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: 512Gi
          storageClassName: thin
          volumeMode: Block
        status: {}
      name: ocs-deviceset-thin
      placement: {}
      portable: true
      preparePlacement: {}
      replica: 3
      resources: {}
    version: 4.7.0
  status:
    conditions:
    - lastHeartbeatTime: "2021-02-05T14:10:46Z"
      lastTransitionTime: "2021-02-05T14:00:42Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2021-02-05T14:10:46Z"
      lastTransitionTime: "2021-02-05T14:05:58Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-02-05T14:10:46Z"
      lastTransitionTime: "2021-02-05T14:05:58Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-02-05T14:10:46Z"
      lastTransitionTime: "2021-02-05T14:00:41Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2021-02-05T14:10:46Z"
      lastTransitionTime: "2021-02-05T14:05:58Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    failureDomain: rack
    images:
      ceph:
        actualImage: quay.io/rhceph-dev/rhceph@sha256:35e13c86bf5891b6db3386e74fc2be728906173a7aabb5d1aa11452a62d136e9
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:35e13c86bf5891b6db3386e74fc2be728906173a7aabb5d1aa11452a62d136e9
      noobaaCore:
        actualImage: quay.io/rhceph-dev/mcg-core@sha256:6462b82d9e0d90b5312cabd8a5e9701dd8f550104667f66ad1faa8a826ea79ce
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:35e13c86bf5891b6db3386e74fc2be728906173a7aabb5d1aa11452a62d136e9
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:c6b6da4f762c2f68bfe558efe954739438ffa2e9aae1c617b50011fb0eed8347
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:c6b6da4f762c2f68bfe558efe954739438ffa2e9aae1c617b50011fb0eed8347
    nodeTopologies:
      labels:
        kubernetes.io/hostname:
        - compute-0
        - compute-1
        - compute-2
        topology.rook.io/rack:
        - rack0
        - rack1
        - rack2
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "70716"
      uid: f0138b4f-f68b-487a-bb04-7da6663cc891
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "70735"
      uid: 1ab320df-3258-4d6e-b6d6-6fb036f22c21
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
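The grep-based check in step 5 can also be done as a structural assertion on the device set. A minimal sketch (Python; the dict below mirrors the `storageDeviceSets` entry from the `oc get storagecluster` output above, it is not fetched live from a cluster):

```python
# Verify the observed storageDeviceSets entry matches the expected Internal
# (dynamic) mode defaults: count=1, replica=3, and no flexibleScaling field.
# Values copied from the StorageCluster dump above.

observed_device_set = {
    "name": "ocs-deviceset-thin",
    "count": 1,
    "replica": 3,
    "portable": True,
    # note: no "flexibleScaling" key is present in the spec
}

assert "flexibleScaling" not in observed_device_set
assert observed_device_set["count"] == 1 and observed_device_set["replica"] == 3
print("Internal mode defaults verified: flexible scaling not enabled")
```

This matches the expected result: for a dynamic (Internal) mode cluster, even with zone=0, the spec keeps the 4.6-style count/replica settings and `flexibleScaling` does not exist.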
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633