Created attachment 1744935 [details]
UI screenshot

Description of problem (please be as detailed as possible and provide log snippets):
==============================================================
Raising this BZ based on a suggestion by Talur after a troubleshooting session on Bug 1913292.

On non-cloud platforms, enabling arbiter also enables flexible scaling, because the zone count is <3 (the OSD nodes are distributed across only 2 zones instead of the regular 3).

Details
===========
In AWS, when the zone count is >2, flexible scaling is set to false by default. But in the case of VMware LSO + arbiter mode, even though we added 3 zones (here us-east-2a and us-east-2b for the OSDs and us-east-2c for the arbiter), the OSD zone count = 2 and the UI enables flexible scaling along with arbiter.

Message in UI:
>> When all the selected nodes in the storage class are in a single zone the cluster will be using a host-based failure domain

This results in a conflict, as arbiter expects a zone-based failure domain while flexible scaling sets it to hostname.

Snip from storagecluster.yaml:

  spec:
    arbiter:
      enable: true
  --
    externalStorage: {}
    flexibleScaling: true
    managedResources:
  --
    nodeTopologies:
      arbiterLocation: us-east-2c

>> snip from Ceph cluster.yaml

  mon:
    count: 5
    stretchCluster:
      failureDomainLabel: kubernetes.io/hostname
      zones:
      - name: compute-1
      - name: compute-0
      - name: compute-4
      - name: compute-3
      - arbiter: true
        name: us-east-2c

Version of all relevant components (if applicable):
======================================================
OCP version 4.7.0-0.nightly-2021-01-05-220959
OCS version ocs-operator.v4.7.0-222.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
========================================================
Yes, the arbiter install fails and we hit the deployment bug - 1913292

Is there any workaround available to the best of your knowledge?
==============================================================
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
===========================================================
4

Is this issue reproducible?
==============================
Yes, but tested only once

Can this issue reproduce from the UI?
=======================================
Yes

If this is a regression, please provide more details to justify this:
===================================================
Not a regression; this relates to new features.

Steps to Reproduce:
=======================
1. Install OCP 4.7 and the LSO operator (the UI doesn't support bringing up the arbiter MON on a master node yet)
2. Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a; see additional info for more details.

Note: Since the current OCS build does not have the new features, the CSV was edited to add the following:

  oc edit csv ocs-operator.v4.7.0-222.ci

Edit the enabled features to the following:

  features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'

Install OCS operator 4.7.0-222.ci and click on create storage cluster

3. Select Internal - Attached mode

Sub-Steps
3a. Discover Disks -> Select Nodes: select two worker nodes each in zone-A and zone-B (to bring up the OSDs)
3b. Create Storage Class -> provide a name for the SC; PVs will be created on the LSO disks
3c. Storage and the nodes -> click the checkbox to Enable Arbiter, select the arbiter zone (here zone: us-east-2c) and select the storage class created in the step above
3d. Configure -> no change
3e. Review and create -> review the selections and click create

Actual results:
==================
failureDomain is incorrectly set to kubernetes.io/hostname in an arbiter install, as flexible scaling is set to true instead of false.

Expected results:
=========================
Flexible scaling should be set to true only if the cluster is non-arbiter and the zone count is <3

Additional info:
=====================
Snippet from the rook-operator pod:

  ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
  2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
  2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5

Additional info:

oc get nodes --show-labels

NAME              STATUS   ROLES    AGE     VERSION           LABELS
compute-0         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
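For clarity, the expected default boils down to a single rule. A minimal illustrative sketch in Go (this is not the actual console or operator code; the function name and parameters are made up to state the rule from "Expected results"):

  package main

  import "fmt"

  // defaultFlexibleScaling is a hypothetical helper capturing the expected
  // default: flexible scaling should apply only to non-arbiter clusters
  // whose OSD nodes span fewer than three zones.
  func defaultFlexibleScaling(arbiterEnabled bool, osdZoneCount int) bool {
      return !arbiterEnabled && osdZoneCount < 3
  }

  func main() {
      // The reported scenario: arbiter on, OSD nodes in 2 zones.
      fmt.Println(defaultFlexibleScaling(true, 2))  // expected false; the buggy UI produced true
      fmt.Println(defaultFlexibleScaling(false, 2)) // true: non-arbiter, <3 zones
  }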
QE will try to deploy the StorageCluster CR as explained in the reproducer.
Need to fix this log issue: "E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'"

SetUp: LSO Cluster
Provider: VMware
OCP Version: 4.7.0-0.nightly-2021-02-09-224509
OCS Version: ocs-operator.v4.7.0-257.ci

Test Procedure:
1. Install an LSO cluster via the UI with 2 zones
   *There is no option to enable arbiter via the UI (screenshot attached)

   compute-0 and compute-1 in zone-a:
   oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a

   compute-2 in zone-b:
   oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b

2. Get the StorageCluster yaml (flexibleScaling enabled, arbiter disabled):

  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0

3. Enable arbiter:

  $ oc edit storagecluster -n openshift-storage
    arbiter:
      enable: true

4. Check the logs on the ocs-operator pod:

E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'
{"level":"error","ts":1612999068.0271533,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:90"}

-> arbiter and flexibleScaling both can't be enabled

Need to fix this log: "E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'"
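Note on the "Unsupported event type" line: it comes from client-go's event recorder (tools/record/event.go), which only accepts the event types "Normal" and "Warning"; an event recorded with any other string, such as 'Error', is dropped with exactly this message. A minimal sketch of the likely fix, with a hypothetical helper name:

  package storagecluster

  import (
      corev1 "k8s.io/api/core/v1"
      "k8s.io/apimachinery/pkg/runtime"
      "k8s.io/client-go/tools/record"
  )

  // reportInvalidSpec is a hypothetical helper showing the fix direction:
  // use corev1.EventTypeWarning ("Warning") instead of the unsupported
  // string "Error" when emitting the validation-failure event.
  func reportInvalidSpec(recorder record.EventRecorder, obj runtime.Object) {
      recorder.Event(obj, corev1.EventTypeWarning, "FailedValidation",
          "arbiter and flexibleScaling both can't be enabled")
  }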
Created attachment 1756313 [details] arbiter not clickable on UI (install storage cluster)
https://github.com/openshift/ocs-operator/pull/1060
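The fix adds spec validation in the operator. Roughly, the check amounts to the following sketch (the function name and import path are approximations based on the log messages above, not the literal patch):

  package storagecluster

  import (
      "fmt"

      ocsv1 "github.com/openshift/ocs-operator/pkg/apis/ocs/v1" // import path assumed
  )

  // validateArbiterSpec rejects the invalid combination at reconcile time:
  // a StorageCluster may enable arbiter or flexible scaling, never both,
  // because they imply different failure domains (zone vs. hostname).
  func validateArbiterSpec(sc *ocsv1.StorageCluster) error {
      if sc.Spec.Arbiter.Enable && sc.Spec.FlexibleScaling {
          return fmt.Errorf("arbiter and flexibleScaling both can't be enabled")
      }
      return nil
  }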
Bug Fixed

Test Procedure:
1. Install OCP 4.7
   Cluster Provider: VSphere
   OCP Version: 4.7.0-0.nightly-2021-03-22-025559

2. Label compute-0, compute-1 zone A, compute-2 zone B:
   $ oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-0 labeled
   $ oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-1 labeled
   $ oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b
   node/compute-2 labeled

3. Install Local Storage via UI
   Local Storage Version: 4.7.0-202103060100.p0

4. Install OCS via UI
   OCS Version: 4.7.0-307.ci

sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 10
    }
}

5. Install Storage Cluster via UI

6. Enable arbiter on the storagecluster:
   $ oc edit storagecluster -n openshift-storage
     arbiter:
       enable: true

$ oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2021-03-24T08:24:18Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 3
    managedFields:
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:cluster.ocs.openshift.io/local-devices: {}
        f:spec:
          .: {}
          f:arbiter: {}
          f:encryption:
            .: {}
            f:enable: {}
            f:kms: {}
          f:flexibleScaling: {}
          f:monDataDirHostPath: {}
          f:nodeTopologies: {}
      manager: Mozilla
      operation: Update
      time: "2021-03-24T08:24:18Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:uninstall.ocs.openshift.io/cleanup-policy: {}
            f:uninstall.ocs.openshift.io/mode: {}
          f:finalizers: {}
        f:spec:
          f:externalStorage: {}
          f:managedResources:
            .: {}
            f:cephBlockPools: {}
            f:cephConfig: {}
            f:cephFilesystems: {}
            f:cephObjectStoreUsers: {}
            f:cephObjectStores: {}
          f:storageDeviceSets: {}
          f:version: {}
        f:status:
          .: {}
          f:conditions: {}
          f:failureDomain: {}
          f:failureDomainKey: {}
          f:failureDomainValues: {}
          f:images:
            .: {}
            f:ceph:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaCore:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaDB:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
          f:nodeTopologies:
            .: {}
            f:labels:
              .: {}
              f:failure-domain.beta.kubernetes.io/zone: {}
              f:kubernetes.io/hostname: {}
          f:phase: {}
          f:relatedObjects: {}
      manager: ocs-operator
      operation: Update
      time: "2021-03-24T08:25:57Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:arbiter:
            f:enable: {}
      manager: kubectl-edit
      operation: Update
      time: "2021-03-24T09:39:24Z"
    name: ocs-storagecluster
    namespace: openshift-storage
    resourceVersion: "202669"
    selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
    uid: 5d10c103-1ed6-4f76-b9c6-79ea8bdd7b68
  spec:
    arbiter:
      enable: true
    encryption:
      enable: true
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0
  status:
    conditions:
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:24:20Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:24:19Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - compute-2
    - compute-0
    - compute-1
    images:
      ceph:
        actualImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
      noobaaCore:
        actualImage: quay.io/rhceph-dev/mcg-core@sha256:54d2ea9d4e18f6c4bb1a11dfec741d1adb62c34d98ca4c488f9b06c070a794d3
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
    nodeTopologies:
      labels:
        failure-domain.beta.kubernetes.io/zone:
        - a
        - b
        kubernetes.io/hostname:
        - compute-2
        - compute-0
        - compute-1
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "202213"
      uid: 1f24c28a-de22-4d9d-a693-569cc6909337
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "202631"
      uid: f6947698-684c-4d57-ba61-08eca6108726
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc logs ocs-operator-64d77857dc-q7wk5
"arbiter and flexibleScaling both can't be enabled"
{"level":"error","ts":1616579562.1646447,"logger":"controllers.StorageCluster","msg":"Failed to validate ArbiterSpec","error":"arbiter and flexibleScaling both can't be
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).validateStorageClusterSpec\n\t/remote-source/app/controllers/storagecluster/reconcile.go:211\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:244\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579562.164746,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579567.9811654,"logger":"controllers.StorageCluster","msg":"Failed to validate ArbiterSpec","error":"arbiter and flexibleScaling both can't be 
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).validateStorageClusterSpec\n\t/remote-source/app/controllers/storagecluster/reconcile.go:211\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:244\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579567.9812632,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"}
Warning log on the UI: "arbiter and flexibleScaling both can't be enabled"
*Screenshot attached
Created attachment 1766229 [details] arbiter warning ui
Created attachment 1766234 [details] arbiter warning log on Persistent Storage tab
Hi Talur,

I agree that with this fix we see the message "arbiter and flexibleScaling both can't be enabled" in the logs and the Events tab, but we are still able to set both arbiter and flexibleScaling to true in the StorageCluster.

a) The StorageCluster is still in Ready state, so how will users be told that what they did is not acceptable?
b) They might not check the events or logs, as the StorageCluster state is still Ready and their changes were accepted.
Created attachment 1766246 [details] arbiter warning on storagecluster page
Nitin, please check if a status condition also needs to be set/updated.
I have sent a PR to change the Status.Phase to the Error state: https://github.com/openshift/ocs-operator/pull/1134
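For reference, a rough sketch of what that PR is meant to do — record the validation failure in Status.Phase so the CR shows Error instead of staying Ready (the helper name, import paths, and exact phase string are assumptions, not the literal patch):

  package storagecluster

  import (
      "context"

      "sigs.k8s.io/controller-runtime/pkg/client"

      ocsv1 "github.com/openshift/ocs-operator/pkg/apis/ocs/v1" // import path assumed
  )

  // markValidationFailure is a hypothetical helper sketching the PR's
  // intent: surface a failed spec validation in Status.Phase so users
  // see the problem without digging through operator logs or events.
  func markValidationFailure(ctx context.Context, c client.Client, sc *ocsv1.StorageCluster) error {
      sc.Status.Phase = "Error" // exact phase string assumed
      return c.Status().Update(ctx, sc)
  }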
We have a PR to reflect the state, moving the BZ to POST again.
Storage Cluster state is Ready (arbiter and flexible scaling enabled)

Test Procedure:
1. Install OCP 4.7
   Cluster Provider: VSphere
   OCP Version: 4.7.0-0.nightly-2021-04-01-061355

2. Label compute-0, compute-1 zone A, compute-2 zone B:
   $ oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-0 labeled
   $ oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-1 labeled
   $ oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b
   node/compute-2 labeled

3. Install Local Storage via UI
   Local Storage Version: 4.7.0-202103202139.p0

4. Install OCS via UI
   OCS Version: 4.7.0-339.ci

sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 10
    }
}

5. Install Storage Cluster via UI

6. Enable arbiter on the storagecluster:
   $ oc edit storagecluster -n openshift-storage
   storagecluster.ocs.openshift.io/ocs-storagecluster edited

     spec:
       arbiter:
         enable: true

  spec:
    arbiter:
      enable: true
    encryption:
      enable: true
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0

$ oc logs ocs-operator-5bcdd97ff4-mh6sp
{"level":"error","ts":1617524035.0535948,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} for more details: https://docs.google.com/document/d/1Ahu6qEIbaYOij3KO0fyHKrAAAmunjQZ-WuPsC8oULOE/edit
Is this really a blocker? We are in RC phase now for 4.7 so we have to reassess all FailedQA.
(In reply to Mudit Agarwal from comment #23)
> Is this really a blocker? We are in RC phase now for 4.7 so we have to
> reassess all FailedQA.

As part of this bug we added the logs and events, but we did not change the PHASE of the StorageCluster. Changing the PHASE of the StorageCluster came in as a request in comment 16. As we are already in the RC phase, we can verify this bug and create a new one specifically for changing the PHASE, which can go in a 4.7 async update or later.
Thanks Nitin. Oded, I agree with Nitin here, this is not a blocker. I am moving it back to ON_QA; please raise a new BZ for the status, and I will add it as a known issue for the release notes. We can fix it in 4.8 and bring it to 4.7.z if required.
Can we please raise a bug for this, so that I can add it to the known issues and fill in the doc text for it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days