Description of problem (please be as detailed as possible and provide log snippets):

Platform: VMware, LSO Mode - Arbiter

Version of all relevant components (if applicable):
OCP version: 4.7.0-0.nightly-2021-01-05-220959
OCS version: ocs-operator.v4.7.0-222.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Install OCP 4.7 and the LSO operator (the UI does not yet support bringing up the arbiter MON on a master node).
2. Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a; see additional info for more details.
   Note: Since the current OCS build does not have the new features, edit the CSV to add them:
   oc edit csv ocs-operator.v4.7.0-222.ci
   Edit the enabled features to the following:
   features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
   Then install OCS operator 4.7.0-222.ci and click on "Create Storage Cluster".
3. Select Internal - Attached mode.
   3a. Discover Disks -> Select Nodes: select two worker nodes, one each in zone-A and zone-B (to bring up OSDs).
   3b. Create Storage Class: provide a name for the SC; PVs will be created on the LSO disks.
   3c. Storage and nodes: click the checkbox to Enable Arbiter, select the arbiter zone (here zone us-east-2c), and select the storage class created in the step above.
   3d. Configure: no change.
   3e.
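The zone labeling in step 2 can be done with `oc label`. A minimal sketch that only generates and prints the commands for review before running them; the node-to-zone pairs are the ones from this report and would differ on another cluster:

```shell
# Generate (but do not run) the label commands; review the output first.
# Node/zone pairs are taken from this report and will differ elsewhere.
label_cmds=$(
  for pair in compute-0:us-east-2a compute-1:us-east-2b compute-2:us-east-2c \
              compute-3:us-east-2a compute-4:us-east-2b compute-5:us-east-2c; do
    node=${pair%%:*}
    zone=${pair#*:}
    printf 'oc label node %s topology.kubernetes.io/zone=%s failure-domain.beta.kubernetes.io/zone=%s --overwrite\n' \
      "$node" "$zone" "$zone"
  done
)
printf '%s\n' "$label_cmds"
```

Piping the output to `sh` (after review) applies both labels to each worker in one pass.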
Review and create: review the selections and click Create.

Actual results:
No rook pods (mon, osd, rgw, mgr) were created.

Snippet from the rook-operator pod:

ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5

Expected results:
There should be no errors in the rook-operator pod and all rook pods should be created.

Additional info:

oc get nodes --show-labels
NAME              STATUS   ROLES    AGE     VERSION           LABELS
compute-0         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
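A quick sanity check before creating the cluster is to count the distinct zone values, since arbiter (stretch) mode is validated against exactly three zones. A sketch using the worker labels shown above; on a live cluster the same column can be listed with `oc get nodes -L topology.kubernetes.io/zone`:

```shell
# Zone labels of the six worker nodes as reported above.
zones="us-east-2a
us-east-2b
us-east-2c
us-east-2a
us-east-2b
us-east-2c"

# An arbiter (stretch) cluster requires exactly three distinct zones.
distinct=$(printf '%s\n' "$zones" | sort -u | wc -l | tr -d ' ')
echo "distinct zones: $distinct"
```

Here the labels themselves do yield three zones, which suggests the "found 5" in the validation error came from a different failure-domain computation rather than from the zone labels (see the flexible-scaling analysis below).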
Umanga, I guess this should be fixed by https://github.com/openshift/ocs-operator/pull/976
(In reply to Mudit Agarwal from comment #3)
> Umanga, I guess this should be fixed by
> https://github.com/openshift/ocs-operator/pull/976

This PR fixes the annotation so we no longer have to use hacks. This issue is something else. I'm looking into it, but it's most likely a misconfiguration, as others have tested this successfully.
Flexible Scaling is enabled, which forces the Failure Domain to be Host instead of Zone, so Arbiter Mode fails. Disable Flexible Scaling and Arbiter should work fine.
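The incompatibility can be expressed as a small sketch; `validate_stretch` is a hypothetical helper that only mirrors the rule stated above (arbiter mode needs a zone failure domain, while flexible scaling forces host), not the actual operator code:

```shell
# Hypothetical validator mirroring the constraint described above:
# arbiter/stretch mode requires failure domain "zone", while flexible
# scaling forces failure domain "host" -- so both cannot be enabled.
validate_stretch() {
  arbiter=$1
  flexible=$2
  if [ "$arbiter" = "true" ] && [ "$flexible" = "true" ]; then
    echo "invalid: flexible scaling forces failure domain 'host', arbiter needs 'zone'"
    return 1
  fi
  echo "ok"
}

validate_stretch true true || true   # rejected combination
validate_stretch true false          # arbiter without flexible scaling
```

In this report, dropping "flexible-scaling" from the enabled-features annotation (so only "kms" and "arbiter" remain) corresponds to the valid case.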
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633