Description of problem (please be as detailed as possible and provide log
snippets):
Platform: VMware LSO
Mode: Arbiter
Versions of all relevant components (if applicable):
OCP version 4.7.0-0.nightly-2021-01-05-220959
OCS version ocs-operator.v4.7.0-222.ci
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
yes
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
yes
Can this issue be reproduced from the UI?
yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Install OCP 4.7 and the LSO operator (the UI doesn't support bringing up the arbiter MON on a Master node yet).
2. Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a (and likewise for the other zones); see Additional info for more details.
Note: Since the current OCS build does not expose the new features, the CSV was edited to add them:
oc edit csv ocs-operator.v4.7.0-222.ci
Edit the enabled features to the following:
features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
Install the OCS operator 4.7.0-222.ci and click Create Storage Cluster.
3. Select Internal - Attached Devices mode.
Sub-steps:
3a. Discover Disks: select two worker nodes in each of zone-A and zone-B (to bring up OSDs).
3b. Create Storage Class: provide a name for the SC; PVs will be created on the LSO disks.
3c. Storage and nodes: check Enable Arbiter, select the arbiter zone (here us-east-2c), and select the storage class created in the previous step.
3d. Configure: no changes.
3e. Review and create: review the selections and click Create.
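The manual setup in steps 1-2 can be sketched with `oc` commands; the node names are taken from the Additional info section below, and the CSV edit applies the annotation shown above. This is a hedged sketch of the reporter's manual steps, not a verified script:

```shell
# Label the OSD nodes in two data zones and the remaining nodes in the
# arbiter zone (node names from the "Additional info" section).
oc label node compute-0 compute-3 \
  topology.kubernetes.io/zone=us-east-2a \
  failure-domain.beta.kubernetes.io/zone=us-east-2a --overwrite
oc label node compute-1 compute-4 \
  topology.kubernetes.io/zone=us-east-2b \
  failure-domain.beta.kubernetes.io/zone=us-east-2b --overwrite
oc label node compute-2 compute-5 \
  topology.kubernetes.io/zone=us-east-2c \
  failure-domain.beta.kubernetes.io/zone=us-east-2c --overwrite

# Enable the dev-preview features in the operator CSV (opens an editor;
# add the following under metadata.annotations):
#   features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
oc edit csv ocs-operator.v4.7.0-222.ci -n openshift-storage
```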
Actual results:
No Rook pods (mon, osd, rgw, mgr) were created.
Snippet from the rook-operator pod:
ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5
Expected results:
There should be no errors in the rook-operator pod, and all Rook pods should be created.
Additional info:
oc get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
compute-0 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4 Ready worker 6h37m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5 Ready worker 6h37m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
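The label output above can be tallied locally to see how many distinct zones are actually reported, which is what the stretch-cluster validation checks. A minimal sketch; the here-doc is an abbreviated stand-in for saved `oc get nodes --show-labels` output (the file path is hypothetical):

```shell
# Count distinct topology.kubernetes.io/zone values in saved label output.
cat > /tmp/node-labels.txt <<'EOF'
compute-0 ... topology.kubernetes.io/zone=us-east-2a
compute-1 ... topology.kubernetes.io/zone=us-east-2b
compute-2 ... topology.kubernetes.io/zone=us-east-2c
compute-3 ... topology.kubernetes.io/zone=us-east-2a
EOF
grep -o 'topology.kubernetes.io/zone=[^, ]*' /tmp/node-labels.txt | sort -u | wc -l
# A healthy stretch layout should report exactly 3 distinct zones.
```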
(In reply to Mudit Agarwal from comment #3)
> Umanga, I guess this should be fixed by
> https://github.com/openshift/ocs-operator/pull/976
This PR fixes the annotation so we no longer have to use hacks.
This issue is something else. I'm looking into it, but it's most likely a misconfiguration, as others have tested this successfully.
Flexible scaling is enabled, which forces the failure domain to be host instead of zone, so arbiter mode fails.
Disable flexible scaling and arbiter should work fine.
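The suggested fix can be expressed in the StorageCluster spec. A sketch only: the field names (`arbiter.enable`, `nodeTopologies.arbiterLocation`, `flexibleScaling`) are assumed from the OCS 4.7 StorageCluster CRD and should be checked against the installed version:

```yaml
# Hypothetical StorageCluster fragment: arbiter on, flexible scaling off.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  arbiter:
    enable: true              # stretch/arbiter mode needs a zone failure domain
  nodeTopologies:
    arbiterLocation: us-east-2c
  flexibleScaling: false      # must stay off; enabling it forces a host failure domain
```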
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633