Bug 1913292

Summary: OCS 4.7 installation fails on VMware when arbiter is enabled, as flexibleScaling is also getting enabled
Product: OpenShift Container Platform
Reporter: Pratik Surve <prsurve>
Component: Console Storage Plugin
Assignee: Ankush Behl <anbehl>
Status: CLOSED ERRATA
QA Contact: Pratik Surve <prsurve>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: afrahman, anbehl, aos-bugs, ebenahar, madam, muagarwa, nberry, nthomas, ocs-bugs, sostapov, uchapaga
Target Milestone: ---
Keywords: TestBlocker
Target Release: 4.7.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2021-02-24 15:50:26 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Pratik Surve 2021-01-06 13:09:52 UTC
Description of problem (please be as detailed as possible and provide log
snippets):
Platform: VMware LSO
Mode - Arbiter 

Version of all relevant components (if applicable):
OCP version 4.7.0-0.nightly-2021-01-05-220959
OCS version ocs-operator.v4.7.0-222.ci

Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
yes

Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
yes

Can this issue be reproduced from the UI?
yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:

1:- Install OCP 4.7 and the LSO operator (the UI doesn't support bringing up the arbiter MON on a master node yet)

2:- Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a (and correspondingly for the other zones); see additional info for more details.
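
For example, a node is placed into a zone like this (node name is illustrative; repeat per node with its zone):

oc label node compute-0 topology.kubernetes.io/zone=us-east-2a failure-domain.beta.kubernetes.io/zone=us-east-2a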

Note: Since the current OCS build does not have the new features enabled, the CSV was edited to add them:
oc edit csv ocs-operator.v4.7.0-222.ci

Edit the enabled-features annotation to the following:
features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
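
For context, this annotation lives under metadata.annotations in the CSV; a minimal sketch with all other fields omitted:

metadata:
  annotations:
    features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'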

3:- Install OCS operator 4.7.0-222.ci, click "Create Storage Cluster", and select Internal - Attached mode

Sub-Steps
3a. Discover Disks -> Select Nodes: select 2 worker nodes each in, say, zone-A and zone-B (to bring up OSDs)
3b. Create Storage Class -> provide a name for the SC; PVs will be created on the LSO disks
3c. Storage and nodes -> click the checkbox to Enable Arbiter, select the arbiter zone (here zone: us-east-2c), and select the storage class created in the step above
3d. Configure -> no change
3e. Review and create -> review the selections and click Create


Actual results:

No rook pods (mon, osd, rgw, mgr) were created.
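
This can be checked by listing the pods in the storage namespace, e.g.:

oc get pods -n openshift-storage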

Snippet from the rook-operator pod logs:

ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5
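
The log above can be retrieved with something like the following (assuming the usual rook-ceph-operator deployment name in OCS):

oc logs deploy/rook-ceph-operator -n openshift-storage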


Expected results:
There should be no errors in the rook-operator pod and all rook pods should be created.

Additional info:

oc get nodes --show-labels                 
NAME              STATUS   ROLES    AGE     VERSION           LABELS
compute-0         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
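
For readability, the zone labels alone can also be shown as columns (optional check):

oc get nodes -L topology.kubernetes.io/zone,failure-domain.beta.kubernetes.io/zone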

Comment 3 Mudit Agarwal 2021-01-06 13:24:06 UTC
Umanga, I guess this should be fixed by https://github.com/openshift/ocs-operator/pull/976

Comment 8 umanga 2021-01-06 15:23:31 UTC
(In reply to Mudit Agarwal from comment #3)
> Umanga, I guess this should be fixed by
> https://github.com/openshift/ocs-operator/pull/976

This PR fixes the annotation so we no longer have to use hacks.

This issue is something else. I'm looking into it, but it's most likely a misconfiguration, as others have tested this successfully.

Comment 9 umanga 2021-01-06 16:43:03 UTC
Flexible Scaling is enabled, which forced the Failure Domain to Host instead of Zone, so Arbiter Mode fails its stretch-cluster validation (which expects exactly three zones).

Disable Flexible Scaling and Arbiter should work fine.
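
Concretely, the annotation hack from the reproduction steps should omit "flexible-scaling" when arbiter is wanted; a sketch of the corrected value:

features.ocs.openshift.io/enabled: '["kms", "arbiter"]'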

Comment 17 errata-xmlrpc 2021-02-24 15:50:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633