Description of problem (please be as detailed as possible and provide log
snippets):
Platform: VMware LSO
Mode: Arbiter
Versions of all relevant components (if applicable):
OCP version 4.7.0-0.nightly-2021-01-05-220959
OCS version ocs-operator.v4.7.0-222.ci
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
yes
Is there any workaround available to the best of your knowledge?
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
yes
Can this issue be reproduced from the UI?
yes
If this is a regression, please provide more details to justify this:
Steps to Reproduce:
1. Install OCP 4.7 and the LSO operator (the UI doesn't support bringing up the arbiter MON on a Master node yet).
2. Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a (and likewise for the other zones); see Additional info for more details.
Note: Since the current OCS build does not expose the new features, the CSV was edited to add them:
oc edit csv ocs-operator.v4.7.0-222.ci
Edit the enabled features to the following:
features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
Install the OCS operator 4.7.0-222.ci and click Create Storage Cluster.
3. Select Internal - Attached Devices mode.
Sub-steps:
3a. Discover Disks: select two worker nodes in each of zone-A and zone-B (to bring up OSDs).
3b. Create Storage Class: provide a name for the SC; PVs will be created on the LSO disks.
3c. Storage and nodes: check Enable Arbiter, select the arbiter zone (here us-east-2c), and select the storage class created in the previous step.
3d. Configure: no changes.
3e. Review and create: review the selections and click Create.
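The manual setup in steps 1-2 can be sketched with `oc` commands; the node names are taken from the Additional info section below, and the CSV edit applies the annotation shown above. This is a hedged sketch of the reporter's manual steps, not a verified script:

```shell
# Label the OSD nodes in two data zones and the remaining nodes in the
# arbiter zone (node names from the "Additional info" section).
oc label node compute-0 compute-3 \
  topology.kubernetes.io/zone=us-east-2a \
  failure-domain.beta.kubernetes.io/zone=us-east-2a --overwrite
oc label node compute-1 compute-4 \
  topology.kubernetes.io/zone=us-east-2b \
  failure-domain.beta.kubernetes.io/zone=us-east-2b --overwrite
oc label node compute-2 compute-5 \
  topology.kubernetes.io/zone=us-east-2c \
  failure-domain.beta.kubernetes.io/zone=us-east-2c --overwrite

# Enable the dev-preview features in the operator CSV (opens an editor;
# add the following under metadata.annotations):
#   features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'
oc edit csv ocs-operator.v4.7.0-222.ci -n openshift-storage
```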
Actual results:
No Rook pods (mon, osd, rgw, mgr) were created.
Snippet from the rook-operator pod:
ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5
Expected results:
There should be no errors in the rook-operator pod, and all Rook pods should be created.
Additional info:
oc get nodes --show-labels
NAME STATUS ROLES AGE VERSION LABELS
compute-0 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3 Ready worker 6h39m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4 Ready worker 6h37m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5 Ready worker 6h37m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2 Ready master 6h46m v1.20.0+8e0d026 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
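The label output above can be tallied locally to see how many distinct zones are actually reported, which is what the stretch-cluster validation checks. A minimal sketch; the here-doc is an abbreviated stand-in for saved `oc get nodes --show-labels` output (the file path is hypothetical):

```shell
# Count distinct topology.kubernetes.io/zone values in saved label output.
cat > /tmp/node-labels.txt <<'EOF'
compute-0 ... topology.kubernetes.io/zone=us-east-2a
compute-1 ... topology.kubernetes.io/zone=us-east-2b
compute-2 ... topology.kubernetes.io/zone=us-east-2c
compute-3 ... topology.kubernetes.io/zone=us-east-2a
EOF
grep -o 'topology.kubernetes.io/zone=[^, ]*' /tmp/node-labels.txt | sort -u | wc -l
# A healthy stretch layout should report exactly 3 distinct zones.
```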
(In reply to Mudit Agarwal from comment #3)
> Umanga, I guess this should be fixed by
> https://github.com/openshift/ocs-operator/pull/976
This PR fixes the annotation so we no longer have to use hacks.
This issue is something else. I'm looking into it, but it's most likely a misconfiguration, as others have tested this successfully.
Flexible scaling is enabled, which forces the failure domain to be host instead of zone, so arbiter mode fails.
Disable flexible scaling and arbiter should work fine.
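The suggested fix can be expressed in the StorageCluster spec. A sketch only: the field names (`arbiter.enable`, `nodeTopologies.arbiterLocation`, `flexibleScaling`) are assumed from the OCS 4.7 StorageCluster CRD and should be checked against the installed version:

```yaml
# Hypothetical StorageCluster fragment: arbiter on, flexible scaling off.
apiVersion: ocs.openshift.io/v1
kind: StorageCluster
metadata:
  name: ocs-storagecluster
  namespace: openshift-storage
spec:
  arbiter:
    enable: true              # stretch/arbiter mode needs a zone failure domain
  nodeTopologies:
    arbiterLocation: us-east-2c
  flexibleScaling: false      # must stay off; enabling it forces a host failure domain
```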
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2020:5633