Created attachment 1744935 [details]
UI screenshot

Description of problem (please be as detailed as possible and provide log snippets):
==============================================================
Raising this BZ based on a suggestion by Talur after a troubleshooting session on Bug 1913292.

On non-cloud platforms, enabling arbiter also enables flexible scaling, because the zone count is <3 (the OSD nodes are distributed across only 2 zones instead of the regular 3).

Details
===========
In AWS, when the zone count is >2, flexible scaling is set to false by default. But in the case of VMware LSO + arbiter mode, even though we added 3 zones (here us-east-2a and us-east-2b for the OSDs and us-east-2c for the arbiter), the OSD zone count = 2 and the UI enables flexible scaling along with arbiter.

Message in UI:
>> When all the selected nodes in the storage class are in a single zone the cluster will be using a host-based failure domain

This results in a conflict, as arbiter expects a zone-based failure domain while flexible scaling sets it to hostname.

Snip from storagecluster.yaml:

  spec:
    arbiter:
      enable: true
  --
    externalStorage: {}
    flexibleScaling: true
    managedResources:
  --
    nodeTopologies:
      arbiterLocation: us-east-2c

>> snip from Ceph cluster.yaml

  mon:
    count: 5
    stretchCluster:
      failureDomainLabel: kubernetes.io/hostname
      zones:
      - name: compute-1
      - name: compute-0
      - name: compute-4
      - name: compute-3
      - arbiter: true
        name: us-east-2c

Version of all relevant components (if applicable):
======================================================
OCP version 4.7.0-0.nightly-2021-01-05-220959
OCS version ocs-operator.v4.7.0-222.ci

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
========================================================
Yes, the arbiter install fails and we hit the deployment bug - 1913292

Is there any workaround available to the best of your knowledge?
==============================================================
Not sure

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
===========================================================
4

Is this issue reproducible?
==============================
Yes, but tested only once

Can this issue reproduce from the UI?
=======================================
Yes

If this is a regression, please provide more details to justify this:
===================================================
Not a regression; this relates to new features.

Steps to Reproduce:
=======================
1. Install OCP 4.7 and the LSO operator (the UI doesn't support bringing up the arbiter MON on a master node yet)
2. Label the nodes with topology.kubernetes.io/zone=us-east-2a and failure-domain.beta.kubernetes.io/zone=us-east-2a; see additional info for more details.

Note: Since the current OCS build does not have the new features, the CSV was edited to add the following:

  oc edit csv ocs-operator.v4.7.0-222.ci

Edit the enabled features to the following:

  features.ocs.openshift.io/enabled: '["kms", "arbiter", "flexible-scaling"]'

Install OCS operator 4.7.0-222.ci and click on create storage cluster

3. Select Internal - Attached mode

Sub-Steps
3a. Discover Disks -> Select Nodes: select two worker nodes each in zone-A and zone-B (to bring up the OSDs)
3b. Create Storage Class -> provide a name for the SC; PVs will be created on the LSO disks
3c. Storage and the nodes -> click the checkbox to Enable Arbiter, select the arbiter zone (here zone: us-east-2c) and select the storage class created in the step above
3d. Configure -> no change
3e. Review and create -> review the selections and click create

Actual results:
==================
failureDomain is incorrectly set to kubernetes.io/hostname in an arbiter install, as flexible scaling is set to true instead of false.

Expected results:
=========================
Flexible scaling should be set to true only if the cluster is non-arbiter and the zone count is <3

Additional info:
=====================
Snippet from the rook-operator pod:

  ceph-cluster-controller: reconciling ceph cluster in namespace "openshift-storage"
  2021-01-06 12:50:01.118622 I | ceph-cluster-controller: clusterInfo not yet found, must be a new cluster
  2021-01-06 12:50:01.129652 E | ceph-cluster-controller: failed to reconcile. failed to reconcile cluster "ocs-storagecluster-cephcluster": failed to configure local ceph cluster: failed to perform validation before cluster creation: expecting exactly three zones for the stretch cluster, but found 5

Additional info:

oc get nodes --show-labels

NAME              STATUS   ROLES    AGE     VERSION           LABELS
compute-0         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-0,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-1         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-1,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-2         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-2,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
compute-3         Ready    worker   6h39m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2a,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-3,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
compute-4         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,cluster.ocs.openshift.io/openshift-storage=,failure-domain.beta.kubernetes.io/zone=us-east-2b,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-4,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
compute-5         Ready    worker   6h37m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,failure-domain.beta.kubernetes.io/zone=us-east-2c,kubernetes.io/arch=amd64,kubernetes.io/hostname=compute-5,kubernetes.io/os=linux,node-role.kubernetes.io/worker=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
control-plane-0   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-0,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2a
control-plane-1   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-1,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2b
control-plane-2   Ready    master   6h46m   v1.20.0+8e0d026   beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=control-plane-2,kubernetes.io/os=linux,node-role.kubernetes.io/master=,node.openshift.io/os_id=rhcos,topology.kubernetes.io/zone=us-east-2c
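For clarity, the expected default boils down to a single rule. A minimal illustrative sketch in Go (this is not the actual console or operator code; the function name and parameters are made up to state the rule from "Expected results"):

  package main

  import "fmt"

  // defaultFlexibleScaling is a hypothetical helper capturing the expected
  // default: flexible scaling should apply only to non-arbiter clusters
  // whose OSD nodes span fewer than three zones.
  func defaultFlexibleScaling(arbiterEnabled bool, osdZoneCount int) bool {
      return !arbiterEnabled && osdZoneCount < 3
  }

  func main() {
      // The reported scenario: arbiter on, OSD nodes in 2 zones.
      fmt.Println(defaultFlexibleScaling(true, 2))  // expected false; the buggy UI produced true
      fmt.Println(defaultFlexibleScaling(false, 2)) // true: non-arbiter, <3 zones
  }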
QE will try to deploy the StorageCluster CR as explained in the reproducer.
Need to fix this log issue: "E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'"

SetUp: LSO Cluster
Provider: VMware
OCP Version: 4.7.0-0.nightly-2021-02-09-224509
OCS Version: ocs-operator.v4.7.0-257.ci

Test Procedure:
1. Install an LSO cluster via the UI with 2 zones
   *There is no option to enable arbiter via the UI (screenshot attached)

   compute-0 and compute-1 in zone-a:
   oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a

   compute-2 in zone-b:
   oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b

2. Get the StorageCluster yaml (flexibleScaling enabled, arbiter disabled):

  spec:
    arbiter: {}
    encryption:
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0

3. Enable arbiter:

  $ oc edit storagecluster -n openshift-storage
    arbiter:
      enable: true

4. Check the logs on the ocs-operator pod:

E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'
{"level":"error","ts":1612999068.0271533,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/deps/gomod/pkg/mod/github.com/go-logr/zapr.0/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/deps/gomod/pkg/mod/sigs.k8s.io/controller-runtime.3/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/deps/gomod/pkg/mod/k8s.io/apimachinery.2/pkg/util/wait/wait.go:90"}

-> arbiter and flexibleScaling both can't be enabled

Need to fix this log: "E0210 23:17:48.027136 1 event.go:334] Unsupported event type: 'Error'"
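Note on the "Unsupported event type" line: it comes from client-go's event recorder (tools/record/event.go), which only accepts the event types "Normal" and "Warning"; an event recorded with any other string, such as 'Error', is dropped with exactly this message. A minimal sketch of the likely fix, with a hypothetical helper name:

  package storagecluster

  import (
      corev1 "k8s.io/api/core/v1"
      "k8s.io/apimachinery/pkg/runtime"
      "k8s.io/client-go/tools/record"
  )

  // reportInvalidSpec is a hypothetical helper showing the fix direction:
  // use corev1.EventTypeWarning ("Warning") instead of the unsupported
  // string "Error" when emitting the validation-failure event.
  func reportInvalidSpec(recorder record.EventRecorder, obj runtime.Object) {
      recorder.Event(obj, corev1.EventTypeWarning, "FailedValidation",
          "arbiter and flexibleScaling both can't be enabled")
  }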
Created attachment 1756313 [details] arbiter not clickable on UI (install storage cluster)
https://github.com/openshift/ocs-operator/pull/1060
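The fix adds spec validation in the operator. Roughly, the check amounts to the following sketch (the function name and import path are approximations based on the log messages above, not the literal patch):

  package storagecluster

  import (
      "fmt"

      ocsv1 "github.com/openshift/ocs-operator/pkg/apis/ocs/v1" // import path assumed
  )

  // validateArbiterSpec rejects the invalid combination at reconcile time:
  // a StorageCluster may enable arbiter or flexible scaling, never both,
  // because they imply different failure domains (zone vs. hostname).
  func validateArbiterSpec(sc *ocsv1.StorageCluster) error {
      if sc.Spec.Arbiter.Enable && sc.Spec.FlexibleScaling {
          return fmt.Errorf("arbiter and flexibleScaling both can't be enabled")
      }
      return nil
  }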
Bug Fixed

Test Procedure:
1. Install OCP 4.7
   Cluster Provider: VSphere
   OCP Version: 4.7.0-0.nightly-2021-03-22-025559

2. Label compute-0, compute-1 zone A, compute-2 zone B:
   $ oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-0 labeled
   $ oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-1 labeled
   $ oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b
   node/compute-2 labeled

3. Install Local Storage via UI
   Local Storage Version: 4.7.0-202103060100.p0

4. Install OCS via UI
   OCS Version: 4.7.0-307.ci

sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-137.el8cp (3a312d9e77c6ce466c535d0de02128fded7ba51f) nautilus (stable)": 10
    }
}

5. Install Storage Cluster via UI

6. Enable arbiter on the storagecluster:
   $ oc edit storagecluster -n openshift-storage
     arbiter:
       enable: true

$ oc get storagecluster -n openshift-storage -o yaml
apiVersion: v1
items:
- apiVersion: ocs.openshift.io/v1
  kind: StorageCluster
  metadata:
    annotations:
      cluster.ocs.openshift.io/local-devices: "true"
      uninstall.ocs.openshift.io/cleanup-policy: delete
      uninstall.ocs.openshift.io/mode: graceful
    creationTimestamp: "2021-03-24T08:24:18Z"
    finalizers:
    - storagecluster.ocs.openshift.io
    generation: 3
    managedFields:
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            .: {}
            f:cluster.ocs.openshift.io/local-devices: {}
        f:spec:
          .: {}
          f:arbiter: {}
          f:encryption:
            .: {}
            f:enable: {}
            f:kms: {}
          f:flexibleScaling: {}
          f:monDataDirHostPath: {}
          f:nodeTopologies: {}
      manager: Mozilla
      operation: Update
      time: "2021-03-24T08:24:18Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:metadata:
          f:annotations:
            f:uninstall.ocs.openshift.io/cleanup-policy: {}
            f:uninstall.ocs.openshift.io/mode: {}
          f:finalizers: {}
        f:spec:
          f:externalStorage: {}
          f:managedResources:
            .: {}
            f:cephBlockPools: {}
            f:cephConfig: {}
            f:cephFilesystems: {}
            f:cephObjectStoreUsers: {}
            f:cephObjectStores: {}
          f:storageDeviceSets: {}
          f:version: {}
        f:status:
          .: {}
          f:conditions: {}
          f:failureDomain: {}
          f:failureDomainKey: {}
          f:failureDomainValues: {}
          f:images:
            .: {}
            f:ceph:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaCore:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
            f:noobaaDB:
              .: {}
              f:actualImage: {}
              f:desiredImage: {}
          f:nodeTopologies:
            .: {}
            f:labels:
              .: {}
              f:failure-domain.beta.kubernetes.io/zone: {}
              f:kubernetes.io/hostname: {}
          f:phase: {}
          f:relatedObjects: {}
      manager: ocs-operator
      operation: Update
      time: "2021-03-24T08:25:57Z"
    - apiVersion: ocs.openshift.io/v1
      fieldsType: FieldsV1
      fieldsV1:
        f:spec:
          f:arbiter:
            f:enable: {}
      manager: kubectl-edit
      operation: Update
      time: "2021-03-24T09:39:24Z"
    name: ocs-storagecluster
    namespace: openshift-storage
    resourceVersion: "202669"
    selfLink: /apis/ocs.openshift.io/v1/namespaces/openshift-storage/storageclusters/ocs-storagecluster
    uid: 5d10c103-1ed6-4f76-b9c6-79ea8bdd7b68
  spec:
    arbiter:
      enable: true
    encryption:
      enable: true
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0
  status:
    conditions:
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:24:20Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: ReconcileComplete
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Available
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Progressing
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:24:19Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "False"
      type: Degraded
    - lastHeartbeatTime: "2021-03-24T09:39:19Z"
      lastTransitionTime: "2021-03-24T08:27:53Z"
      message: Reconcile completed successfully
      reason: ReconcileCompleted
      status: "True"
      type: Upgradeable
    failureDomain: host
    failureDomainKey: kubernetes.io/hostname
    failureDomainValues:
    - compute-2
    - compute-0
    - compute-1
    images:
      ceph:
        actualImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
      noobaaCore:
        actualImage: quay.io/rhceph-dev/mcg-core@sha256:54d2ea9d4e18f6c4bb1a11dfec741d1adb62c34d98ca4c488f9b06c070a794d3
        desiredImage: quay.io/rhceph-dev/rhceph@sha256:a334f5429bc9c5ff1175e616fd0c9d1765457ead727a036005125ba3747cc5b3
      noobaaDB:
        actualImage: registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
        desiredImage: registry.redhat.io/rhel8/postgresql-12@sha256:ed859e2054840467e9a0ffc310ddf74ff64a8743c236598ca41c7557d8cdc767
    nodeTopologies:
      labels:
        failure-domain.beta.kubernetes.io/zone:
        - a
        - b
        kubernetes.io/hostname:
        - compute-2
        - compute-0
        - compute-1
    phase: Ready
    relatedObjects:
    - apiVersion: ceph.rook.io/v1
      kind: CephCluster
      name: ocs-storagecluster-cephcluster
      namespace: openshift-storage
      resourceVersion: "202213"
      uid: 1f24c28a-de22-4d9d-a693-569cc6909337
    - apiVersion: noobaa.io/v1alpha1
      kind: NooBaa
      name: noobaa
      namespace: openshift-storage
      resourceVersion: "202631"
      uid: f6947698-684c-4d57-ba61-08eca6108726
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

$ oc logs ocs-operator-64d77857dc-q7wk5
"arbiter and flexibleScaling both can't be enabled"
{"level":"error","ts":1616579562.1646447,"logger":"controllers.StorageCluster","msg":"Failed to validate ArbiterSpec","error":"arbiter and flexibleScaling both can't be
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).validateStorageClusterSpec\n\t/remote-source/app/controllers/storagecluster/reconcile.go:211\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:244\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579562.164746,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579567.9811654,"logger":"controllers.StorageCluster","msg":"Failed to validate ArbiterSpec","error":"arbiter and flexibleScaling both can't be 
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).validateStorageClusterSpec\n\t/remote-source/app/controllers/storagecluster/reconcile.go:211\ngithub.com/openshift/ocs-operator/controllers/storagecluster.(*StorageClusterReconciler).Reconcile\n\t/remote-source/app/controllers/storagecluster/reconcile.go:153\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:244\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} {"level":"error","ts":1616579567.9812632,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"}
Warning log on the UI: "arbiter and flexibleScaling both can't be enabled"
*Screenshot attached
Created attachment 1766229 [details] arbiter warning ui
Created attachment 1766234 [details] arbiter warning log on Persistent Storage tab
Hi Talur,

I agree that with this fix we see the message "arbiter and flexibleScaling both can't be enabled" in the logs and the Events tab, but we are still able to set both arbiter and flexibleScaling to true in the StorageCluster.

a) The StorageCluster is still in Ready state, so how will users be told that what they did is not acceptable?
b) They might not check the events or logs, as the StorageCluster state is still Ready and their changes were accepted.
Created attachment 1766246 [details] arbiter warning on storagecluster page
Nitin, please check if a status condition also needs to be set/updated.
I have sent a PR to change the Status.Phase to the Error state: https://github.com/openshift/ocs-operator/pull/1134
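For reference, a rough sketch of what that PR is meant to do — record the validation failure in Status.Phase so the CR shows Error instead of staying Ready (the helper name, import paths, and exact phase string are assumptions, not the literal patch):

  package storagecluster

  import (
      "context"

      "sigs.k8s.io/controller-runtime/pkg/client"

      ocsv1 "github.com/openshift/ocs-operator/pkg/apis/ocs/v1" // import path assumed
  )

  // markValidationFailure is a hypothetical helper sketching the PR's
  // intent: surface a failed spec validation in Status.Phase so users
  // see the problem without digging through operator logs or events.
  func markValidationFailure(ctx context.Context, c client.Client, sc *ocsv1.StorageCluster) error {
      sc.Status.Phase = "Error" // exact phase string assumed
      return c.Status().Update(ctx, sc)
  }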
We have a PR to reflect the state, moving the BZ to POST again.
Storage Cluster state is Ready (arbiter and flexible scaling enabled)

Test Procedure:
1. Install OCP 4.7
   Cluster Provider: VSphere
   OCP Version: 4.7.0-0.nightly-2021-04-01-061355

2. Label compute-0, compute-1 zone A, compute-2 zone B:
   $ oc label node compute-0 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-0 labeled
   $ oc label node compute-1 failure-domain.beta.kubernetes.io/zone=a topology.kubernetes.io/zone=a
   node/compute-1 labeled
   $ oc label node compute-2 failure-domain.beta.kubernetes.io/zone=b topology.kubernetes.io/zone=b
   node/compute-2 labeled

3. Install Local Storage via UI
   Local Storage Version: 4.7.0-202103202139.p0

4. Install OCS via UI
   OCS Version: 4.7.0-339.ci

sh-4.4# ceph versions
{
    "mon": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 3
    },
    "mgr": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 1
    },
    "osd": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 3
    },
    "mds": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 2
    },
    "rgw": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 1
    },
    "overall": {
        "ceph version 14.2.11-143.el8cp (ab503edb1421ce443f12917d9a75d5b56334dfea) nautilus (stable)": 10
    }
}

5. Install Storage Cluster via UI

6. Enable arbiter on the storagecluster:
   $ oc edit storagecluster -n openshift-storage
   storagecluster.ocs.openshift.io/ocs-storagecluster edited

     spec:
       arbiter:
         enable: true

  spec:
    arbiter:
      enable: true
    encryption:
      enable: true
      kms: {}
    externalStorage: {}
    flexibleScaling: true
    managedResources:
      cephBlockPools: {}
      cephConfig: {}
      cephFilesystems: {}
      cephObjectStoreUsers: {}
      cephObjectStores: {}
    monDataDirHostPath: /var/lib/rook
    nodeTopologies: {}
    storageDeviceSets:
    - config: {}
      count: 3
      dataPVCTemplate:
        metadata: {}
        spec:
          accessModes:
          - ReadWriteOnce
          resources:
            requests:
              storage: "1"
          storageClassName: localblock
          volumeMode: Block
        status: {}
      name: ocs-deviceset-localblock
      placement: {}
      preparePlacement: {}
      replica: 1
      resources: {}
    version: 4.7.0

$ oc logs ocs-operator-5bcdd97ff4-mh6sp
{"level":"error","ts":1617524035.0535948,"logger":"controller","msg":"Reconciler error","reconcilerGroup":"ocs.openshift.io","reconcilerKind":"StorageCluster","controller":"storagecluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be
enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:246\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:218\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).worker\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:197\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.Until\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:90"} for more details: https://docs.google.com/document/d/1Ahu6qEIbaYOij3KO0fyHKrAAAmunjQZ-WuPsC8oULOE/edit
Is this really a blocker? We are in RC phase now for 4.7 so we have to reassess all FailedQA.
(In reply to Mudit Agarwal from comment #23)
> Is this really a blocker? We are in RC phase now for 4.7 so we have to
> reassess all FailedQA.

As part of this bug we added the logs and events, but we did not change the PHASE of the StorageCluster. Changing the PHASE of the StorageCluster came in as a request in comment 16. As we are already in the RC phase, we can verify this bug and create a new one specifically for changing the PHASE, which can go in a 4.7 async update or later.
Thanks Nitin. Oded, I agree with Nitin here, this is not a blocker. I am moving it back to ON_QA; please raise a new BZ for the status, and I will add it as a known issue for the release notes. We can fix it in 4.8 and bring it to 4.7.z if required.
Can we please raise a bug for this, so that I can add it to the known issues and fill in the doc text for it.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Container Storage 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2041
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days