Bug 1946595

Summary: ocs-storagecluster phase is "Ready" when flexible scaling and arbiter are both enabled
Product: [Red Hat Storage] Red Hat OpenShift Container Storage
Reporter: Oded <oviner>
Component: ocs-operator
Assignee: Nitin Goyal <nigoyal>
Status: CLOSED ERRATA
QA Contact: Oded <oviner>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.7
CC: ebenahar, edonnell, jarrpa, madam, mbukatov, muagarwa, nberry, nigoyal, ocs-bugs, olakra, oviner, rtalur, sostapov, uchapaga
Target Milestone: ---
Keywords: AutomationBackLog
Target Release: OCS 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
.Arbiter and flexible scaling can't be enabled at the same time
When arbiter and flexible scaling were both enabled, the storage cluster was shown in `READY` state even though there were logs or messages with the error `arbiter and flexibleScaling both can't be enabled`. This happened because of incorrect specs in the storage cluster CR. With this update, the storage cluster is in the `ERROR` state with the correct error message.
Story Points: ---
Clone Of: 1913357
Environment:
Last Closed: 2021-08-03 18:15:56 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1913357
Bug Blocks: 1938134

Comment 3 Mudit Agarwal 2021-04-06 14:03:07 UTC
Nitin, please add doc text for this.

Comment 9 Mudit Agarwal 2021-06-01 11:02:26 UTC
Doc text needs to be modified as we have fixed this issue now.

Comment 11 Oded 2021-06-10 12:28:55 UTC
Need to test it again because of a monitoring issue on my cluster.

Setup:
OCP Version: 4.8.0-0.nightly-2021-06-09-065137
OCS Version: ocs-operator.v4.8.0-413.ci
LSO Version: 4.7.0-202102110027.p0
Provider: VMware


Test Procedure:
1. Install an OCS 4.8 cluster (LSO)

2. Check storage cluster status:
$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   18h   Ready              2021-06-09T16:35:09Z   4.8.0

$ oc get storagecluster -o yaml | grep flex
          f:flexibleScaling: {}
    flexibleScaling: true

$ oc get storagecluster -o yaml | grep arbiter
          f:arbiter: {}
    arbiter: {}

3. Enable arbiter:
spec:
  arbiter:
    enable: true

4. Check ocs-operator log:
$ oc logs ocs-operator-dd57fd889-6zj8j
{"level":"error","ts":1623325972.242787,"logger":"controller-runtime.manager.controller.storagecluster","msg":"Reconciler error","reconciler group":"ocs.openshift.io","reconciler kind":"StorageCluster","name":"ocs-storagecluster","namespace":"openshift-storage","error":"arbiter and flexibleScaling both can't be enabled","stacktrace":"github.com/go-logr/zapr.(*zapLogger).Error\n\t/remote-source/app/vendor/github.com/go-logr/zapr/zapr.go:132\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:297\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:248\nsigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func1.1\n\t/remote-source/app/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:211\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil.func1\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:155\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:156\nk8s.io/apimachinery/pkg/util/wait.JitterUntil\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:133\nk8s.io/apimachinery/pkg/util/wait.JitterUntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:185\nk8s.io/apimachinery/pkg/util/wait.UntilWithContext\n\t/remote-source/app/vendor/k8s.io/apimachinery/pkg/util/wait/wait.go:99"}

5. A new warning appears on the console: "arbiter and flexibleScaling both can't be enabled"

6. Check storagecluster status:
$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   19h   Error              2021-06-09T16:35:09Z   4.8.0

$ oc describe storagecluster
Events:
  Type     Reason            Age                   From                       Message
  ----     ------            ----                  ----                       -------
  Warning  FailedValidation  53s (x24 over 9m17s)  controller_storagecluster  arbiter and flexibleScaling both can't be enabled
  
$ oc get csv -A
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES                     PHASE
openshift-local-storage                local-storage-operator.4.7.0-202102110027.p0   Local Storage                 4.7.0-202102110027.p0                                Succeeded
openshift-operator-lifecycle-manager   packageserver                                  Package Server                0.17.0                                               Succeeded
openshift-storage                      ocs-operator.v4.8.0-413.ci                     OpenShift Container Storage   4.8.0-413.ci            ocs-operator.v4.8.0-411.ci   Succeeded

$ oc get cephcluster
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE   MESSAGE                        HEALTH      EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          19h   Ready   Cluster created successfully   HEALTH_OK   

7. Disable arbiter in the storagecluster YAML:
$ oc edit storagecluster -n openshift-storage
spec:
  arbiter: {}

8. Check storagecluster status:
$ oc get storagecluster
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   19h   Ready              2021-06-09T16:35:09Z   4.8.0
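
The log check in step 4 can be scripted. Below is a minimal Python sketch, assuming the operator emits one JSON object per line as in the entry shown in step 4. The function name `reconciler_errors` and the `sample` string are illustrative only; `sample` is a shortened copy of that entry with the stacktrace omitted.

```python
import json


def reconciler_errors(log_text):
    """Return the "error" field of every error-level JSON log entry."""
    errors = []
    for line in log_text.splitlines():
        line = line.strip()
        if not line.startswith("{"):
            continue  # skip anything that is not a JSON log entry
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip truncated or malformed lines
        if entry.get("level") == "error" and "error" in entry:
            errors.append(entry["error"])
    return errors


# Shortened copy of the log entry from step 4 (stacktrace omitted).
sample = (
    '{"level":"error","ts":1623325972.242787,'
    '"logger":"controller-runtime.manager.controller.storagecluster",'
    '"msg":"Reconciler error","name":"ocs-storagecluster",'
    '"namespace":"openshift-storage",'
    '"error":"arbiter and flexibleScaling both can\'t be enabled"}'
)

print(reconciler_errors(sample))
```

To run it against a real cluster, the output of the `oc logs` command from step 4 could be saved to a file and the file contents passed to `reconciler_errors`.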


For more details:
https://docs.google.com/document/d/1Ahu6qEIbaYOij3KO0fyHKrAAAmunjQZ-WuPsC8oULOE/edit

Comment 12 Oded 2021-06-10 14:58:42 UTC
All error messages are visible in the UI. [arbiter and flexibleScaling both can't be enabled]

For more details:
https://docs.google.com/document/d/1Ahu6qEIbaYOij3KO0fyHKrAAAmunjQZ-WuPsC8oULOE/edit

Comment 13 Olive Lakra 2021-07-09 05:18:36 UTC
Hi Mudit - please review the revised doc text and share feedback.

Comment 14 Mudit Agarwal 2021-07-09 07:43:26 UTC
This needs to be changed to:

.Arbiter and flexible scaling can't be enabled at the same time.
When arbiter and flexible scaling were both enabled, the storage cluster was shown in `READY` state even though there were logs or messages with the error `arbiter and flexibleScaling both can't be enabled`.
This happened because of incorrect specs in the storage cluster CR.
With this update, the storage cluster is in the `ERROR` state with the correct error message.
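
The rule this doc text describes can be illustrated with a small sketch. It is written in Python for readability and is not the ocs-operator code, which is written in Go. The names `validate_spec` and `ValidationResult` are hypothetical, and the sketch assumes the spec is a plain dictionary with the `arbiter.enable` and `flexibleScaling` fields seen in the test procedure. Returning "Ready" for every other spec is a simplification; the real phase depends on more than this one check.

```python
from dataclasses import dataclass

# Error text as reported in the operator log and the FailedValidation event.
ERROR_MESSAGE = "arbiter and flexibleScaling both can't be enabled"


@dataclass
class ValidationResult:
    phase: str    # "Ready" or "Error", as shown by `oc get storagecluster`
    message: str  # empty when the spec passes this check


def validate_spec(spec):
    """Reject a spec that enables both arbiter and flexible scaling."""
    arbiter_enabled = bool((spec.get("arbiter") or {}).get("enable", False))
    flexible_scaling = bool(spec.get("flexibleScaling", False))
    if arbiter_enabled and flexible_scaling:
        return ValidationResult("Error", ERROR_MESSAGE)
    # Simplification: only this one rule is modeled here.
    return ValidationResult("Ready", "")


# Both enabled, as in step 3 of the test procedure.
print(validate_spec({"arbiter": {"enable": True}, "flexibleScaling": True}))
# Arbiter disabled again, as in step 7.
print(validate_spec({"arbiter": {}, "flexibleScaling": True}))
```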

Comment 16 errata-xmlrpc 2021-08-03 18:15:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenShift Container Storage 4.8.0 container images bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3003

Comment 17 Red Hat Bugzilla 2023-09-15 01:04:42 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days