Bug 1964574 - OCS 4.8 Fresh deployment: Storagecluster in ready state even when Cephcluster is stuck in Progressing (Configuring MONs) for prolonged time
Summary: OCS 4.8 Fresh deployment: Storagecluster in ready state even when Cephcluster...
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat OpenShift Container Storage
Classification: Red Hat Storage
Component: ocs-operator
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: OCS 4.8.0
Assignee: Nobody
QA Contact: Avi Liani
URL:
Whiteboard:
Depends On:
Blocks: 1951021
Reported: 2021-05-25 18:14 UTC by Neha Berry
Modified: 2025-04-12 08:28 UTC (History)

Fixed In Version: 4.8.0-432.ci
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift ocs-operator pull 1225 0 None closed Don't report Ready when CephCluster has not reached ClusterStateCreated 2021-06-18 18:22:34 UTC
Github openshift ocs-operator pull 1232 0 None open Bug 1964574: [release-4.8] Don't report Ready when CephCluster has not reached ClusterStateCreated 2021-06-18 18:21:17 UTC

Comment 5 Jose A. Rivera 2021-05-27 16:35:10 UTC
Just to be certain, can you provide a screenshot of the problem in the UI? Also provide the full StorageCluster YAML and ocs-operator logs. An ocs-must-gather would also suffice.

For the ocs-operator Pod, the readiness phase should not be impacted by the state of the StorageCluster to begin with. Installation of the operators is independent from the creation and management of its operands. I don't clearly remember how the previous behavior made it into the product, but really it's been a long-standing bug that (I believe) was recently cleared up as part of the SDK updates.

Comment 6 Jose A. Rivera 2021-06-02 15:17:52 UTC
Oops, forgot to set the needinfo. Nonetheless, also giving devel_ack+ since I believe this problem is reproducible.

Comment 7 Mudit Agarwal 2021-06-02 15:18:41 UTC
https://chat.google.com/room/AAAAREGEba8/2JSkNKg3_hY

Comment 8 Neha Berry 2021-06-04 10:33:30 UTC
(In reply to Jose A. Rivera from comment #5)
> Just to be certain, can you provide a screenshot of the problem in the UI?
> Also provide the full StorageCluster YAML and ocs-operator logs. An
> ocs-must-gather would also suffice.
> 
Hi Jose, apologies, I do not have a screenshot of the UI, but the logs are provided here:
https://bugzilla.redhat.com/show_bug.cgi?id=1964574#c2

> For the ocs-operator Pod, the readiness phase should not be impacted by the
> state of the StorageCluster to begin with. Installation of the operators is
> independent from the creation and management of its operands. I don't
> clearly remember how the previous behavior made it into the product, but
> really it's been a long-standing bug that (I believe) was recently cleared
> up as part of the SDK updates.

Comment 10 Martin Bukatovic 2021-06-04 22:14:24 UTC
Description of problem
======================

When deployment of StorageCluster/ocs-storagecluster begins, its phase
immediately becomes "Ready", even though the deployment has only just started
at that point. The phase stays "Ready" while the ceph components are deployed.

Version-Release number of selected component
============================================

OCP 4.8.0-0.nightly-2021-06-03-055145
LSO 4.8.0-202106021817
OCS 4.8.0-407.ci

How reproducible
================

100%

Steps to Reproduce
==================

1. Install OCP cluster.
2. Install LSO and OCS operators.
3. Use "Create Storage Cluster" wizard in OCP Console to initiate deployment of
   ocs-storagecluster.
4. Observe Phase of StorageCluster/ocs-storagecluster during installation
   (either via cli or via OCP Console).
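
For step 4, the CLI observation can be scripted. The following is only a sketch: the helper names `get_phase` and `poll_phases` and the `OC`/`NS` variables are made up for illustration, and it assumes the resource names seen elsewhere in this bug (`ocs-storagecluster`, `ocs-storagecluster-cephcluster`) and that both resources expose `.status.phase`.

```shell
# Sketch: sample the StorageCluster and CephCluster phases side by side.
# OC and NS are illustrative knobs, not something this bug defines.
OC="${OC:-oc}"
NS="${NS:-openshift-storage}"

get_phase() {
  # $1 = resource kind, $2 = resource name; prints .status.phase
  "$OC" get "$1" "$2" -n "$NS" -o jsonpath='{.status.phase}'
}

poll_phases() {
  # $1 = number of samples, $2 = seconds to wait between samples
  i=0
  while [ "$i" -lt "$1" ]; do
    sc=$(get_phase storagecluster ocs-storagecluster)
    ceph=$(get_phase cephcluster ocs-storagecluster-cephcluster)
    echo "storagecluster=$sc cephcluster=$ceph"
    i=$((i + 1))
    if [ "$i" -lt "$1" ]; then
      sleep "$2"
    fi
  done
}
```

Running something like `poll_phases 30 10` right after the wizard is submitted would print one line per sample, which makes it easy to spot a "Ready" StorageCluster next to a still-progressing CephCluster.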

Actual results
==============

When deployment of ocs-storagecluster starts, its phase is "Ready":

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   35s   Ready              2021-06-04T19:12:04Z   4.8.0
```

This is reported even though ceph components are still being installed at that moment.

Only later, when the ceph deployment finishes and NooBaa installation is going on,
do we see the status of ocs-storagecluster as Progressing:

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   2m56s   Progressing              2021-06-04T19:12:04Z   4.8.0
```

And when this installation finishes, the phase is back at "Ready".

Expected results
================

During deployment of ceph components, the phase/state of ocs-storagecluster is
reported as "Progressing", in the same way as during NooBaa deployment.
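
The expected behaviour amounts to a simple aggregation rule. The sketch below is not ocs-operator code; `expected_storagecluster_phase` is a hypothetical helper that assumes each component (for example CephCluster and NooBaa) reports a phase string and that anything other than "Ready" counts as still in progress. Error phases are deliberately ignored to keep it short.

```shell
# Hypothetical helper, for illustration only: derive the phase the
# StorageCluster should report from the phases of its components.
expected_storagecluster_phase() {
  # Arguments: one phase string per component, e.g. CephCluster, NooBaa.
  # With no component phases known yet, nothing can be called Ready.
  if [ "$#" -eq 0 ]; then
    echo "Progressing"
    return 0
  fi
  for phase in "$@"; do
    # Any component that is not Ready keeps the whole cluster Progressing.
    if [ "$phase" != "Ready" ]; then
      echo "Progressing"
      return 0
    fi
  done
  echo "Ready"
}
```

Under this rule, `expected_storagecluster_phase Progressing Ready` prints `Progressing`, which matches what is expected while the ceph components are still being deployed.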

Comment 12 Jose A. Rivera 2021-06-07 15:12:52 UTC
Apologies if it was not clear, but giving devel_ack+ meant it's a valid bug that we should fix. Since it is marked as blocker? we need qa_ack+ to confirm it for 4.8.

I had an initial look through the must-gather information and confirmed the problem, but the logs were not sufficient to do a full RCA. This needs further investigation to determine a proper resolution; it just hasn't been assigned yet.

Comment 13 Martin Bukatovic 2021-06-07 20:58:19 UTC
Providing QA ack based on today's bug triage.

Comment 16 Martin Bukatovic 2021-06-15 12:36:50 UTC
Rechecked with OCP/OCS 4.7:

- OCP 4.7.0-0.nightly-2021-06-12-151209
- LSO 4.7.0-202105210300.p0
- OCS 4.7.1-410.ci

And I don't see the problem I originally observed with 4.8 (as noted in comment 10).

Comment 17 Avi Liani 2021-07-13 08:02:52 UTC
Just deployed a cluster on a bare-metal environment with OCP and OCS 4.8, and it passed without any problems.

LSO version: 4.8.0-202106291913 

ceph version 14.2.11-184.el8cp

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0     True        False         11h     Cluster version is 4.8.0


$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Succeeded


$ oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   7m43s   Ready              2021-07-13T07:47:21Z   4.8.0


$ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          8m12s   Ready   Cluster created successfully   HEALTH_OK   


IMO, this can be verified, unless more tests need to be done.

Comment 18 Avi Liani 2021-07-13 10:55:03 UTC
Trying to deploy a cluster (on the same OCP as in comment 17) where the MON deployment is stuck shows:

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Installing


$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   12m   Progressing              2021-07-13T10:37:28Z   4.8.0


$ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE                 HEALTH   EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          12m   Progressing   Configuring Ceph Mons            


While the ceph cluster is still installing (Progressing phase), the StorageCluster is in the Progressing phase and the OCS operator CSV is in the Installing phase as well.

I think that this BZ is verified.
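
The condition checked in this verification can be written down as a small predicate. This is a sketch under assumptions: `phases_consistent` is a hypothetical helper, not part of any OCS tooling, and it only encodes the rule from this bug that the StorageCluster must not be "Ready" while the CephCluster is in any other phase.

```shell
# Hypothetical check, for illustration only.
# Returns 0 when the two phases are consistent, 1 when they show this bug.
phases_consistent() {
  # $1 = StorageCluster phase, $2 = CephCluster phase
  if [ "$1" = "Ready" ] && [ "$2" != "Ready" ]; then
    echo "inconsistent: storagecluster=$1 cephcluster=$2" >&2
    return 1
  fi
  return 0
}

# Example against a live cluster (left commented out so the block stays side-effect free):
# phases_consistent \
#   "$(oc get storagecluster ocs-storagecluster -n openshift-storage -o jsonpath='{.status.phase}')" \
#   "$(oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage -o jsonpath='{.status.phase}')"
```

With the outputs shown in this comment (both Progressing) the check passes, while the originally reported combination (StorageCluster Ready, CephCluster Progressing) would fail it.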

