Just to be certain, can you provide a screenshot of the problem in the UI? Also provide the full StorageCluster YAML and ocs-operator logs. An ocs-must-gather would also suffice. For the ocs-operator Pod, the readiness phase should not be impacted by the state of the StorageCluster to begin with. Installation of the operators is independent from the creation and management of its operands. I don't clearly remember how the previous behavior made it into the product, but really it's been a long-standing bug that (I believe) was recently cleared up as part of the SDK updates.
Oops, forgot to set the needinfo. Nonetheless, also giving devel_ack+ since I believe this problem is reproducible.
https://chat.google.com/room/AAAAREGEba8/2JSkNKg3_hY
(In reply to Jose A. Rivera from comment #5)
> Just to be certain, can you provide a screenshot of the problem in the UI?
> Also provide the full StorageCluster YAML and ocs-operator logs. An
> ocs-must-gather would also suffice.
>

Hi Jose, apologies I do not have the screenshot of the UI. But the logs are provided here https://bugzilla.redhat.com/show_bug.cgi?id=1964574#c2

> For the ocs-operator Pod, the readiness phase should not be impacted by the
> state of the StorageCluster to begin with. Installation of the operators is
> independent from the creation and management of its operands. I don't
> clearly remember how the previous behavior made it into the product, but
> really it's been a long-standing bug that (I believe) was recently cleared
> up as part of the SDK updates.
Description of problem
======================

When deployment of StorageCluster/ocs-storagecluster begins, its phase immediately reaches "Ready", even though the deployment has only just started at that point. The phase stays "Ready" during deployment of the ceph components.

Version-Release number of selected component
============================================

OCP 4.8.0-0.nightly-2021-06-03-055145
LSO 4.8.0-202106021817
OCS 4.8.0-407.ci

How reproducible
================

100%

Steps to Reproduce
==================

1. Install OCP cluster.
2. Install LSO and OCS operators.
3. Use "Create Storage Cluster" wizard in OCP Console to initiate deployment of ocs-storagecluster.
4. Observe Phase of StorageCluster/ocs-storagecluster during installation (either via cli or via OCP Console).

Actual results
==============

When deployment of ocs-storagecluster starts, its phase is "Ready":

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   35s   Ready              2021-06-04T19:12:04Z   4.8.0
```

This is reported even though the ceph components are still being installed at that moment. Only later, when the ceph deployment finishes and NooBaa installation is going on, do we see the status of ocs-storagecluster as Progressing:

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   2m56s   Progressing              2021-06-04T19:12:04Z   4.8.0
```

And when this installation finishes, the phase is back at "Ready".

Expected results
================

During deployment of ceph components, phase/state of ocs-storagecluster is reported as "Progressing", in the same way as done during NooBaa deployment.
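For anyone re-testing this, the observation in step 4 can be scripted. The following is a minimal sketch and not part of ocs-operator or any existing tool: `parse_phase` and `ready_before_settled` are hypothetical helper names, and the rule "a Ready phase followed by a different phase during the initial deployment indicates this bug" is an assumption drawn from the output in this report. It also assumes NAME, AGE and PHASE are the first three whitespace-separated columns, as in the `oc get storagecluster` output above.

```python
from typing import List


def parse_phase(oc_output: str) -> str:
    """Return the PHASE value from the first data row of
    `oc get storagecluster` table output (header line + one row)."""
    lines = [line for line in oc_output.splitlines() if line.strip()]
    if len(lines) < 2:
        raise ValueError("expected a header line and at least one data row")
    # NAME, AGE and PHASE are never empty, so PHASE is the third token
    # even when the EXTERNAL column is blank.
    return lines[1].split()[2]


def ready_before_settled(phases: List[str]) -> bool:
    """Return True if "Ready" was observed and a different phase was
    observed afterwards, i.e. "Ready" was reported too early.

    Assumption: `phases` holds samples taken during the initial
    deployment only, in chronological order.
    """
    seen_ready = False
    for phase in phases:
        if phase == "Ready":
            seen_ready = True
        elif seen_ready:
            return True
    return False


# Samples copied from the two outputs in this report.
first = """NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 35s Ready 2021-06-04T19:12:04Z 4.8.0
"""
second = """NAME AGE PHASE EXTERNAL CREATED AT VERSION
ocs-storagecluster 2m56s Progressing 2021-06-04T19:12:04Z 4.8.0
"""
observed = [parse_phase(first), parse_phase(second), "Ready"]
print(observed, ready_before_settled(observed))
```

With the samples from this report the check flags the sequence, while a sequence that goes from Progressing to Ready would not be flagged.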
Apologies if it was not clear, but giving devel_ack+ meant it's a valid bug that we should fix. Since it is marked as blocker?, we need qa_ack+ to confirm it for 4.8. I had an initial look through the must-gather information and confirmed the problem, but the logs were not sufficient to do a full RCA. This needs further investigation to determine a proper resolution; it just hasn't been assigned yet.
Providing QA ack based on today's bug triage.
Rechecked with OCP/OCS 4.7:

- OCP 4.7.0-0.nightly-2021-06-12-151209
- LSO 4.7.0-202105210300.p0
- OCS 4.7.1-410.ci

And I don't see the problem I originally observed with 4.8 (as noted in comment 10).
Just deployed a cluster in a BareMetal environment with OCP & OCS 4.8, and it passed without any problems.

LSO version: 4.8.0-202106291913
ceph version 14.2.11-184.el8cp

$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0     True        False         11h     Cluster version is 4.8.0

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Succeeded

$ oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   7m43s   Ready              2021-07-13T07:47:21Z   4.8.0

$ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          8m12s   Ready   Cluster created successfully   HEALTH_OK

IMO, this can be verified, unless some more tests need to be done.
Trying to deploy a cluster (on the same OCP from #17) where the MON deployment is stuck shows:

$ oc get csv -n openshift-storage
NAME                         DISPLAY                       VERSION        REPLACES   PHASE
ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Installing

$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   12m   Progressing              2021-07-13T10:37:28Z   4.8.0

$ oc get cephcluster -n openshift-storage
NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE                 HEALTH   EXTERNAL
ocs-storagecluster-cephcluster   /var/lib/rook     3          12m   Progressing   Configuring Ceph Mons

While the ceph cluster is still installing (Progressing), the storagecluster is in Progressing and the ocs-operator CSV is in Installing as well.

I think that this BZ is verified.
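The comparison used for verification in the last two comments can be written down as a small check. This is only a sketch under a stated assumption, namely that a StorageCluster should report "Ready" only once its CephCluster is also "Ready"; `storagecluster_phase_is_consistent` is a hypothetical name and the function is not taken from ocs-operator.

```python
def storagecluster_phase_is_consistent(storagecluster_phase: str,
                                       cephcluster_phase: str) -> bool:
    """Return False when the StorageCluster claims "Ready" while the
    CephCluster is not "Ready" yet (assumed rule, see lead-in)."""
    if storagecluster_phase == "Ready":
        return cephcluster_phase == "Ready"
    # Any non-Ready StorageCluster phase is treated as consistent here.
    return True


# Phase pairs (StorageCluster, CephCluster) taken from the outputs above.
healthy_deploy = ("Ready", "Ready")
stuck_mons = ("Progressing", "Progressing")
# The originally reported behaviour: Ready while ceph was still deploying.
reported_bug = ("Ready", "Progressing")

for pair in (healthy_deploy, stuck_mons, reported_bug):
    print(pair, storagecluster_phase_is_consistent(*pair))
```

Both verification runs above satisfy the check, and the originally reported combination does not.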