Bug 1964574
| Summary: | OCS 4.8 Fresh deployment: Storagecluster in ready state even when Cephcluster is stuck in Progressing (Configuring MONs) for prolonged time | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | Neha Berry <nberry> |
| Component: | ocs-operator | Assignee: | Nobody <nobody> |
| Status: | VERIFIED --- | QA Contact: | Avi Liani <alayani> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | ||
| Version: | 4.8 | CC: | mbukatov, muagarwa, owasserm, rperiyas, sostapov |
| Target Milestone: | --- | Keywords: | AutomationBackLog, Regression |
| Target Release: | OCS 4.8.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | 4.8.0-432.ci | Doc Type: | No Doc Update |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| Embargoed: | |||
| Bug Depends On: | |||
| Bug Blocks: | 1951021 | ||
|
Comment 5
Jose A. Rivera
2021-05-27 16:35:10 UTC
Oops, forgot to set the needinfo. Nonetheless, also giving devel_ack+ since I believe this problem is reproducible.

(In reply to Jose A. Rivera from comment #5)
> Just to be certain, can you provide a screenshot of the problem in the UI?
> Also provide the full StorageCluster YAML and ocs-operator logs. An
> ocs-must-gather would also suffice.
>

Hi Jose, apologies, I do not have a screenshot of the UI. But the logs are provided here: https://bugzilla.redhat.com/show_bug.cgi?id=1964574#c2

> For the ocs-operator Pod, the readiness phase should not be impacted by the
> state of the StorageCluster to begin with. Installation of the operators is
> independent from the creation and management of its operands. I don't
> clearly remember how the previous behavior made it into the product, but
> really it's been a long-standing bug that (I believe) was recently cleared
> up as part of the SDK updates.

Description of problem
======================

When deployment of StorageCluster/ocs-storagecluster begins, its phase immediately reaches "Ready", even though the deployment has only just started at that point. The phase stays "Ready" during deployment of the ceph components.

Version-Release number of selected component
============================================

OCP 4.8.0-0.nightly-2021-06-03-055145
LSO 4.8.0-202106021817
OCS 4.8.0-407.ci

How reproducible
================

100%

Steps to Reproduce
==================

1. Install OCP cluster.
2. Install LSO and OCS operators.
3. Use "Create Storage Cluster" wizard in OCP Console to initiate deployment of ocs-storagecluster.
4. Observe Phase of StorageCluster/ocs-storagecluster during installation (either via cli or via OCP Console).

Actual results
==============

When deployment of ocs-storagecluster starts, its phase is "Ready":

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE   PHASE   EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   35s   Ready              2021-06-04T19:12:04Z   4.8.0
```

This is reported even though ceph components are still being installed at that moment. Only later, when the ceph deployment finishes and the NooBaa installation is going on, do we see the status of ocs-storagecluster as Progressing:

```
$ oc get storagecluster -n openshift-storage
NAME                 AGE     PHASE         EXTERNAL   CREATED AT             VERSION
ocs-storagecluster   2m56s   Progressing              2021-06-04T19:12:04Z   4.8.0
```

And when this installation finishes, the phase is back at "Ready".

Expected results
================

During deployment of ceph components, the phase/state of ocs-storagecluster is reported as "Progressing", in the same way as is done during NooBaa deployment.

Apologies if it was not clear, but giving devel_ack+ meant it's a valid bug that we should fix. Since it is marked as blocker? we need qa_ack+ to confirm it for 4.8.

I had an initial look through the must-gather information and confirmed the problem, but the logs were not sufficient to do a full RCA. This needs further investigation to determine a proper resolution; it just hasn't been assigned yet.

Providing QA ack based on today's bug triage.

Rechecked with OCP/OCS 4.7:

- OCP 4.7.0-0.nightly-2021-06-12-151209
- LSO 4.7.0-202105210300.p0
- OCS 4.7.1-410.ci

And I don't see the problem I originally observed with 4.8 (as noted in comment 10).

Just deployed a cluster on a BareMetal environment with OCP & OCS 4.8, and it passed without any problems.

LSO version: 4.8.0-202106291913
ceph version 14.2.11-184.el8cp

    $ oc get clusterversion
    NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.8.0     True        False         11h     Cluster version is 4.8.0

    $ oc get csv -n openshift-storage
    NAME                         DISPLAY                       VERSION        REPLACES   PHASE
    ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Succeeded

    $ oc get storagecluster -n openshift-storage
    NAME                 AGE     PHASE   EXTERNAL   CREATED AT             VERSION
    ocs-storagecluster   7m43s   Ready              2021-07-13T07:47:21Z   4.8.0

    $ oc get cephcluster -n openshift-storage
    NAME                             DATADIRHOSTPATH   MONCOUNT   AGE     PHASE   MESSAGE                        HEALTH      EXTERNAL
    ocs-storagecluster-cephcluster   /var/lib/rook     3          8m12s   Ready   Cluster created successfully   HEALTH_OK

IMO, this can be verified, unless more tests need to be done.

Trying to deploy a cluster (on the same OCP from #17) where the MON deployment is stuck shows:

    $ oc get csv -n openshift-storage
    NAME                         DISPLAY                       VERSION        REPLACES   PHASE
    ocs-operator.v4.8.0-450.ci   OpenShift Container Storage   4.8.0-450.ci              Installing

    $ oc get storagecluster -n openshift-storage
    NAME                 AGE   PHASE         EXTERNAL   CREATED AT             VERSION
    ocs-storagecluster   12m   Progressing              2021-07-13T10:37:28Z   4.8.0

    $ oc get cephcluster -n openshift-storage
    NAME                             DATADIRHOSTPATH   MONCOUNT   AGE   PHASE         MESSAGE                 HEALTH   EXTERNAL
    ocs-storagecluster-cephcluster   /var/lib/rook     3          12m   Progressing   Configuring Ceph Mons

While the ceph cluster is installing (Progressing mode), the storagecluster is in Progressing mode and the OCS operator is in Installing as well.

I think that this BZ is verified. |
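
The verification above compares the StorageCluster and CephCluster phases by eye. A minimal sketch of the same comparison as a script might look like the following. The helper name `check_phases` is hypothetical (it is not part of `oc` or OCS), and the commented-out live-cluster usage assumes both resources expose their phase at `.status.phase`, which matches the PHASE column shown in the outputs but has not been confirmed in this bug.

```shell
#!/usr/bin/env bash
# Hypothetical helper: flags the state reported in this bug, where the
# StorageCluster claims "Ready" while the CephCluster is not yet "Ready".
check_phases() {
  local sc_phase="$1" ceph_phase="$2"
  if [ "$sc_phase" = "Ready" ] && [ "$ceph_phase" != "Ready" ]; then
    echo "MISMATCH: StorageCluster=$sc_phase CephCluster=$ceph_phase"
    return 1
  fi
  echo "OK: StorageCluster=$sc_phase CephCluster=$ceph_phase"
}

# Possible use against a live cluster (assumes .status.phase on both resources):
#   sc=$(oc get storagecluster ocs-storagecluster -n openshift-storage \
#          -o jsonpath='{.status.phase}')
#   ceph=$(oc get cephcluster ocs-storagecluster-cephcluster -n openshift-storage \
#          -o jsonpath='{.status.phase}')
#   check_phases "$sc" "$ceph"
```

With the values from the two verification runs above, `check_phases Ready Ready` and `check_phases Progressing Progressing` both report OK, while the originally reported combination (StorageCluster Ready during "Configuring Ceph Mons") would be flagged as a mismatch.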