Bug 1930466
| Summary: | OCS-Operator waiting on ceph cluster to initialize before starting noobaa. | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Container Storage | Reporter: | scott2 |
| Component: | ocs-operator | Assignee: | umanga <uchapaga> |
| Status: | CLOSED NEXTRELEASE | QA Contact: | Raz Tamir <ratamir> |
| Severity: | medium | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.6 | CC: | madam, muagarwa, ocs-bugs, scott2, sostapov, uchapaga |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-06-01 10:14:22 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
scott2
2021-02-18 23:29:23 UTC
I'll add that rook-ceph-mon-a and rook-ceph-mon-b are in Pending status (they were on the failed nodes). rook-ceph-mon-c is healthy.

rook-ceph-mon-a-6bbfbf5999-ddmgm 0/1 Pending 0 143m
rook-ceph-mon-b-755568566d-4sgsj 0/1 Pending 0 24h
rook-ceph-mon-c-5cf658f954-mfr2r 1/1 Running 13 14d

But normally the nodeSelector on those deployments is managed by ocs-operator, which is waiting for Ceph to finish initializing. This is circular.

This is all expected behavior. The whole ConfigMap lock deletion thing is a known problem that will be resolved for OCS 4.7 (it's out of our control: it's part of the framework we're using, and we're upgrading it for OCS 4.7). As for the ocs-operator not being Ready, this is intentional: the operator should not report Ready until all StorageClusters (and their components) are healthy. And indeed, NooBaa should not be created until the CephCluster is healthy, since NooBaa relies on a Ceph volume for its operation. You should inspect the CephCluster CR and the rook-ceph-operator logs to determine what is actually going on.

Dealing with failed nodes in Kubernetes is a pain in general. Pods will remain Pending and/or Terminated until either the exact node comes back healthy or the admin intervenes. In this case, you probably have to force delete any stuck Pods. We're considering ways to address this, but nothing is available for OCS 4.6.

Since this is not a crucial bug, moving to OCS 4.8.

Starting from OCS 4.7 we do not use ConfigMap locks. The operator readiness behavior and the wait for the CephCluster before creating NooBaa are working as expected. So this bug doesn't exist anymore. Is it critical enough to have a 4.6-only fix? If not, we can close this.
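
For anyone hitting the same situation, the "find the stuck Pods, then force delete them" advice could be sketched roughly as below. This is only an illustration, not something from the OCS tooling: it assumes the listing has the usual `oc get pods` columns (NAME READY STATUS RESTARTS AGE), the helper names `stuck_pods` and `force_delete_commands` are hypothetical, and the `openshift-storage` namespace is an assumption about where OCS is installed. It prints the commands rather than running them, so they can be reviewed first.

```python
# Pod listing copied from this report. The column layout
# (NAME READY STATUS RESTARTS AGE) is assumed, as printed by `oc get pods`.
LISTING = """\
rook-ceph-mon-a-6bbfbf5999-ddmgm 0/1 Pending 0 143m
rook-ceph-mon-b-755568566d-4sgsj 0/1 Pending 0 24h
rook-ceph-mon-c-5cf658f954-mfr2r 1/1 Running 13 14d
"""


def stuck_pods(listing, stuck_states=("Pending", "Terminating")):
    """Return the names of pods whose STATUS column is in stuck_states.

    Hypothetical helper: which states count as "stuck" is a judgment
    call for the admin, so it is left as a parameter.
    """
    names = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, status = fields[0], fields[2]
        if status in stuck_states:
            names.append(name)
    return names


def force_delete_commands(names, namespace="openshift-storage"):
    """Build (but do not run) one force-delete command per pod name.

    The namespace default is an assumption; adjust it to your cluster.
    """
    return [
        f"oc delete pod {name} -n {namespace} --grace-period=0 --force"
        for name in names
    ]


if __name__ == "__main__":
    for command in force_delete_commands(stuck_pods(LISTING)):
        print(command)
```

With the listing from this report, only the two Pending mon pods are selected and rook-ceph-mon-c is left alone. Whether force deletion is actually safe depends on the state of the failed nodes, so check the CephCluster CR and the rook-ceph-operator logs first, as suggested above.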