Bug 1984735
Summary: | [External Mode] Monitoring spec is getting reset in CephCluster resource | ||
---|---|---|---|
Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Sidhant Agrawal <sagrawal> |
Component: | ocs-operator | Assignee: | arun kumar mohan <amohan> |
Status: | CLOSED ERRATA | QA Contact: | Sidhant Agrawal <sagrawal> |
Severity: | high | Docs Contact: | |
Priority: | unspecified | ||
Version: | 4.8 | CC: | amohan, ebenahar, kbg, madam, muagarwa, nberry, ocs-bugs, odf-bz-bot, rperiyas, sostapov, uchapaga |
Target Milestone: | --- | Keywords: | AutomationBackLog |
Target Release: | ODF 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
.Monitoring spec is getting reset in CephCluster resource in external mode
Previously, when OpenShift Container Storage was upgraded, the monitoring endpoints were reset in the external CephCluster's monitoring spec. This was not expected behavior and was caused by the way monitoring endpoints were passed to the CephCluster.
With this update, the way endpoints are passed is changed. Before the CephCluster is created, the endpoints are read directly from the JSON secret, `rook-ceph-external-cluster-details`, and the CephCluster spec is updated. As a result, the monitoring endpoint specs in the CephCluster are updated with the appropriate values even after an OpenShift Container Storage upgrade.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2021-12-13 17:44:54 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
Bug Blocks: | 1966894, 2011326 |
Description
Sidhant Agrawal
2021-07-22 04:42:23 UTC
RCA (TLDR):
----------
An external cluster JSON is read by the OCS-Operator only when the checksum hash (a unique hex string) of the JSON content does NOT match the hash stored in storagecluster.spec.externalSecretHash. When the OCS-Operator starts for the first time, storagecluster.spec.externalSecretHash is empty. Since the hash of the JSON content and the existing externalSecretHash do not match, the operator creates all resources and [important] updates the externalSecretHash in the storagecluster spec. From the next reconcile onwards, both hashes (the JSON input's and the spec's) match, so the external resources are not created again.

*restart*
When the OCS-Operator restarts, the storagecluster instance still has the same externalSecretHash in its spec, which matches the JSON input exactly. Thus new external resources are not created (as the hashes match).

Why is CephCluster's monitoring spec not being updated?
The monitoring details are passed from the JSON input to the CephCluster's monitoring spec through the StorageCluster's Reconciler object (not through any persistent resource like ConfigMaps or Secrets). After an OCS-Operator restart, the reconciler object is fresh and does not have the monitoring values. As noted under *restart* above, the hashes are the same, so we never enter the external resources creation part where those values would be read again.

Workaround:
-----------
In progress

FIX
-------
We need to fix the way monitoring details are passed to the CephCluster spec. Instead of passing them through the internal Reconciler object, we should populate the monitoring details through a ConfigMap and allow the CephCluster to access it.

A WIP PR is pushed: https://github.com/openshift/ocs-operator/pull/1284
Will remove the WIP tag once we test the changes in Sidhant's external cluster.
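
The RCA above can be illustrated with a small simulation. This is a Python sketch, not the operator's actual Go code: the class names, the `reconcile` function, the use of SHA-256, and the `monitoring-endpoint` / `MonitoringEndpoint` keys in the JSON are all assumptions made for illustration. It only shows why a persisted hash combined with in-memory monitoring values yields an empty monitoring spec after a restart.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional


@dataclass
class StorageCluster:
    # Persisted in the cluster, so it survives an operator restart.
    external_secret_hash: str = ""


@dataclass
class Reconciler:
    # In-memory only: a fresh, empty instance exists after every restart.
    monitoring_endpoint: Optional[str] = None


def reconcile(reconciler, storage_cluster, secret_json):
    """Return the endpoint that would be written to the CephCluster monitoring spec."""
    digest = hashlib.sha256(secret_json.encode()).hexdigest()  # hash choice is an assumption
    if digest != storage_cluster.external_secret_hash:
        # Hashes differ: read the JSON, create external resources, store the new hash.
        for item in json.loads(secret_json):
            if item["name"] == "monitoring-endpoint":  # hypothetical entry name
                reconciler.monitoring_endpoint = item["data"]["MonitoringEndpoint"]
        storage_cluster.external_secret_hash = digest
    # Hashes match: the JSON is not read, so only in-memory values are available.
    return reconciler.monitoring_endpoint


secret = json.dumps(
    [{"name": "monitoring-endpoint", "data": {"MonitoringEndpoint": "10.0.0.1"}}]
)
cluster = StorageCluster()

first_start = reconcile(Reconciler(), cluster, secret)    # '10.0.0.1'
after_restart = reconcile(Reconciler(), cluster, secret)  # None: the value is lost
print(first_start, after_restart)
```

The second call uses a new `Reconciler()` to stand in for the restarted operator while reusing the same `cluster`, which mirrors the persisted externalSecretHash.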
Summary: If the ocs-operator restarts, the monitoring spec becomes empty, but we are able to update the monitoring endpoint and we don't see any issues with the UI monitoring pages. Anmol, Sidhant and Arun had a look at the latest external cluster and didn't see any impact on UI functioning. We need to document that the customer should update the secret after upgrading from 4.7 to 4.8 so that the details of all endpoints are updated in the secret. This is to avoid any future problem caused by a difference in the number of endpoints between fresh and upgraded clusters. Sidhant will raise a doc bug for this.
Details: https://chat.google.com/room/AAAAREGEba8/QWUHDIkEvW8

Updated the doc_text with some additional info. Please check.

Changed the logic a bit. Instead of adding an extra resource, a 'ConfigMap', the Monitoring details are now read directly from the external cluster secret.
PR: https://github.com/openshift/ocs-operator/pull/1285

Please test with the latest build.

Arun, please add the doc text.

Updated the docs, please check.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: Red Hat OpenShift Data Foundation 4.9.0 enhancement, security, and bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHSA-2021:5086
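
The approach taken in the final PR, reading the monitoring details directly from the external cluster secret instead of carrying them in operator memory, could look roughly like the following. This is a hedged Python sketch, not the code from the PR: the function name, the `monitoring-endpoint` entry name, the `MonitoringEndpoint` / `MonitoringPort` keys, the comma-separated endpoint format, and the shape of the returned spec are assumptions for illustration only.

```python
import json


def monitoring_spec_from_secret(secret_json):
    """Build a CephCluster-style monitoring spec straight from the secret JSON."""
    for item in json.loads(secret_json):
        if item.get("name") != "monitoring-endpoint":  # hypothetical entry name
            continue
        data = item.get("data", {})
        # Assumption: several endpoints may arrive as one comma-separated string.
        raw = data.get("MonitoringEndpoint", "")
        ips = [ip.strip() for ip in raw.split(",") if ip.strip()]
        spec = {"enabled": True, "externalMgrEndpoints": [{"ip": ip} for ip in ips]}
        port = data.get("MonitoringPort")
        if port:
            spec["externalMgrPrometheusPort"] = int(port)
        return spec
    # No monitoring entry in the secret: leave monitoring disabled.
    return {"enabled": False}


secret = json.dumps(
    [
        {
            "name": "monitoring-endpoint",
            "data": {"MonitoringEndpoint": "10.0.0.1,10.0.0.2", "MonitoringPort": "9283"},
        }
    ]
)
print(monitoring_spec_from_secret(secret))
```

Because the function depends only on the secret's content, calling it on every reconcile gives the same result before and after an operator restart, whether or not the stored hash has changed.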