Bug 2019946
| Summary: | CephCluster updates might result in infinite reconciles | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Sébastien Han <shan> |
| Component: | rook | Assignee: | Sébastien Han <shan> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Yosi Ben Shimon <ybenshim> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 4.9 | CC: | madam, muagarwa, ocs-bugs, odf-bz-bot, rperiyas, tnielsen |
| Target Milestone: | --- | ||
| Target Release: | ODF 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | v4.9.0-228.ci | Doc Type: | Bug Fix |
| Doc Text: | The monitor list in the cluster peer token secret was not sorted, so on every reconcile the peer token secret's content was rewritten with the monitors in a random order. That update matched the operator's predicate and triggered another reconcile, which rewrote the list again, and so on. The loop could potentially be endless, because every time the randomized list differed from the stored one, another reconcile was triggered. | | |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2022-01-07 17:46:31 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
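The Doc Text above describes a feedback loop. The operator writes a monitor list in a nondeterministic order, the write changes the secret, and the change-detecting predicate fires another reconcile. The minimal Go sketch below illustrates that mechanism under stated assumptions. It is not Rook's actual code. The monitor map, the `mon_host` key, and all function names are hypothetical. Ranging over a Go map is one common source of such randomization, because the language leaves map iteration order unspecified. Sorting before joining makes the output deterministic, so an unchanged cluster produces identical secret data.

```go
package main

import (
	"fmt"
	"reflect"
	"sort"
	"strings"
)

// monEndpointsUnsorted joins endpoints in map iteration order. Go does not
// specify that order, so identical input can yield different strings.
func monEndpointsUnsorted(mons map[string]string) string {
	endpoints := make([]string, 0, len(mons))
	for _, ep := range mons {
		endpoints = append(endpoints, ep)
	}
	return strings.Join(endpoints, ",")
}

// monEndpointsSorted sorts the endpoints before joining, so identical input
// always yields the same string.
func monEndpointsSorted(mons map[string]string) string {
	endpoints := make([]string, 0, len(mons))
	for _, ep := range mons {
		endpoints = append(endpoints, ep)
	}
	sort.Strings(endpoints)
	return strings.Join(endpoints, ",")
}

// secretDataChanged stands in for a hypothetical update predicate: any
// difference in the secret data is treated as a reason to reconcile again.
func secretDataChanged(oldData, newData map[string][]byte) bool {
	return !reflect.DeepEqual(oldData, newData)
}

func main() {
	// Hypothetical monitor map: mon name -> "ip:port".
	mons := map[string]string{
		"a": "10.0.0.1:6789",
		"b": "10.0.0.2:6789",
		"c": "10.0.0.3:6789",
	}
	fmt.Println("unsorted (order may vary):", monEndpointsUnsorted(mons))

	// Two consecutive reconciles writing the (hypothetical) mon_host key.
	first := map[string][]byte{"mon_host": []byte(monEndpointsSorted(mons))}
	second := map[string][]byte{"mon_host": []byte(monEndpointsSorted(mons))}
	fmt.Println("sorted:", string(first["mon_host"]))
	fmt.Println("reconcile triggered:", secretDataChanged(first, second)) // prints false
}
```

With the sorted variant, a second reconcile on an unchanged cluster writes byte-identical data, so the predicate has nothing to react to and the loop stops.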
Description
Sébastien Han
2021-11-03 16:53:00 UTC
downstream PR: https://github.com/red-hat-storage/rook/pull/312

To verify this BZ, you would need to analyze the rook operator log to see whether it reconciles the cluster multiple times even when there are no changes to the CephCluster CR. For example:
- Install OCS.
- Wait for the Ceph daemons to be created, including the OSD pods.
- Wait a few more minutes to ensure the operator is done.
- Grep the rook operator log for messages that indicate how many times the operator reconciled. You could grep for a specific message such as "done reconciling ceph cluster in namespace", which occurs only once per reconcile.
- The reconcile should occur only once, or maybe twice. If it occurs more than twice, the operator is finding a difference during each reconcile and keeps retrying when it should not.

Following Travis's steps from comment #6 on a 1-day-old cluster (odf-operator.v4.9.0), the rook-ceph-operator logs contain only one occurrence of:

`2021-11-21 07:36:28.279416 I | ceph-cluster-controller: done reconciling ceph cluster in namespace "openshift-storage"`

Moving to VERIFIED.