Description of problem (please be as detailed as possible and provide log snippets):

Version of all relevant components (if applicable): 4.14 and 4.15 are affected

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)? No

Is there any workaround available to the best of your knowledge? Just wait for the multiple reconciles to settle

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)? 1

Is this issue reproducible? Yes, review the operator log and see the reconcile running multiple times during a new install.

Can this issue be reproduced from the UI? Yes, any install should be affected.

If this is a regression, please provide more details to justify this: A regression since the Exporter daemon was implemented.

Steps to Reproduce:
1. Install ODF
2. Review the Rook operator log
3. See that the reconcile runs multiple times, unnecessarily

The log message indicating the issue is:
ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling

Actual results:
The Rook reconcile runs multiple times instead of once. This was reported upstream in several issues such as [1]. The root cause was finally tracked down and fixed upstream with [2].

[1] https://github.com/rook/rook/issues/12944
[2] https://github.com/rook/rook/pull/13597

Expected results:
The Rook reconcile should only run once on a new install or upgrade.

While this issue may not be causing obvious health issues, it could be causing the operator to take longer than necessary to complete the reconciles. We need to fix this downstream since it is low risk and improves the initial experience by completing the reconcile faster.
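For context on the class of fix involved: operators typically suppress redundant reconciles by filtering watch events with a predicate, skipping delete events for objects the operator itself owns and recreates. The sketch below is a minimal, self-contained illustration of that idea only; it is NOT Rook's actual code, and the `shouldReconcile` function and the allow-list are hypothetical (see the upstream PR [2] for the real change).

```go
package main

import "fmt"

// eventKind models the watch event types a controller sees.
type eventKind int

const (
	eventAdd eventKind = iota
	eventUpdate
	eventDelete
)

// shouldReconcile is an illustrative predicate (not Rook's actual code):
// it skips delete events for objects the operator manages and expects to
// recreate itself, such as the "rook-ceph-exporter" resources, so their
// deletion does not queue an extra full reconcile.
func shouldReconcile(kind eventKind, objectName string) bool {
	// Hypothetical allow-list for illustration only.
	operatorManaged := map[string]bool{
		"rook-ceph-exporter": true,
	}
	if kind == eventDelete && operatorManaged[objectName] {
		return false
	}
	return true
}

func main() {
	fmt.Println(shouldReconcile(eventDelete, "rook-ceph-exporter")) // false: suppressed
	fmt.Println(shouldReconcile(eventUpdate, "rook-ceph-exporter")) // true
	fmt.Println(shouldReconcile(eventDelete, "user-configmap"))     // true
}
```

Without such filtering, every delete of an operator-owned object re-queues a reconcile, which matches the repeated "matched on delete, reconciling" log lines above.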
Marking as a blocker for 4.15; we really should get this low-risk fix in to improve the initial configuration time.
Downstream PR opened for 4.15
Verified on 4.15.0-134

$ oc get csv -n openshift-storage
NAME                                         DISPLAY                       VERSION             REPLACES   PHASE
mcg-operator.v4.15.0-134.stable              NooBaa Operator               4.15.0-134.stable              Succeeded
ocs-operator.v4.15.0-134.stable              OpenShift Container Storage   4.15.0-134.stable              Succeeded
odf-csi-addons-operator.v4.15.0-134.stable   CSI Addons                    4.15.0-134.stable              Succeeded
odf-operator.v4.15.0-134.stable              OpenShift Data Foundation     4.15.0-134.stable              Succeeded

$ oc get csv odf-operator.v4.15.0-134.stable -n openshift-storage -o yaml | grep full
  full_version: 4.15.0-134

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-02-06-040314   True        False         12h     Cluster version is 4.15.0-0.nightly-2024-02-06-040314

$ oc logs rook-ceph-operator-6d595bf69f-jkxxv -n openshift-storage | grep "rook operator image"
2024-02-07 05:26:31.064233 I | cephcmd: base ceph version inside the rook operator image is "ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)"

$ oc logs rook-ceph-operator-6d595bf69f-jkxxv -n openshift-storage | grep "rook-ceph-exporter\" matched on delete, reconciling"
2024-02-07 05:31:30.874065 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
2024-02-07 05:31:30.879334 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
I also checked on the ODF upgrade to 4.14.5-5.

Observations:
----------------------------------------------------------------------
$ oc get csv
NAME                                           DISPLAY                       VERSION               REPLACES                                PHASE
mcg-operator.v4.14.5-5.fusion-hci              NooBaa Operator               4.14.5-5.fusion-hci   mcg-operator.v4.13.7-rhodf              Succeeded
ocs-operator.v4.14.5-5.fusion-hci              OpenShift Container Storage   4.14.5-5.fusion-hci   ocs-operator.v4.13.7-rhodf              Succeeded
odf-csi-addons-operator.v4.14.5-5.fusion-hci   CSI Addons                    4.14.5-5.fusion-hci   odf-csi-addons-operator.v4.13.7-rhodf   Succeeded
odf-operator.v4.14.5-5.fusion-hci              OpenShift Data Foundation     4.14.5-5.fusion-hci   odf-operator.v4.13.7-rhodf              Succeeded

$ oc get csv -o yaml | grep full
  full_version: 4.14.5-5
  full_version: 4.14.5-5
  full_version: 4.14.5-5

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-06-070712   True        False         15h     Error while reconciling 4.14.0-0.nightly-2024-02-06-070712: some cluster operators are not available

$ oc get pods | grep rook-ceph-operator
rook-ceph-operator-6865fffbf5-lqkg9   1/1   Running   0   4h22m

$ oc logs rook-ceph-operator-6865fffbf5-lqkg9 -n openshift-storage | grep "rook operator image"
2024-02-07 04:19:50.024670 I | cephcmd: base ceph version inside the rook operator image is "ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)"

$ oc logs rook-ceph-operator-6865fffbf5-lqkg9 -n openshift-storage | grep "rook-ceph-exporter\" matched on delete, reconciling"
2024-02-07 04:19:51.512748 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
-------------------------------------------------------------------------------------

@Travis Nielsen What are the exact reproducer steps for this issue? Has this fix been backported to ODF 4.14.5-5, and is that why it is not reproduced?
The 4.14 fix has not yet been merged; it's just awaiting approval: https://github.com/red-hat-storage/rook/pull/566
Verified on 4.14.5-4-stable and found to be fixed.
------------------------------------------------------
$ oc get csv
NAME                                         DISPLAY                       VERSION           REPLACES                                PHASE
mcg-operator.v4.14.5-4.stable                NooBaa Operator               4.14.5-4.stable   mcg-operator.v4.14.4-rhodf              Succeeded
ocs-operator.v4.14.5-4.stable                OpenShift Container Storage   4.14.5-4.stable   ocs-operator.v4.14.4-rhodf              Succeeded
odf-csi-addons-operator.v4.14.5-4.stable     CSI Addons                    4.14.5-4.stable   odf-csi-addons-operator.v4.14.4-rhodf   Succeeded
odf-operator.v4.14.5-4.stable                OpenShift Data Foundation     4.14.5-4.stable   odf-operator.v4.14.4-rhodf              Succeeded

$ oc logs rook-ceph-operator-7c49768899-t4999 -n openshift-storage | grep "rook-ceph-exporter"
<No match found in the log>

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-02-07-062935   True        False         4d2h    Cluster version is 4.15.0-0.nightly-2024-02-07-062935
--------------------------------------------------------------
Based on comment 15 and comment 16, marking this as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2024:1383