Bug 2260279 - Exporter deployment recreation with each Ceph daemon creation causes unnecessary reconciles
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: rook
Version: 4.15
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ODF 4.15.0
Assignee: Travis Nielsen
QA Contact: suchita
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2024-01-24 23:57 UTC by Travis Nielsen
Modified: 2024-03-19 15:32 UTC (History)
3 users

Fixed In Version: 4.15.0-130
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2024-03-19 15:32:18 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github red-hat-storage rook pull 558 0 None open Bug 2260279: exporter: Skip reconcile on exporter deletion 2024-01-25 00:05:05 UTC
Github rook rook pull 13597 0 None Merged exporter: Skip reconcile on exporter deletion 2024-01-25 00:00:37 UTC
Red Hat Product Errata RHSA-2024:1383 0 None None None 2024-03-19 15:32:22 UTC

Description Travis Nielsen 2024-01-24 23:57:46 UTC
Description of problem (please be as detailed as possible and provide log
snippets):


Version of all relevant components (if applicable):

4.14 and 4.15 affected

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?

No

Is there any workaround available to the best of your knowledge?

Just wait for the multiple reconciles to settle


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?

1

Is this issue reproducible?

Yes, review the operator log and see the reconcile running multiple times during a new install.


Can this issue be reproduced from the UI?

Yes, any install should be affected


If this is a regression, please provide more details to justify this:

A regression since the Exporter daemon was implemented


Steps to Reproduce:
1. Install ODF
2. Review the Rook operator log
3. Observe that the reconcile runs multiple times, unnecessarily

The log message indicating the issue is:

ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling


Actual results:

Rook reconcile runs multiple times instead of once

This was reported upstream in several issues, such as [1].
The root cause was finally tracked down and fixed upstream with [2].

[1] https://github.com/rook/rook/issues/12944
[2] https://github.com/rook/rook/pull/13597
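The upstream fix [2] is titled "exporter: Skip reconcile on exporter deletion": the operator's watch predicate ignores delete events for the rook-ceph-exporter deployment instead of queueing a new reconcile. A minimal sketch of that filtering idea, expressed as a log-event filter rather than Rook's actual Go predicate code (the event lines and file path are illustrative, not taken from a real cluster):

```shell
# Simulated stream of watch events the operator might observe
cat > /tmp/watch-events.log <<'EOF'
DELETE rook-ceph-exporter
CREATE rook-ceph-mon-a
DELETE rook-ceph-exporter
DELETE rook-ceph-osd-0
EOF

# The fix's idea: drop delete events for the exporter deployment, so only
# the remaining events would queue a reconcile.
grep -v '^DELETE rook-ceph-exporter$' /tmp/watch-events.log > /tmp/reconcile-queue.log
cat /tmp/reconcile-queue.log
```

With the exporter delete events filtered out, only the mon creation and OSD deletion remain as reconcile triggers, which matches the expected behavior of reconciling once per meaningful change.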

Expected results:

Rook reconcile should only be necessary once on a new install or upgrade

While this issue may not be causing obvious health issues, it could be causing the operator to take longer than necessary to complete the reconciles. We need to fix this downstream as it is low risk and improves the initial experience by completing the reconcile faster.

Comment 2 Travis Nielsen 2024-01-25 00:00:38 UTC
Marking as a blocker for 4.15, we really should get this low-risk fix in to improve the initial configuration time.

Comment 3 Travis Nielsen 2024-01-25 00:05:05 UTC
Downstream PR opened for 4.15

Comment 9 suchita 2024-02-07 06:34:23 UTC
Verified on 4.15.0-134

$ oc get csv -n openshift-storage
NAME                                         DISPLAY                       VERSION             REPLACES   PHASE
mcg-operator.v4.15.0-134.stable              NooBaa Operator               4.15.0-134.stable              Succeeded
ocs-operator.v4.15.0-134.stable              OpenShift Container Storage   4.15.0-134.stable              Succeeded
odf-csi-addons-operator.v4.15.0-134.stable   CSI Addons                    4.15.0-134.stable              Succeeded
odf-operator.v4.15.0-134.stable              OpenShift Data Foundation     4.15.0-134.stable              Succeeded

$ oc get csv odf-operator.v4.15.0-134.stable -n openshift-storage -o yaml | grep full
    full_version: 4.15.0-134

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-02-06-040314   True        False         12h     Cluster version is 4.15.0-0.nightly-2024-02-06-040314

$ oc logs rook-ceph-operator-6d595bf69f-jkxxv -n openshift-storage | grep "rook operator image"
2024-02-07 05:26:31.064233 I | cephcmd: base ceph version inside the rook operator image is "ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)"

$ oc logs rook-ceph-operator-6d595bf69f-jkxxv -n openshift-storage | grep "rook-ceph-exporter\" matched on delete, reconciling"
2024-02-07 05:31:30.874065 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
2024-02-07 05:31:30.879334 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling

Comment 10 suchita 2024-02-07 08:34:39 UTC
I also checked on an ODF upgrade to 4.14.5-5.

Observations:
----------------------------------------------------------------------

$ oc get csv 
NAME                                           DISPLAY                       VERSION               REPLACES                                PHASE
mcg-operator.v4.14.5-5.fusion-hci              NooBaa Operator               4.14.5-5.fusion-hci   mcg-operator.v4.13.7-rhodf              Succeeded
ocs-operator.v4.14.5-5.fusion-hci              OpenShift Container Storage   4.14.5-5.fusion-hci   ocs-operator.v4.13.7-rhodf              Succeeded
odf-csi-addons-operator.v4.14.5-5.fusion-hci   CSI Addons                    4.14.5-5.fusion-hci   odf-csi-addons-operator.v4.13.7-rhodf   Succeeded
odf-operator.v4.14.5-5.fusion-hci              OpenShift Data Foundation     4.14.5-5.fusion-hci   odf-operator.v4.13.7-rhodf              Succeeded

$ oc get csv -o yaml | grep full
      full_version: 4.14.5-5
      full_version: 4.14.5-5
      full_version: 4.14.5-5

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.14.0-0.nightly-2024-02-06-070712   True        False         15h     Error while reconciling 4.14.0-0.nightly-2024-02-06-070712: some cluster operators are not available

$ oc get pods| grep rook-ceph-operator
rook-ceph-operator-6865fffbf5-lqkg9                               1/1     Running             0          4h22m

$ oc logs rook-ceph-operator-6865fffbf5-lqkg9 -n openshift-storage | grep "rook operator image"
2024-02-07 04:19:50.024670 I | cephcmd: base ceph version inside the rook operator image is "ceph version 17.2.6-194.el9cp (d9f4aedda0fc0d99e7e0e06892a69523d2eb06dc) quincy (stable)"

$ oc logs rook-ceph-operator-6865fffbf5-lqkg9 -n openshift-storage | grep "rook-ceph-exporter\" matched on delete, reconciling"
2024-02-07 04:19:51.512748 I | ceph-spec: object "rook-ceph-exporter" matched on delete, reconciling
-------------------------------------------------------------------------------------

@Travis Nielsen What are the exact reproducer steps for this issue?
Is this fix backported to ODF 4.14.5-5, and hence not reproduced?

Comment 11 Travis Nielsen 2024-02-07 20:48:44 UTC
The 4.14 fix has not yet been merged; it is just awaiting approval: https://github.com/red-hat-storage/rook/pull/566

Comment 17 suchita 2024-02-13 11:46:30 UTC
Verified on 4.14.5-4.stable and found to be fixed
------------------------------------------------------
$ oc get csv
NAME                                       DISPLAY                       VERSION           REPLACES                                PHASE
mcg-operator.v4.14.5-4.stable              NooBaa Operator               4.14.5-4.stable   mcg-operator.v4.14.4-rhodf              Succeeded
ocs-operator.v4.14.5-4.stable              OpenShift Container Storage   4.14.5-4.stable   ocs-operator.v4.14.4-rhodf              Succeeded
odf-csi-addons-operator.v4.14.5-4.stable   CSI Addons                    4.14.5-4.stable   odf-csi-addons-operator.v4.14.4-rhodf   Succeeded
odf-operator.v4.14.5-4.stable              OpenShift Data Foundation     4.14.5-4.stable   odf-operator.v4.14.4-rhodf              Succeeded

$ oc logs rook-ceph-operator-7c49768899-t4999 -n openshift-storage| grep "rook-ceph-exporter"
<No match found in the log>

$ oc get clusterversion

NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.15.0-0.nightly-2024-02-07-062935   True        False         4d2h    Cluster version is 4.15.0-0.nightly-2024-02-07-062935
--------------------------------------------------------------
Based on comment 15 and comment 16, marking this as Verified.

Comment 18 errata-xmlrpc 2024-03-19 15:32:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat OpenShift Data Foundation 4.15.0 security, enhancement, & bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2024:1383

