Bug 1978829 - ClusterMonitoringOperatorReconciliationErrors is firing during upgrades and should not be
Summary: ClusterMonitoringOperatorReconciliationErrors is firing during upgrades and should not be
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Monitoring
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.9.0
Assignee: Philip Gough
QA Contact: Junqi Zhao
Depends On:
Blocks: 1999148
Reported: 2021-07-02 21:15 UTC by Ben Parees
Modified: 2021-10-18 17:38 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1999148 (view as bug list)
Last Closed: 2021-10-18 17:38:01 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-monitoring-operator pull 1268 | 0 | None | open | Bug 1978829: alert: ClusterMonitoringOperatorReconciliationErrors: reduce range du… | 2021-07-06 19:08:31 UTC
Red Hat Product Errata | RHSA-2021:3759 | 0 | None | None | None | 2021-10-18 17:38:22 UTC

Description Ben Parees 2021-07-02 21:15:16 UTC
Description of problem:


is pretty consistently failing due to:

fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:190]: Jul  1 16:24:28.261: Unexpected alerts fired or pending during the upgrade:

alert ClusterMonitoringOperatorReconciliationErrors fired for 60 seconds with labels: {severity="warning"}

example job:

It can be seen in ci-search that this is failing most of our "old-rhcos" job runs:

Note that this job uses v4.8 nodes with v4.9 control plane, which may be part of the issue, but I also see there have been other bugs in this space that were marked resolved recently, e.g.:


Version-Release number of selected component (if applicable):

How reproducible:
failing semi regularly.

Actual results:
Alert fires for 60s during upgrade

Expected results:
Alert should not fire during upgrade (we expect no warning alerts during upgrades)

It may be necessary to configure this alert with a longer delay before it fires, if there is no fundamentally fixable flaw in the operator itself.

Comment 3 Junqi Zhao 2021-07-13 03:11:26 UTC
Checked with 4.9.0-0.nightly-2021-07-12-143404; the `for` clause has been added:

      - alert: ClusterMonitoringOperatorReconciliationErrors
        annotations:
          message: Cluster Monitoring Operator is experiencing unexpected reconciliation
            errors. Inspect the cluster-monitoring-operator log for potential root causes.
        expr: max_over_time(cluster_monitoring_operator_last_reconciliation_successful[5m])
          == 0
        for: 1h
        labels:
          severity: warning
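
To illustrate why the `for: 1h` clause keeps a short upgrade-time blip from firing, here is a rough Python sketch of how the rule above would be evaluated. It is a simplified model, not Prometheus code: the function names, the 30-second scrape and evaluation interval, and the left-open window boundary are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple

# (timestamp in seconds, gauge value); 1.0 = last reconciliation succeeded
Sample = Tuple[int, float]


def make_samples(start: int, end: int, step: int,
                 fail_from: int, fail_until: int) -> List[Sample]:
    """Hypothetical helper: gauge is 0 in [fail_from, fail_until), else 1."""
    return [(t, 0.0 if fail_from <= t < fail_until else 1.0)
            for t in range(start, end + 1, step)]


def max_over_time(samples: List[Sample], now: int,
                  window: int) -> Optional[float]:
    """Max of the samples in (now - window, now], or None if there are none."""
    values = [v for t, v in samples if now - window < t <= now]
    return max(values) if values else None


def firing_times(samples: List[Sample], start: int, end: int,
                 step: int = 30, window: int = 300,
                 for_duration: int = 3600) -> List[int]:
    """Evaluation timestamps at which the alert would be firing.

    The expression must stay true continuously for `for_duration`
    seconds (the `for:` clause) before the alert moves from pending
    to firing; any false evaluation resets the pending timer.
    """
    pending_since: Optional[int] = None
    firing: List[int] = []
    for now in range(start, end + 1, step):
        if max_over_time(samples, now, window) == 0:
            if pending_since is None:
                pending_since = now
            if now - pending_since >= for_duration:
                firing.append(now)
        else:
            pending_since = None
    return firing


if __name__ == "__main__":
    # A 10-minute reconciliation outage, as might happen during an upgrade.
    blip = make_samples(0, 7200, 30, fail_from=600, fail_until=1200)
    print("with for: 1h ->", firing_times(blip, 0, 7200))
    print("without for  ->", firing_times(blip, 0, 7200, for_duration=0)[:3])
```

In this model a 10-minute outage never reaches the firing state with `for_duration=3600`, while a failure that persists for more than an hour still does. Setting `for_duration=0` models a rule with no `for` clause, which is an assumption about the earlier rule rather than something shown in this report.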

Comment 10 errata-xmlrpc 2021-10-18 17:38:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

