Description of problem:

https://prow.ci.openshift.org/job-history/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.9 is pretty consistently failing due to:

fail [github.com/openshift/origin/test/extended/util/disruption/disruption.go:190]: Jul 1 16:24:28.261: Unexpected alerts fired or pending during the upgrade:

alert ClusterMonitoringOperatorReconciliationErrors fired for 60 seconds with labels: {severity="warning"}

Example job: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/release-openshift-origin-installer-old-rhcos-e2e-aws-4.9/1410608470004076544

ci-search shows that this is failing most of our "old-rhcos" job runs: https://search.ci.openshift.org/?search=ClusterMonitoringOperatorReconciliationErrors&maxAge=336h&context=1&type=bug%2Bjunit&name=release-openshift-origin-installer-old-rhcos-e2e-aws-4.9&excludeName=&maxMatches=5&maxBytes=20971520&groupBy=job

Note that this job uses v4.8 nodes with a v4.9 control plane, which may be part of the issue, but I also see there have been other bugs in this space that were marked resolved recently, e.g.: https://bugzilla.redhat.com/show_bug.cgi?id=1932624

Version-Release number of selected component (if applicable): 4.9

How reproducible: Failing semi-regularly.

Actual results: Alert fires for 60s during upgrade.

Expected results: Alert should not fire during upgrade (we expect no warning alerts during upgrades).

It may be necessary to configure this alert with a longer delay period before it fires, if there is not a fundamentally fixable flaw in the operator itself.
Checked with 4.9.0-0.nightly-2021-07-12-143404; the "for" clause has been added:

- alert: ClusterMonitoringOperatorReconciliationErrors
  annotations:
    message: Cluster Monitoring Operator is experiencing unexpected reconciliation errors. Inspect the cluster-monitoring-operator log for potential root causes.
  expr: max_over_time(cluster_monitoring_operator_last_reconciliation_successful[5m]) == 0
  for: 1h
  labels:
    severity: warning
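
For anyone wondering why the "for: 1h" clause in the rule quoted above should stop a short reconciliation blip during an upgrade from firing, here is a rough Python sketch of how a Prometheus-style "for" hold period behaves. It is a simplified model, not the Prometheus implementation: the function name alert_states, the 30-second evaluation interval, and the sample data are all assumptions made for illustration, and the boolean inputs stand in for whether the rule's expr returned a result at each evaluation.

```python
from typing import List, Optional


def alert_states(expr_true: List[bool], interval_s: int, for_s: int) -> List[str]:
    """Return the alert state after each rule evaluation.

    Simplified model (assumption): an alert is "pending" while the
    expression has been continuously true for less than `for_s` seconds,
    becomes "firing" once that hold period is reached, and drops back to
    "inactive" as soon as one evaluation is false.
    """
    states: List[str] = []
    active_since: Optional[int] = None  # time the expression first became true
    for i, is_true in enumerate(expr_true):
        now = i * interval_s
        if not is_true:
            active_since = None
            states.append("inactive")
            continue
        if active_since is None:
            active_since = now
        held_for = now - active_since
        states.append("firing" if held_for >= for_s else "pending")
    return states


# Hypothetical upgrade blip: expression true for 20 evaluations (about 10 minutes).
blip = [False, False] + [True] * 20 + [False, False]

no_hold = alert_states(blip, interval_s=30, for_s=0)       # no "for" clause
one_hour = alert_states(blip, interval_s=30, for_s=3600)   # for: 1h

print("firing without hold period:", "firing" in no_hold)
print("firing with for: 1h:", "firing" in one_hour)
```

In this model the blip fires immediately when there is no hold period and only ever reaches "pending" with a one-hour hold, while a failure that persists for the full hour would still fire.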
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759