Bug 1932624

Summary: ClusterMonitoringOperatorReconciliationErrors is pending at the end of an upgrade and probably should not be
Product: OpenShift Container Platform
Reporter: Clayton Coleman <ccoleman>
Component: Monitoring
Assignee: Sergiusz Urbaniak <surbania>
Status: CLOSED ERRATA
QA Contact: Junqi Zhao <juzhao>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 4.8
CC: alegrand, anpicker, erooth, hongkliu, hongyli, kakkoyun, lcosic, pkrupa, wking
Target Milestone: ---
Target Release: 4.8.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment: [sig-arch] Check if alerts are firing during or after upgrade success
Last Closed: 2021-07-27 22:48:13 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Clayton Coleman 2021-02-24 19:18:01 UTC
Reconciliation errors should be truly exceptional, not "normal" during a rollout.

We are trying to eliminate sources of noise in upgrades by targeting alerts that are firing or pending at the end of the run. Tightening these tests gives teams a clear indicator that they are introducing potential alert noise.

https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/25904/pull-ci-openshift-origin-master-e2e-gcp-upgrade/1364383982690504704

demonstrates this for ClusterMonitoringOperatorReconciliationErrors, which is still pending 1m after the upgrade completes. I would expect reconciliation errors not to be pending. CMO should handle normal disruption errors silently, and other components (i.e. the control plane) should not disrupt CMO during an upgrade. I *suspect* this is caused by the known GCP issue where some API requests are disrupted, so feel free to attribute it to https://bugzilla.redhat.com/show_bug.cgi?id=1925698 for now.

I'm filing this so that the test's allowlist of known exceptions has a bug to reference when it skips this alert.

Comment 1 W. Trevor King 2021-03-19 23:07:04 UTC
*** Bug 1940933 has been marked as a duplicate of this bug. ***

Comment 2 W. Trevor King 2021-03-19 23:07:57 UTC
I'm making it easier for Sippy to find this bug by mentioning the relevant test-case.

Comment 9 errata-xmlrpc 2021-07-27 22:48:13 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438