Reconciliation errors should be truly exceptional, not "normal" during a rollout.
We are trying to eliminate sources of noise in upgrades by targeting alerts that are firing or pending at the end of the run. Tightening these tests gives teams a clear signal that they are introducing potential alert noise.
demonstrates this for the ClusterMonitoringOperatorReconciliationErrors alert, which is still pending 1m after the upgrade is complete. I would expect reconciliation errors not to be pending: CMO should handle normal disruption errors silently, and other components (i.e. the control plane) should not disrupt CMO during an upgrade. I *suspect* this is caused by the known GCP issue where some API requests are disrupted, so feel free to blame this on https://bugzilla.redhat.com/show_bug.cgi?id=1925698 for now.
I'm filing this so that the skip in the test's allowlist of exceptions has a bug to reference.
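For illustration only, here is a minimal Go sketch of how an end-of-run alert check with an allowlist of exceptions might work. It assumes a simplified, hypothetical `Alert` type, a hypothetical `allowlist` map from alert name to tracking bug, and a hypothetical `checkAlerts` helper. It is not the actual test code. Any alert that is firing or pending and is not allowlisted is a failure. An allowlisted alert is reported as a skip along with the bug that justifies it.

```go
package main

import (
	"fmt"
	"sort"
)

// Alert is a hypothetical, simplified view of an alert observed when the upgrade run ends.
type Alert struct {
	Name  string
	State string // "firing", "pending", or anything else (treated as inactive)
}

// allowlist is a hypothetical map from alert name to the bug that justifies tolerating it.
// The URL here is the suspected root-cause bug mentioned in this report.
var allowlist = map[string]string{
	"ClusterMonitoringOperatorReconciliationErrors": "https://bugzilla.redhat.com/show_bug.cgi?id=1925698",
}

// checkAlerts splits firing/pending alerts into failures (not allowlisted)
// and skips (allowlisted, annotated with their tracking bug).
func checkAlerts(active []Alert, allow map[string]string) (failures, skips []string) {
	for _, a := range active {
		// Only alerts that are firing or pending at the end of the run count as noise.
		if a.State != "firing" && a.State != "pending" {
			continue
		}
		if bug, ok := allow[a.Name]; ok {
			skips = append(skips, fmt.Sprintf("%s (%s): see %s", a.Name, a.State, bug))
			continue
		}
		failures = append(failures, fmt.Sprintf("%s (%s)", a.Name, a.State))
	}
	// Sort for stable, readable test output.
	sort.Strings(failures)
	sort.Strings(skips)
	return failures, skips
}

func main() {
	active := []Alert{
		{Name: "ClusterMonitoringOperatorReconciliationErrors", State: "pending"},
		{Name: "HypotheticalNoisyAlert", State: "firing"},
		{Name: "HypotheticalQuietAlert", State: "inactive"},
	}
	failures, skips := checkAlerts(active, allowlist)
	fmt.Println("failures:", failures)
	fmt.Println("skips:", skips)
}
```

Keying each exception to a bug keeps the allowlist auditable. Once the referenced bug is fixed, its entry can be removed and the alert will fail the test again.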
*** Bug 1940933 has been marked as a duplicate of this bug. ***
I'm making it easier for Sippy to find this bug by mentioning the relevant test case.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.