The compact jobs are failing frequently in CI:

$ w3m -dump -cols 200 'https://search.ci.openshift.org/?maxAge=92h&type=junit&name=compact&search=ClusterOperatorDown.*authentication' | grep 'failures match' | sort
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact-serial (all) - 2 runs, 100% failed, 100% of failures match = 100% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-azure-compact-serial (all) - 2 runs, 100% failed, 50% of failures match = 50% impact
periodic-ci-openshift-release-master-ci-4.8-e2e-gcp-compact-serial (all) - 2 runs, 50% failed, 100% of failures match = 50% impact

Picking one of those [1], the only failing test case was:

: [sig-instrumentation][Late] Alerts shouldn't report any alerts in firing or pending state apart from Watchdog and AlertmanagerReceiversNotConfigured and have no gaps in Watchdog firing [Suite:openshift/conformance/parallel]
Run #0: Failed 8s
fail [github.com/onsi/ginkgo.0-origin.0+incompatible/internal/leafnodes/runner.go:113]: Apr 6 07:33:02.223: Unexpected alerts fired or pending after the test run:

alert ClusterOperatorDown fired for 1 seconds with labels: {endpoint="metrics", instance="10.0.200.205:9099", job="cluster-version-operator", name="authentication", namespace="openshift-cluster-version", pod="cluster-version-operator-b4756cb5f-kh6h8", service="cluster-version-operator", severity="critical", version="4.8.0-0.ci-2021-04-05-224633"}

From [2], the timeline for that job is something like:

* 6:59Z, Available=False with OAuthServerRouteEndpointAccessibleController_EndpointUnavailable::WellKnown_NotReady. Degraded=True with OAuthServerRouteEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError.
* 7:04Z, Available=False with WellKnown_NotReady, Degraded=True with WellKnownReadyController_SyncError.
* 7:09Z, operator goes happy.
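For anyone wanting to cross-check that timeline against the gathered artifacts, something like the following works. This is just a sketch: the filename, paths, and the trimmed-down sample conditions below are stand-ins, not this job's actual ClusterOperator dump.

```shell
# Sample standing in for a real `oc get clusteroperator authentication -o json`
# dump from the job artifacts (shape assumed, one condition object per line):
cat > /tmp/co-authentication.json <<'EOF'
{"status": {"conditions": [
  {"type": "Available", "status": "False", "lastTransitionTime": "2021-04-06T06:59:00Z", "reason": "WellKnown_NotReady"},
  {"type": "Degraded", "status": "True", "lastTransitionTime": "2021-04-06T06:59:00Z", "reason": "WellKnownReadyController_SyncError"}
]}}
EOF

# One condition per line: type, status, transition time, reason.
grep -o '"type": "[^"]*", "status": "[^"]*", "lastTransitionTime": "[^"]*", "reason": "[^"]*"' \
    /tmp/co-authentication.json |
  sed 's/"[a-zA-Z]*": //g; s/"//g; s/, / /g' |
  tee /tmp/co-timeline.txt
# -> Available False 2021-04-06T06:59:00Z WellKnown_NotReady
#    Degraded True 2021-04-06T06:59:00Z WellKnownReadyController_SyncError
```

jq would obviously be cleaner if it's available; the grep/sed pair is just to keep the sketch dependency-free.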
[3] has 'Managed cluster should start all core operators' passing at 07:09:59Z, so possibly we need to keep the install-Progressing workaround a bit longer and put off the revert from [4]. Might also be related to (or a dup of) bug 1939580 or bug 1929922.

[1]: https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact/1379322361316118528
[2]: https://promecieus.dptools.openshift.org/?search=https://prow.ci.openshift.org/view/gs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact/1379322361316118528
[3]: https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/origin-ci-test/logs/periodic-ci-openshift-release-master-ci-4.8-e2e-aws-compact/1379322361316118528/artifacts/e2e-aws-compact/openshift-e2e-test/artifacts/e2e-intervals.json
[4]: https://github.com/openshift/cluster-authentication-operator/pull/423
Sounds like a duplicate of the other "ClusterOperatorDown" BZ; I'm going to close this in its favour.

*** This bug has been marked as a duplicate of bug 1939580 ***