Created attachment 1764281 [details]
must gather from cluster

Description of problem:
Recently we have been seeing failures on our e2e-crc jobs for the installer repo. We also tried to reproduce it manually and hit the same issue. The 4.8.0 nightlies were working fine for this job two days ago and then suddenly started failing.

Successful run: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4700/pull-ci-openshift-installer-master-e2e-crc/1371721481884536832
Failure run: https://prow.ci.openshift.org/view/gs/origin-ci-test/pr-logs/pull/openshift_installer/4766/pull-ci-openshift-installer-master-e2e-crc/1372208982986330112

Version-Release number of selected component (if applicable):
```
$ ./openshift-baremetal-install version
./openshift-baremetal-install 4.8.0-0.nightly-2021-03-18-000857
built from commit f8a81655daaa0a21c917c671f1dce9733e14c6f2
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:5be3b251ccd17fae881d43591dd1bebe763780f0c7e3386332722ccb2648954d
```

How reproducible:
Run OpenShift as a single node on the libvirt provider; the bootstrap completes successfully but the cluster does not provision successfully.

Steps to Reproduce:
1. Use the latest `openshift-baremetal-install` binary
2. Choose the `libvirt` provider

Actual results:
```
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, baremetal, console, image-registry, openshift-samples
ERROR Cluster operator authentication Degraded is True with OAuthRouteCheckEndpointAccessibleController_SyncError: OAuthRouteCheckEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps-crc.testing/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
INFO Cluster operator authentication Progressing is True with OAuthVersionRoute_WaitingForRoute: OAuthVersionRouteProgressing: Request to "https://oauth-openshift.apps-crc.testing/healthz" not successful yet
INFO Cluster operator authentication Available is False with OAuthRouteCheckEndpointAccessibleController_EndpointUnavailable::OAuthVersionRoute_RequestFailed: OAuthVersionRouteAvailable: HTTP request to "https://oauth-openshift.apps-crc.testing/healthz" failed: dial tcp: i/o timeout
INFO OAuthRouteCheckEndpointAccessibleControllerAvailable: Get "https://oauth-openshift.apps-crc.testing/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ERROR Cluster operator console Degraded is True with RouteHealth_FailedLoadCA: RouteHealthDegraded: failed to read CA to check route health: configmaps "trusted-ca-bundle" not found
INFO Cluster operator console Available is Unknown with NoData:
ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: CanaryChecksSucceeding=False (CanaryChecksRepetitiveFailures: Canary route checks for the default ingress controller are failing)
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator insights UploadDegraded is True with UploadFailed: Unable to report: unable to build request to connect to Insights server: Post "https://cloud.redhat.com/api/ingress/v1/upload": dial tcp: i/o timeout
INFO Cluster operator network ManagementStateDegraded is False with :
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, baremetal, console, image-registry, openshift-samples
```

Expected results:
The cluster should provision successfully.

Additional info:
This looks similar to https://bugzilla.redhat.com/show_bug.cgi?id=1908389, where the canary check was also failing, but there it was caused by the load balancer, and on libvirt there is no load balancer available.

Attached must-gather logs for more info.
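For reference, the cluster state was inspected with standard troubleshooting commands along these lines (a rough sketch; the install directory and kubeconfig paths here are illustrative, not the exact ones used by the e2e-crc job):

```
# Point oc at the cluster created by the installer (path is illustrative)
export KUBECONFIG=./install-dir/auth/kubeconfig

# See which cluster operators are degraded or still progressing
oc get clusteroperators

# Collect diagnostic data (the attached must-gather was produced with a command like this)
oc adm must-gather --dest-dir=./must-gather

# Resume waiting for the installation, as suggested by the installer output above
./openshift-baremetal-install wait-for install-complete --dir ./install-dir --log-level=debug
```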
Debugging this issue further, we found that it was happening because no DNS operator was scheduled on the cluster. That was because we were using the `single-node-developer` cluster profile, and https://github.com/openshift/cluster-dns-operator/pull/216 and https://github.com/operator-framework/operator-marketplace/pull/369 are still pending. Closing this since it is not a bug in routing.
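For anyone hitting the same symptom, a quick way to confirm that the DNS operator is simply not scheduled (rather than failing) is something like the following sketch; the namespace name is the default used by cluster-dns-operator and is an assumption here:

```
# The dns clusteroperator never reports Available because its operator is not running
oc get clusteroperator dns

# Under the single-node-developer profile there is no DNS operator deployment or pod
# in the operator namespace (default namespace name assumed)
oc get all -n openshift-dns-operator
```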