Version:
./openshift-install 4.7.0-0.nightly-2020-11-20-234717
built from commit 68282c185253d4831514b20623b1717535c5e6f2
release image registry.svc.ci.openshift.org/ocp/release@sha256:b8667356942dce0e049d44470ba94f0dc1fa64876b324621cfb13c4fb25b9069

Platform:
Azure

Please specify:
IPI with custom install-config.yaml

What happened?
Entered an invalid VM type/size in install-config.yaml for workers only. The NIC was created in Azure, and I did not see a graceful error with a proper error message.

~~~
INFO Waiting up to 40m0s for the cluster at https://api.esimardwrk03.qe.azure.devcluster.openshift.com:6443 to initialize...
DEBUG Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2020-11-20-234717: 97% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2020-11-20-234717: 98% complete
DEBUG Still waiting for the cluster to initialize: Working towards 4.7.0-0.nightly-2020-11-20-234717: 98% complete, waiting on authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring
DEBUG Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthRouteCheckEndpointAccessibleController_SyncError::OAuthServerDeployment_DeploymentAvailableReplicasCheckFailed::OAuthServerRoute_InvalidCanonicalHost::OAuthServiceCheckEndpointAccessibleController_SyncError::OAuthServiceEndpointsCheckEndpointAccessibleController_SyncError::OAuthVersionDeployment_GetFailed::Route_InvalidCanonicalHost::WellKnownReadyController_SyncError: OAuthServiceCheckEndpointAccessibleControllerDegraded: Get "https://172.30.54.209:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
ERROR OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
ERROR IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
ERROR OAuthRouteCheckEndpointAccessibleControllerDegraded: route status does not have host address
ERROR OAuthVersionDeploymentDegraded: Unable to get OAuth server deployment: deployment.apps "oauth-openshift" not found
ERROR WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
ERROR OAuthServerDeploymentDegraded: deployments.apps "oauth-openshift" not found
ERROR OAuthServerRouteDegraded: no ingress for host oauth-openshift.apps.esimardwrk03.qe.azure.devcluster.openshift.com in route oauth-openshift in namespace openshift-authentication
ERROR RouteDegraded: no ingress for host oauth-openshift.apps.esimardwrk03.qe.azure.devcluster.openshift.com in route oauth-openshift in namespace openshift-authentication
INFO Cluster operator authentication Available is False with OAuthServiceCheckEndpointAccessibleController_EndpointUnavailable::OAuthServiceEndpointsCheckEndpointAccessibleController_EndpointUnavailable::OAuthVersionDeployment_MissingDeployment::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: OAuthServiceEndpointsCheckEndpointAccessibleControllerAvailable: Failed to get oauth-openshift enpoints
INFO ReadyIngressNodesAvailable: Authentication requires functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes, 3 master nodes, 0 custom target nodes (none are schedulable or ready for ingress pods).
INFO OAuthServiceCheckEndpointAccessibleControllerAvailable: Get "https://172.30.54.209:443/healthz": context deadline exceeded (Client.Timeout exceeded while awaiting headers)
INFO WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
INFO Cluster operator baremetal Disabled is True with UnsupportedPlatform: Nothing to do on this Platform
INFO Cluster operator console Progressing is True with DefaultRouteSync_FailedAdmitDefaultRoute::OAuthClientSync_FailedHost: DefaultRouteSyncProgressing: route "console" is not available at canonical host []
INFO OAuthClientSyncProgressing: route "console" is not available at canonical host []
INFO Cluster operator console Available is Unknown with NoData:
INFO Cluster operator image-registry Available is False with NoReplicasAvailable: Available: The deployment does not have available replicas
INFO ImagePrunerAvailable: Pruner CronJob has been created
INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: Progressing: The deployment has not completed
INFO Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available.
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.
ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller "default" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod "router-default-86d99b9467-scc4m" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match node selector. Pod "router-default-86d99b9467-5bzng" cannot be scheduled: 0/3 nodes are available: 3 node(s) didn't match node selector. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1)
INFO Cluster operator insights Disabled is False with AsExpected:
INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available
ERROR Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for route openshift-monitoring/alertmanager-main: no status available
INFO Cluster operator monitoring Available is False with :
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack.
ERROR Cluster initialization failed because one or more operators are not functioning properly.
ERROR The cluster should be accessible for troubleshooting as detailed in the documentation linked below,
ERROR https://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html
ERROR The 'wait-for install-complete' subcommand can then be used to continue the installation
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring
~~~

What did you expect to happen?
Fail gracefully, similarly to a master provisioning error, without provisioning the NIC. Equivalent error on a control node:

~~~
ERROR Error: Error creating Linux Virtual Machine "esimardmst04-ccpbd-master-0" (Resource Group "esimardmst04-ccpbd-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="The value does_not_exist provided for the VM size is not valid.
~~~

Ideally this would be caught before deploying.

How to reproduce it (as minimally and precisely as possible)?
Generate an install-config.yaml file and configure a custom worker type that does not exist:

~~~
compute:
- architecture: amd64
  hyperthreading: Enabled
  name: worker
  platform:
    azure:
      type: does_not_exist
  replicas: 3
~~~
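As a side note for anyone hitting this before a fix lands: the set of VM sizes Azure accepts is region-specific, so one way to sanity-check the `type` value before running the installer is to query the target region with the Azure CLI. This is only a workaround sketch on the user side, not installer behavior; it assumes the `az` CLI is installed and logged in, `northcentralus` is just the region from this report, and `Standard_D4s_v3` is only an example size.

~~~
# List the VM sizes Azure offers in the target region and check whether the
# size you plan to put in compute[0].platform.azure.type is among them.
az vm list-sizes --location northcentralus --query "[].name" --output tsv | grep -ix "Standard_D4s_v3"
~~~

If the grep prints nothing, Azure will reject that size in that region, and the installer would fail the same way as described above.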
This should be handled with https://issues.redhat.com/browse/CORS-1549 https://github.com/openshift/installer/pull/4419
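For what it is worth, since the check is expected to run when the install-config asset is loaded, it should be possible to exercise it without provisioning anything in Azure by asking the installer for an earlier target than `create cluster`. A minimal sketch, assuming a build that already contains the fix and an install directory holding the install-config.yaml with the bogus worker type (`./install-dir` is just a placeholder path):

~~~
# Rendering manifests forces the install-config and machine assets to be
# validated, so an invalid VM size should fail fast here, before any Azure
# resources such as the NIC are created.
./openshift-install create manifests --dir ./install-dir --log-level debug
~~~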
Hello Jeremiah, I tested it with a nightly build and it works.

Nit: I noticed that the beginning of the error message refers to `Master Machines` for both the compute and the control nodes:

`level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": compute[0].platform.azure.type: Invalid value: "potatoes": not found in region northcentralus`

Should the first part be adjusted?
I do not believe so... the "Master Machines" string comes from pre-existing code and is outside the scope of this BZ.
Verified with 4.7.0-0.nightly-2020-12-04-013308:

./openshift-install 4.7.0-0.nightly-2020-12-04-013308
built from commit b9701c56ece235c8a988530816aac84980a91bdd
release image registry.svc.ci.openshift.org/ocp/release@sha256:2352dfe2655dcc891e3c09b4c260b9e346e930ee4dcdc96c6a7fd003860ef100

~~~
...
info msg=Credentials loaded from file
...
fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": compute[0].platform.azure.type: Invalid value: "potatoes": not found in region northcentralus
~~~
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5633