Bug 1896327 - Installation failed on Azure
Summary: Installation failed on Azure
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: aos-install
QA Contact: Gaoyun Pei
URL:
Whiteboard:
Depends On:
Blocks: 1857446
 
Reported: 2020-11-10 09:55 UTC by Harshal Patil
Modified: 2020-11-17 10:15 UTC
CC List: 1 user

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-12 21:01:46 UTC
Target Upstream Version:
Embargoed:


Attachments
install logs (565.14 KB, text/plain), 2020-11-10 09:55 UTC, Harshal Patil

Description Harshal Patil 2020-11-10 09:55:11 UTC
Created attachment 1728013 [details]
install logs

Version:
ocp 4.6.3

$ ./openshift-install version
./openshift-install 4.6.3
built from commit a4f0869e0d2a5b2d645f0f28ef9e4b100fa8f779
release image quay.io/openshift-release-dev/ocp-release@sha256:14986d2b9c112ca955aaa03f7157beadda0bd3c089e5e1d56f28020d2dd55c52


Platform:
Azure

IPI


What happened?

$ ./openshift-install create cluster --dir larger-cluster 
INFO Credentials loaded from file "/home/harshal/.azure/osServicePrincipal.json" 
INFO Consuming Install Config from target directory 
INFO Creating infrastructure resources...         
INFO Waiting up to 20m0s for the Kubernetes API at https://api.arobug5.catchall.azure.devcluster.openshift.com:6443... 
INFO API v1.19.0+9f84db3 up                       
INFO Waiting up to 30m0s for bootstrapping to complete... 
INFO Destroying the bootstrap resources...        
INFO Waiting up to 40m0s for the cluster at https://api.arobug5.catchall.azure.devcluster.openshift.com:6443 to initialize... 
E1110 13:31:09.776189  905808 reflector.go:307] k8s.io/client-go/tools/watch/informerwatcher.go:146: Failed to watch *v1.ClusterVersion: Get "https://api.arobug5.catchall.azure.devcluster.openshift.com:6443/apis/config.openshift.io/v1/clusterversions?allowWatchBookmarks=true&fieldSelector=metadata.name%3Dversion&resourceVersion=21606&timeoutSeconds=347&watch=true": read tcp 192.168.1.12:36956->40.76.163.121:6443: read: connection timed out
ERROR Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthRouteCheckEndpointAccessibleController_SyncError::OAuthServerDeployment_DeploymentAvailableReplicasCheckFailed::OAuthServerRoute_InvalidCanonicalHost::OAuthServiceCheckEndpointAccessibleController_SyncError::OAuthServiceEndpointsCheckEndpointAccessibleController_SyncError::OAuthVersionDeployment_GetFailed::Route_InvalidCanonicalHost::WellKnownReadyController_SyncError: OAuthServiceEndpointsCheckEndpointAccessibleControllerDegraded: oauth service endpoints are not ready
OAuthServiceCheckEndpointAccessibleControllerDegraded: Get "https://172.30.163.22:443/healthz": dial tcp 172.30.163.22:443: connect: connection refused
IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server
OAuthRouteCheckEndpointAccessibleControllerDegraded: route status does not have host address
RouteDegraded: Route is not available at canonical host oauth-openshift.apps.arobug5.catchall.azure.devcluster.openshift.com: route status ingress is empty
WellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this)
OAuthVersionDeploymentDegraded: Unable to get OAuth server deployment: deployment.apps "oauth-openshift" not found
OAuthServerDeploymentDegraded: deployments.apps "oauth-openshift" not found
OAuthServerRouteDegraded: Route is not available at canonical host oauth-openshift.apps.arobug5.catchall.azure.devcluster.openshift.com: route status ingress is empty 
INFO Cluster operator authentication Available is False with OAuthServiceCheckEndpointAccessibleController_EndpointUnavailable::OAuthServiceEndpointsCheckEndpointAccessibleController_EndpointUnavailable::OAuthVersionDeployment_MissingDeployment::ReadyIngressNodes_NoReadyIngressNodes::WellKnown_NotReady: ReadyIngressNodesAvailable: Authentication require functional ingress which requires at least one schedulable and ready node. Got 0 worker nodes and 3 master nodes (none are schedulable or ready for ingress pods).
OAuthServiceEndpointsCheckEndpointAccessibleControllerAvailable: Failed to get oauth-openshift enpoints
OAuthServiceCheckEndpointAccessibleControllerAvailable: Get "https://172.30.163.22:443/healthz": dial tcp 172.30.163.22:443: connect: connection refused
WellKnownAvailable: The well-known endpoint is not yet available: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap "oauth-openshift" not found (check authentication operator, it is supposed to create this) 
INFO Cluster operator console Progressing is True with DefaultRouteSync_FailedAdmitDefaultRoute::OAuthClientSync_FailedHost: DefaultRouteSyncProgressing: route "console" is not available at canonical host []
OAuthClientSyncProgressing: route "console" is not available at canonical host [] 
INFO Cluster operator console Available is Unknown with NoData:  
INFO Cluster operator image-registry Available is False with NoReplicasAvailable: Available: The deployment does not have available replicas
ImagePrunerAvailable: Pruner CronJob has been created 
INFO Cluster operator image-registry Progressing is True with DeploymentNotCompleted: Progressing: The deployment has not completed 
INFO Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available. 
INFO Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available. 
ERROR Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: default 
INFO Cluster operator insights Disabled is False with AsExpected:  
INFO Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available 
INFO Cluster operator monitoring Available is False with :  
INFO Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack. 
ERROR Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for route openshift-monitoring/alertmanager-main: no status available 
FATAL failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring 


What did you expect to happen?

The installation completes successfully and all cluster operators become available.

How to reproduce it (as minimally and precisely as possible)?

$ ./openshift-install create cluster --dir larger-cluster

(using the Azure IPI install config captured in the attached install logs)


Comment 2 Matthew Staebler 2020-11-10 17:54:42 UTC
It looks like the worker machines are not getting added to the cluster. Please include the must-gather logs from the cluster by running `oc adm must-gather`.
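
For reference, a typical way to collect and package those logs for attachment (the archive name here is arbitrary; must-gather writes into a must-gather.local.* directory under the current working directory):

$ oc adm must-gather
$ tar czf must-gather.tar.gz must-gather.local.*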

Comment 16 Matthew Staebler 2020-11-12 20:47:55 UTC
This is the error on the worker machines when I use the Standard_D4_v3 instance type with no other changes to your install config.

$ oc get machines -n openshift-machine-api mstaeble-hqvls-worker-centralus1-xqsnn -ojson | jq -r .status.errorMessage
failed to reconcile machine "mstaeble-hqvls-worker-centralus1-xqsnn": failed to create vm mstaeble-hqvls-worker-centralus1-xqsnn: failure sending request for machine mstaeble-hqvls-worker-centralus1-xqsnn: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Requested operation cannot be performed because the VM size Standard_D4_v3 does not support the storage account type Premium_LRS of disk 'mstaeble-hqvls-worker-centralus1-xqsnn_OSDisk'. Consider updating the VM to a size that supports Premium storage." Target="osDisk.managedDisk.storageAccountType"
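
A quick way to surface this kind of error across every machine in the namespace, rather than querying one machine by name (a sketch; assumes jq is installed and uses the standard machine-api status fields):

$ oc get machines -n openshift-machine-api -o json \
    | jq -r '.items[] | select(.status.errorMessage != null)
             | "\(.metadata.name): \(.status.errorMessage)"'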

Comment 17 Matthew Staebler 2020-11-12 21:01:46 UTC
If you want to use the default Premium_LRS disk type, then you should use an instance type of Standard_D4s_v3.
If you want to use Standard_D4_v3, then you should set the disk type to Standard_LRS or StandardSSD_LRS.

See https://docs.microsoft.com/en-us/azure/virtual-machines/dv2-dsv2-series-memory
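
For illustration, the worker pool stanza of an install-config.yaml for each option could look like the sketch below. This assumes the osDisk.diskType field is supported by your installer version; if it is not, the disk type can instead be changed in the worker machineset's providerSpec after install.

# Option 1: keep the default Premium_LRS OS disk by using a Premium-capable size
compute:
- name: worker
  platform:
    azure:
      type: Standard_D4s_v3

# Option 2: keep Standard_D4_v3 and switch the OS disk to a supported type
compute:
- name: worker
  platform:
    azure:
      type: Standard_D4_v3
      osDisk:
        diskType: Standard_LRS   # StandardSSD_LRS also works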

