Created attachment 1758447 [details]
install log

Description of problem:
Cluster install fails when the worker type is Standard_D4_v2 in Azure.

Version-Release number of selected component (if applicable):
4.7 rc.3

How reproducible:
Install a cluster with these in install-config:

name: worker
platform:
  azure:
    type: Standard_D4_v2
region: centralus

Actual results:

time="2021-02-20T12:16:52-05:00" level=error msg="Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller \"default\" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod \"router-default-6c9ffb5cd4-m8jjx\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Pod \"router-default-6c9ffb5cd4-lnzmp\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1)"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator monitoring Available is False with : "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
time="2021-02-20T12:16:52-05:00" level=error msg="Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for route openshift-monitoring/alertmanager-main: no status available"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator network ManagementStateDegraded is False with : "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator network Progressing is True with Deploying: Deployment \"openshift-network-diagnostics/network-check-source\" is not available (awaiting 1 nodes)"
time="2021-02-20T12:16:52-05:00" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2021-02-20T12:16:52-05:00" level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring"

Additional info:

NAME             VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                False       False         True       104m
console          4.7.0-rc.3   Unknown     True          False      96m
image-registry                False       True          True       97m
ingress                       False       True          True       101m
insights         4.7.0-rc.3   True        False         False      97m
monitoring                    False       True          True       103m
network          4.7.0-rc.3   True        True          False      104m
node-tuning      4.7.0-rc.3   True        False         False      104m
Created attachment 1758449 [details]
must-gather
Standard_D4_v2 not compatible with master
This does not seem to have anything to do with the MCO. The MCO does not manage the disk types you provide to the cluster. Is Standard_D4_v2 supported? Please make sure it's a supported instance type. Passing to the installer team to check.
From the YAML of a worker Machine:

errorMessage: 'failed to reconcile machine "tszeaz022021-cj884-worker-centralus1-k2xtt": failed to create vm tszeaz022021-cj884-worker-centralus1-k2xtt: failure sending request for machine tszeaz022021-cj884-worker-centralus1-k2xtt: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Requested operation cannot be performed because the VM size Standard_D4_v2 does not support the storage account type Premium_LRS of disk ''tszeaz022021-cj884-worker-centralus1-k2xtt_OSDisk''. Consider updating the VM to a size that supports Premium storage." Target="osDisk.managedDisk.storageAccountType"'
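The same message can be read straight from the Machine object, as a sketch (the machine name here is specific to this cluster and will differ elsewhere):

$ oc -n openshift-machine-api get machine \
    tszeaz022021-cj884-worker-centralus1-k2xtt \
    -o jsonpath='{.status.errorMessage}'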
I wonder if the PremiumIO capability determines whether the VM supports Premium_LRS disks. If so, we could add that to the validation done in the installer.

$ az vm list-skus --size D4 -l centralus | jq '.[] | {"name":.name,"PremiumIO":.capabilities[]|select(.name=="PremiumIO")|.value}'
{ "name": "Standard_D4_v2", "PremiumIO": "False" }
{ "name": "Standard_D4_v3", "PremiumIO": "False" }
{ "name": "Standard_D4", "PremiumIO": "False" }
{ "name": "Standard_D48_v3", "PremiumIO": "False" }
{ "name": "Standard_D4s_v3", "PremiumIO": "True" }
{ "name": "Standard_D48s_v3", "PremiumIO": "True" }
{ "name": "Standard_D4d_v4", "PremiumIO": "False" }
{ "name": "Standard_D48d_v4", "PremiumIO": "False" }
{ "name": "Standard_D4_v4", "PremiumIO": "False" }
{ "name": "Standard_D48_v4", "PremiumIO": "False" }
{ "name": "Standard_D4ds_v4", "PremiumIO": "True" }
{ "name": "Standard_D48ds_v4", "PremiumIO": "True" }
{ "name": "Standard_D4s_v4", "PremiumIO": "True" }
{ "name": "Standard_D48s_v4", "PremiumIO": "True" }
{ "name": "Standard_D4a_v4", "PremiumIO": "False" }
{ "name": "Standard_D48a_v4", "PremiumIO": "False" }
{ "name": "Standard_D4as_v4", "PremiumIO": "True" }
{ "name": "Standard_D48as_v4", "PremiumIO": "True" }
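For a single size, az can also filter the capability itself without jq; a sketch, assuming you swap in your own region and size:

$ az vm list-skus -l centralus --size Standard_D4_v2 \
    --query "[0].capabilities[?name=='PremiumIO'].value | [0]" -o tsv
False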
@mstaeble Thanks for finding the error.
This is the error:

FATAL failed to fetch Metadata: failed to load asset "Install Config": [controlPlane.platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v2, compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v2]

Install-config used:

platform:
  azure:
    type: Standard_D4_v2
replicas: 3

Thank you for fixing this.
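For anyone who still wants a non-premium size for workers, the OS disk type can be set explicitly instead of relying on the Premium_LRS default; a sketch, assuming Standard_LRS OS disks are acceptable for your workers (the field path comes from the validation message above):

compute:
- name: worker
  platform:
    azure:
      type: Standard_D4_v2
      osDisk:
        diskType: Standard_LRS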
Note: the error is detected when generating manifests.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438
Hi Team,

While installing OCP 4.9, I encountered the same error; it seems the issue still exists in the latest OCP version. Please find the details below.

[test]$ openshift-install version
openshift-install 4.9.8
built from commit 1c538b8949f3a0e5b993e1ae33b9cd799806fa93
release image quay.io/openshift-release-dev/ocp-release@sha256:c91c0faf7ae3c480724a935b3dab7e5f49aae19d195b12f3a4ae38f8440ea96b
release architecture amd64

name: master
platform:
  azure:
    osDisk:
      diskSizeGB: 100
    type: Standard_D4_v3
replicas: 1

[test]$ openshift-install create manifests --dir .
INFO Credentials loaded from file "/users/s025054/.azure/osServicePrincipal.json"
FATAL failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v3, compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D2_v3]
Master nodes require PremiumIO capabilities, which Standard_D4_v3 does not provide. You should use a VM size that actually supports PremiumIO. This bug was filed because it used to take the installer an hour to report this problem; the fix checks whether the chosen size has the PremiumIO capability before cluster creation starts, drastically reducing the time it takes to report a wrong disk configuration.

$ az vm list-skus --size D4 -l centralus | jq '.[] | {"name":.name,"PremiumIO":.capabilities[]|select(.name=="PremiumIO")|.value}'
{ "name": "Standard_D4_v2", "PremiumIO": "False" }
{ "name": "Standard_D4_v3", "PremiumIO": "False" }
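Concretely, that means switching the control-plane size to one that reports PremiumIO "True" in the earlier listing; a sketch, assuming Standard_D4s_v3 fits your quota and region:

name: master
platform:
  azure:
    osDisk:
      diskSizeGB: 100
    type: Standard_D4s_v3
replicas: 1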