Bug 1931115
| Summary: | Azure cluster install fails with worker type Standard_D4_v2 |
|---|---|
| Product: | OpenShift Container Platform |
| Component: | Installer |
| Installer sub component: | openshift-installer |
| Reporter: | To Hung Sze <tsze> |
| Assignee: | Aditya Narayanaswamy <anarayan> |
| QA Contact: | To Hung Sze <tsze> |
| Docs Contact: | |
| CC: | bleanhar, jerzhang, KurapatiS, mstaeble |
| Status: | CLOSED ERRATA |
| Severity: | low |
| Priority: | low |
| Version: | 4.7 |
| Target Milestone: | --- |
| Target Release: | 4.8.0 |
| Target Upstream Version: | 4.8.0 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | |
| Doc Type: | Bug Fix |
| Story Points: | --- |
| Clone Of: | |
| Environment: | |
| Last Closed: | 2021-07-27 22:45:37 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Embargoed: | |
| Attachments: | must-gather, install log (see comments) |

Doc Text:

Azure clusters created with the Premium_LRS disk type and an instance type that does not support the PremiumIO capability fail because the instance cannot provide the required premium storage functionality. A check was added: when the disk type is Premium_LRS (the default), the installer verifies that the selected instance type has the PremiumIO capability. The code queries the Azure subscription and region for the required information and returns an error if the instance type does not support PremiumIO.
Created attachment 1758449 [details]
must-gather
Standard_D4_v2 not compatible with master

This does not seem to have anything to do with the MCO. The MCO does not manage the disk types you provide to the cluster. Is Standard_D4_v2 supported? Please make sure it's a supported instance type. Passing to the installer team to check.

From the yaml of a worker Machine:

errorMessage: 'failed to reconcile machine "tszeaz022021-cj884-worker-centralus1-k2xtt": failed to create vm tszeaz022021-cj884-worker-centralus1-k2xtt: failure sending request for machine tszeaz022021-cj884-worker-centralus1-k2xtt: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="InvalidParameter" Message="Requested operation cannot be performed because the VM size Standard_D4_v2 does not support the storage account type Premium_LRS of disk ''tszeaz022021-cj884-worker-centralus1-k2xtt_OSDisk''. Consider updating the VM to a size that supports Premium storage." Target="osDisk.managedDisk.storageAccountType"'

I wonder if the PremiumIO capability determines whether the VM supports Premium_LRS disks. If so, we could add that to the validation done in the installer.
$ az vm list-skus --size D4 -l centralus | jq '.[] | {"name":.name,"PremiumIO":.capabilities[]|select(.name=="PremiumIO")|.value}'
{
"name": "Standard_D4_v2",
"PremiumIO": "False"
}
{
"name": "Standard_D4_v3",
"PremiumIO": "False"
}
{
"name": "Standard_D4",
"PremiumIO": "False"
}
{
"name": "Standard_D48_v3",
"PremiumIO": "False"
}
{
"name": "Standard_D4s_v3",
"PremiumIO": "True"
}
{
"name": "Standard_D48s_v3",
"PremiumIO": "True"
}
{
"name": "Standard_D4d_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D48d_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D4_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D48_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D4ds_v4",
"PremiumIO": "True"
}
{
"name": "Standard_D48ds_v4",
"PremiumIO": "True"
}
{
"name": "Standard_D4s_v4",
"PremiumIO": "True"
}
{
"name": "Standard_D48s_v4",
"PremiumIO": "True"
}
{
"name": "Standard_D4a_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D48a_v4",
"PremiumIO": "False"
}
{
"name": "Standard_D4as_v4",
"PremiumIO": "True"
}
{
"name": "Standard_D48as_v4",
"PremiumIO": "True"
}
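As a rough illustration of the validation being proposed here (a minimal sketch with hypothetical names, not the installer's actual implementation): given the capability list that `az vm list-skus` reports for an instance type, reject the default Premium_LRS disk type when PremiumIO is not advertised.

```go
// Hypothetical sketch of the suggested check; not the installer's real code.
package main

import "fmt"

// Capability mirrors one entry of the "capabilities" array returned by
// `az vm list-skus` (see the query output above).
type Capability struct {
	Name  string
	Value string
}

// validatePremiumIO rejects the Premium_LRS disk type when the chosen
// instance type does not advertise the PremiumIO capability.
func validatePremiumIO(instanceType, diskType string, caps []Capability) error {
	if diskType != "Premium_LRS" {
		return nil // only premium disks need the PremiumIO capability
	}
	for _, c := range caps {
		if c.Name == "PremiumIO" && c.Value == "True" {
			return nil
		}
	}
	return fmt.Errorf("PremiumIO not supported for instance type %s", instanceType)
}

func main() {
	// Capabilities as reported for Standard_D4_v2 in the query above.
	caps := []Capability{{Name: "PremiumIO", Value: "False"}}
	if err := validatePremiumIO("Standard_D4_v2", "Premium_LRS", caps); err != nil {
		fmt.Println(err) // PremiumIO not supported for instance type Standard_D4_v2
	}
}
```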
@mstaeble Thanks for finding the error. This is the error:
FATAL failed to fetch Metadata: failed to load asset "Install Config": [controlPlane.platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v2, compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v2]
Install-config used:
platform:
  azure:
    type: Standard_D4_v2
replicas: 3
Thank you for fixing this.
Note: error is detected when generating manifests.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

Hi Team,
While installing OCP 4.9, I encountered the same error; it seems the issue still exists in the latest OCP version.

Please find the details below:
[test]$ openshift-install version
openshift-install 4.9.8
built from commit 1c538b8949f3a0e5b993e1ae33b9cd799806fa93
release image quay.io/openshift-release-dev/ocp-release@sha256:c91c0faf7ae3c480724a935b3dab7e5f49aae19d195b12f3a4ae38f8440ea96b
release architecture amd64
name: master
platform:
  azure:
    osDisk:
      diskSizeGB: 100
    type: Standard_D4_v3
replicas: 1
[test]$ openshift-install create manifests --dir .
INFO Credentials loaded from file "/users/s025054/.azure/osServicePrincipal.json"
FATAL failed to fetch Master Machines: failed to load asset "Install Config": [controlPlane.platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D4_v3, compute[0].platform.azure.osDisk.diskType: Invalid value: "Premium_LRS": PremiumIO not supported for instance type Standard_D2_v3]
Master nodes require PremiumIO capabilities that Standard_D4_v3 does not support. You should use an instance type that actually supports PremiumIO.

This bug was filed because it previously took the installer an hour to report this issue. A fix was put in place to check whether the selected instance type has PremiumIO capabilities before cluster creation starts, drastically reducing the time it takes to report the wrong disk configuration.
$ az vm list-skus --size D4 -l centralus | jq '.[] | {"name":.name,"PremiumIO":.capabilities[]|select(.name=="PremiumIO")|.value}'
{
"name": "Standard_D4_v2",
"PremiumIO": "False"
}
{
"name": "Standard_D4_v3",
"PremiumIO": "False"
}
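For reference, a hedged sketch (standard library only; the struct fields mirror the `az vm list-skus` JSON shape used by the jq filter above, and the file name in the usage comment is illustrative) of doing the same PremiumIO filtering in Go instead of jq:

```go
// Hypothetical helper, not part of the installer: filter `az vm list-skus`
// JSON output for SKUs that advertise PremiumIO.
package main

import (
	"encoding/json"
	"fmt"
	"os"
)

type sku struct {
	Name         string `json:"name"`
	Capabilities []struct {
		Name  string `json:"name"`
		Value string `json:"value"`
	} `json:"capabilities"`
}

func main() {
	// Assumed usage: az vm list-skus --size D4 -l centralus -o json | go run premiumio.go
	var skus []sku
	if err := json.NewDecoder(os.Stdin).Decode(&skus); err != nil {
		fmt.Fprintln(os.Stderr, "decode:", err)
		os.Exit(1)
	}
	for _, s := range skus {
		for _, c := range s.Capabilities {
			if c.Name == "PremiumIO" && c.Value == "True" {
				// Instance types printed here can use Premium_LRS OS disks.
				fmt.Println(s.Name)
			}
		}
	}
}
```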
Created attachment 1758447 [details]
install log

Description of problem:
Cluster install fails when worker has type Standard_D4_v2 in Azure.

Version-Release number of selected component (if applicable):
4.7 rc.3

How reproducible:
Install a cluster with these in install-config:

name: worker
platform:
  azure:
    type: Standard_D4_v2
region: centralus

Actual results:
time="2021-02-20T12:16:52-05:00" level=error msg="Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: ingresscontroller \"default\" is degraded: DegradedConditions: One or more other status conditions indicate a degraded state: PodsScheduled=False (PodsNotScheduled: Some pods are not scheduled: Pod \"router-default-6c9ffb5cd4-m8jjx\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Pod \"router-default-6c9ffb5cd4-lnzmp\" cannot be scheduled: 0/3 nodes are available: 3 node(s) had taint {node-role.kubernetes.io/master: }, that the pod didn't tolerate. Make sure you have sufficient worker nodes.), DeploymentAvailable=False (DeploymentUnavailable: The deployment has Available status condition set to False (reason: MinimumReplicasUnavailable) with message: Deployment does not have minimum availability.), DeploymentReplicasMinAvailable=False (DeploymentMinimumReplicasNotMet: 0/2 of replicas are available, max unavailable is 1)"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator insights Disabled is False with AsExpected: "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator kube-storage-version-migrator Available is False with _NoMigratorPod: Available: deployment/migrator.openshift-kube-storage-version-migrator: no replicas are available"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator monitoring Available is False with : "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
time="2021-02-20T12:16:52-05:00" level=error msg="Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for route openshift-monitoring/alertmanager-main: no status available"
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator network ManagementStateDegraded is False with : "
time="2021-02-20T12:16:52-05:00" level=info msg="Cluster operator network Progressing is True with Deploying: Deployment \"openshift-network-diagnostics/network-check-source\" is not available (awaiting 1 nodes)"
time="2021-02-20T12:16:52-05:00" level=error msg="Cluster initialization failed because one or more operators are not functioning properly.\nThe cluster should be accessible for troubleshooting as detailed in the documentation linked below,\nhttps://docs.openshift.com/container-platform/latest/support/troubleshooting/troubleshooting-installations.html\nThe 'wait-for install-complete' subcommand can then be used to continue the installation"
time="2021-02-20T12:16:52-05:00" level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console, image-registry, ingress, kube-storage-version-migrator, monitoring"

Additional info:
NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE
authentication False False True 104m 104m
console 4.7.0-rc.3 Unknown True False 96m 103m
image-registry False True True 97m
ingress False True True 101m
insights 4.7.0-rc.3 True False False 97m
monitoring False True True 103m
network 4.7.0-rc.3 True True False 104m
node-tuning 4.7.0-rc.3 True False False 104m