Description of problem:
Installing with Standard_D2s_v3 as the worker VM type fails with "accelerated networking not supported on instance type".

In https://github.com/openshift/machine-api-provider-azure/blob/main/pkg/cloud/azure/actuators/machineset/azure_instance_types.go, some VM types are set to AcceleratedNetworking: false, although Azure reports AcceleratedNetworking: true for them, e.g. Standard_D2s_v3, Standard_D2_v3, and Standard_D2a_v4.

The following command lists these VM types:

$ az vm list-skus \
    --location southcentralus \
    --all true \
    --resource-type virtualMachines \
    --query "[?capabilities[?name=='AcceleratedNetworkingEnabled'].value!=['False']].{size:size, name:name, vCPUsAvailable:capabilities[?name=='vCPUsAvailable'].value|[0], acceleratedNetworkingEnabled: capabilities[?name=='AcceleratedNetworkingEnabled'].value | [0]}" \
    --output table

Version-Release number of selected component (if applicable):
registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-07-16-020951

How reproducible:
Always

Steps to Reproduce:
1. Specify Standard_D2s_v3 as the worker VM type in install-config.yaml:

   compute:
     platform:
       azure:
         type: Standard_D2s_v3

2. Run the installation. The cluster install fails.

Actual results:
$ oc get machines -A
NAMESPACE               NAME                                         PHASE     TYPE              REGION           ZONE   AGE
openshift-machine-api   maxu-mi-8prjq-master-0                       Running   Standard_D4s_v3   southcentralus   1      78m
openshift-machine-api   maxu-mi-8prjq-master-1                       Running   Standard_D4s_v3   southcentralus   2      78m
openshift-machine-api   maxu-mi-8prjq-master-2                       Running   Standard_D4s_v3   southcentralus   3      78m
openshift-machine-api   maxu-mi-8prjq-worker-southcentralus1-4fzkx   Failed                                              69m
openshift-machine-api   maxu-mi-8prjq-worker-southcentralus2-dqmlf   Failed                                              69m
openshift-machine-api   maxu-mi-8prjq-worker-southcentralus3-jwr5s   Failed                                              69m

In .openshift_install.log:
level=error msg="Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::OAuthClientsController_SyncError::OAuthServerDeployment_PreconditionNotFulfilled::OAuthServerRouteEndpointAccessibleController_SyncError::OAuthServerServiceEndpointAccessibleController_SyncError::OAuthServerServiceEndpointsEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nOAuthClientsControllerDegraded: no ingress for host oauth-openshift.apps.maxu-mi.qe.azure.devcluster.openshift.com in route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: waiting for the oauth-openshift route to contain an admitted ingress: no admitted ingress for route oauth-openshift in namespace openshift-authentication\nOAuthServerDeploymentDegraded: \nOAuthServerRouteEndpointAccessibleControllerDegraded: route \"openshift-authentication/oauth-openshift\": status does not have a valid host address\nOAuthServerServiceEndpointAccessibleControllerDegraded: Get \"https://172.30.139.230:443/healthz\": dial tcp 172.30.139.230:443: connect: connection refused\nOAuthServerServiceEndpointsEndpointAccessibleControllerDegraded: oauth service endpoints are not ready\nWellKnownReadyControllerDegraded: failed to get oauth metadata from openshift-config-managed/oauth-openshift ConfigMap: configmap \"oauth-openshift\" not found (check authentication operator, it is supposed to create this)"

$ oc describe machine maxu-mi-8prjq-worker-southcentralus1-4fzkx -n openshift-machine-api
Error Message: failed to reconcile machine "maxu-mi-8prjq-worker-southcentralus1-4fzkx": failed to create nic maxu-mi-8prjq-worker-southcentralus1-4fzkx-nic for machine maxu-mi-8prjq-worker-southcentralus1-4fzkx: accelerated networking not supported on instance type: Standard_D2s_v3
Error Reason: InvalidConfiguration

Expected results:
The install succeeds, or, if Standard_D2s_v3 cannot be supported, the failure reports a clearer error message.

Additional info:
1. https://docs.microsoft.com/en-us/azure/virtual-network/accelerated-networking-overview#supported-vm-instances
   On instances that support hyperthreading, Accelerated Networking is supported on VM instances with 4 or more vCPUs.

2. Adding compute.hyperthreading: Disabled to install-config.yaml produces a similar error:

   $ oc describe machine maxu-mi6-n6s7d-worker-southcentralus1-hs5bk -n openshift-machine-api
   Error Message: failed to reconcile machine "maxu-mi6-n6s7d-worker-southcentralus1-hs5bk": failed to create nic maxu-mi6-n6s7d-worker-southcentralus1-hs5bk-nic for machine maxu-mi6-n6s7d-worker-southcentralus1-hs5bk: accelerated networking not supported on instance type: Standard_D2s_v3

3. A Standard_D2s_v3 VM can be created with an existing NIC that has accelerated networking enabled:

   $ az network nic list -g $RG -o json --query "[].[name, enableAcceleratedNetworking]" -o tsv
   maxu-mi7-m595d-worker-southcentralus3-5g6gh-nic True

   $ az vm create --resource-group $RG --name maxutest1 --ssh-key-values '~/openshift-qe.pub' --admin-username cloud-user --image '/subscriptions/53b8f551-f0fc-4bea-8cba-6d1fefd54c8a/resourceGroups/os4-common/providers/Microsoft.Compute/galleries/openshift_qe_image/images/qe-rhel77-proxy-registry/versions/2022.4.24' --os-disk-size-gb 99 --nsg '' --size 'Standard_D2s_v3' --nics maxu-mi7-m595d-worker-southcentralus3-5g6gh-nic --debug

4. With Standard_D2s_v4 the cluster is created successfully, and the NICs use accelerated networking.
I'm going to ask a member of the team to look into this further. We understand the issue and know what we need to do: use a dynamic check rather than the static list. We will post updates once we've started working on it.
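To illustrate what a dynamic check could look like, here is a minimal sketch in Python (the provider itself is written in Go, so this is not the actual fix). It assumes SKU records shaped like the JSON that `az vm list-skus -o json` returns, with a `capabilities` list of name/value pairs. The function names, the sample records, and the `Example_NoAccel` size are hypothetical and only there for illustration.

```python
from typing import Any, Iterable, Mapping, Optional

# Capability name used in the `az vm list-skus` query from the description.
ACCEL_CAPABILITY = "AcceleratedNetworkingEnabled"


def supports_accelerated_networking(sku: Mapping[str, Any]) -> bool:
    """Return True if the SKU record advertises accelerated networking."""
    for cap in sku.get("capabilities") or []:
        if cap.get("name") == ACCEL_CAPABILITY:
            # The capability value is reported as the string "True" or "False".
            return str(cap.get("value", "")).lower() == "true"
    # No capability entry: treat the SKU as not supporting the feature.
    return False


def find_sku(skus: Iterable[Mapping[str, Any]], vm_size: str) -> Optional[Mapping[str, Any]]:
    """Look up a VM size by name, ignoring case (e.g. standard_D2s_v3)."""
    wanted = vm_size.lower()
    for sku in skus:
        if (sku.get("resourceType") == "virtualMachines"
                and str(sku.get("name", "")).lower() == wanted):
            return sku
    return None


def check_vm_size(skus: Iterable[Mapping[str, Any]], vm_size: str) -> bool:
    """Decide from live SKU data instead of a hard-coded instance-type table."""
    sku = find_sku(skus, vm_size)
    if sku is None:
        raise LookupError(f"VM size not found in SKU data: {vm_size}")
    return supports_accelerated_networking(sku)


# Hypothetical sample records for illustration only, not real API output.
SAMPLE_SKUS = [
    {
        "resourceType": "virtualMachines",
        "name": "Standard_D2s_v3",
        "capabilities": [{"name": ACCEL_CAPABILITY, "value": "True"}],
    },
    {
        "resourceType": "virtualMachines",
        "name": "Example_NoAccel",  # made-up size name
        "capabilities": [{"name": ACCEL_CAPABILITY, "value": "False"}],
    },
]

if __name__ == "__main__":
    for size in ("standard_D2s_v3", "Example_NoAccel"):
        print(size, check_vm_size(SAMPLE_SKUS, size))
```

The point of the sketch is that the decision follows whatever the region's SKU data reports, so a size such as Standard_D2s_v3 would no longer be rejected because of a stale table entry.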
I am working on a fix for this.
*** Bug 2115852 has been marked as a duplicate of this bug. ***
*** Bug 2115851 has been marked as a duplicate of this bug. ***
Any update for this issue?
Still working on this. I have it in working state, just need to clean it up a bit and make sure there is no regression.
Is the workaround for this issue to simply use a different machine type for the worker nodes?
The workaround can be found here: https://coreos.slack.com/archives/C68TNFWA2/p1659964442215109?thread_ts=1659925635.953309&cid=C68TNFWA2
For example, force Accelerated=False in the install-config for the compute nodes, or change the node type for the compute nodes.
/bugzilla refresh
Oops. Wrong tab, sorry

(In reply to Radek Maňák from comment #10)
> /bugzilla refresh

Sorry, wrong browser tab
Verified on registry.ci.openshift.org/ocp/release:4.12.0-0.ci-2022-09-13-225342 and registry.ci.openshift.org/ocp/release:4.12.0-0.nightly-2022-09-13-202959

$ az network nic list -g $RG -o json --query "[].[name, enableAcceleratedNetworking]" -o tsv
maxu-ac1-bwz6k-master-0-nic             True
maxu-ac1-bwz6k-master-1-nic             True
maxu-ac1-bwz6k-master-2-nic             True
maxu-ac1-bwz6k-worker-eastus1-468jw-nic True
maxu-ac1-bwz6k-worker-eastus2-p2pzw-nic True
maxu-ac1-bwz6k-worker-eastus3-r4rhf-nic True

$ oc get machine -A
NAMESPACE               NAME                                 PHASE     TYPE              REGION   ZONE   AGE
openshift-machine-api   maxu-ac1-bwz6k-master-0              Running   Standard_D4s_v3   eastus   2      39m
openshift-machine-api   maxu-ac1-bwz6k-master-1              Running   Standard_D4s_v3   eastus   3      38m
openshift-machine-api   maxu-ac1-bwz6k-master-2              Running   Standard_D4s_v3   eastus   1      38m
openshift-machine-api   maxu-ac1-bwz6k-worker-eastus1-468jw  Running   Standard_D2s_v3   eastus   1      30m
openshift-machine-api   maxu-ac1-bwz6k-worker-eastus2-p2pzw  Running   Standard_D2s_v3   eastus   2      30m
openshift-machine-api   maxu-ac1-bwz6k-worker-eastus3-r4rhf  Running   Standard_D2s_v3   eastus   3      30m
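The NIC check above could also be scripted. The following is a rough Python sketch, assuming the two-column name/flag output that the `az network nic list ... -o tsv` command prints in this report. The helper names are hypothetical, and the sample text is copied from the verification output.

```python
from typing import Dict, List


def parse_nic_listing(tsv_text: str) -> Dict[str, bool]:
    """Map each NIC name to its enableAcceleratedNetworking flag."""
    result: Dict[str, bool] = {}
    for line in tsv_text.splitlines():
        fields = line.split()
        if len(fields) != 2:
            # Skip blank or unexpected lines rather than guessing.
            continue
        name, flag = fields
        result[name] = flag.lower() == "true"
    return result


def nics_without_acceleration(tsv_text: str) -> List[str]:
    """Return the sorted names of NICs that do not have the flag enabled."""
    listing = parse_nic_listing(tsv_text)
    return sorted(name for name, enabled in listing.items() if not enabled)


# Output copied from the verification comment in this report.
VERIFIED_OUTPUT = """\
maxu-ac1-bwz6k-master-0-nic True
maxu-ac1-bwz6k-master-1-nic True
maxu-ac1-bwz6k-master-2-nic True
maxu-ac1-bwz6k-worker-eastus1-468jw-nic True
maxu-ac1-bwz6k-worker-eastus2-p2pzw-nic True
maxu-ac1-bwz6k-worker-eastus3-r4rhf-nic True
"""

if __name__ == "__main__":
    missing = nics_without_acceleration(VERIFIED_OUTPUT)
    print("NICs without accelerated networking:", missing)
```

An empty result means every listed NIC, including those of the Standard_D2s_v3 workers, has accelerated networking enabled.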
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.12.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:7399