Description of problem:
When the worker VM is specified with a type whose HyperVGenerations is V2, the worker node fails to be created.

Version:
registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-11-054135

How reproducible:
Always

Steps to Reproduce:
1. Specify the compute VM type as 'Standard_DC4s_v3' (HyperVGenerations is 'V2') in install-config.yaml
2. Create the cluster

Actual results:
The worker nodes fail to be created:

maxu-hy4-ndjmn-worker-eastus21-k955j   Provisioning   3h43m
maxu-hy4-ndjmn-worker-eastus23-w8m9v   Provisioning   3h43m

The logs show the following:

oc logs -n openshift-machine-api machine-api-controllers-6f85d75-ld8sc -c machine-controller

I0512 09:54:10.521080 1 actuator.go:85] Creating machine maxu-hy5-9hmpj-worker-eastus21-4t2mf
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x28 pc=0x18b7d4e]

goroutine 378 [running]:
github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).createNetworkInterface(0xc000647580, {0x1fbb2e8, 0xc000042390}, {0xc0008411a0, 0x28})
	/go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:509 +0x1ee
github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).CreateMachine(0xc000647580, {0x1fbb2e8, 0xc000042390})
	/go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:120 +0x105
github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Reconciler).Create(0xc000647580, {0x1fbb2e8, 0xc000042390})
	/go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/reconciler.go:98 +0x45
github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine.(*Actuator).Create(0xc0006c03c0, {0x1, 0x1}, 0xc000b95d40)
	/go/src/github.com/openshift/machine-api-provider-azure/pkg/cloud/azure/actuators/machine/actuator.go:96 +0x2c5
github.com/openshift/machine-api-operator/pkg/controller/machine.(*ReconcileMachine).Reconcile(0xc000522ff0, {0x1fbb358, 0xc00057e930}, {{{0xc000682eb8, 0x1c31b00}, {0xc000840630, 0x30}}})
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/github.com/openshift/machine-api-operator/pkg/controller/machine/controller.go:387 +0xab4
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Reconcile(0xc0001a2160, {0x1fbb358, 0xc00057e810}, {{{0xc000682eb8, 0x1c31b00}, {0xc000840630, 0x413894}}})
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:114 +0x26f
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).reconcileHandler(0xc0001a2160, {0x1fbb2b0, 0xc00013a740}, {0x1b29c80, 0xc000316cc0})
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:311 +0x33e
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).processNextWorkItem(0xc0001a2160, {0x1fbb2b0, 0xc00013a740})
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:266 +0x205
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2.2()
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:227 +0x85
created by sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller).Start.func2
	/go/src/github.com/openshift/machine-api-provider-azure/vendor/sigs.k8s.io/controller-runtime/pkg/internal/controller/controller.go:223 +0x357

Expected results:
The installation succeeds and the worker nodes are created.

Additional info:
The default vmNetworkingType for workers is now "Accelerated". After changing it to "Basic", the worker nodes can be created.
Standard_DC8s_v3 works as the master VM type but fails as the worker VM type.
Ref:
https://issues.redhat.com/browse/CORS-1916
https://issues.redhat.com/browse/SPLAT-205
Setting both the compute and controlPlane azure.type to 'Standard_NP10s' (HyperVGenerations is 'V1') in install-config.yaml (region: southcentralus) produces the same error.
Tested version: release:4.11.0-0.nightly-2022-05-11-054135
@maxu Do you happen to have a must-gather available from one of the times you've reproduced this issue? It would be helpful to see the full system logs from the cluster, and in particular the Machines that were generated by the installer and installed within the cluster.
The issue here is that we assume we have a complete enumeration of the instance types in our cached list (which is not true) and are taking a value from something that is potentially nil. We can make a quick fix to pass the Machine creation to Azure and see how their error handling handles it, but the better thing would be to have a dynamic check for whether accelerated networking is supported or not. The offending line is https://github.com/openshift/machine-api-provider-azure/blob/08dab41984186873b843f2edd43931b2f378e38b/pkg/cloud/azure/actuators/machine/reconciler.go#L509
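To illustrate the failure mode described above, here is a minimal sketch of a nil-safe lookup against an incomplete cached list. It is not the provider's actual code: the names skuInfo, cachedSKUs, and acceleratedNetworkingFor, as well as the table contents, are hypothetical and exist only for this example. The idea is to report "unknown" for an instance type that is missing from the cache instead of dereferencing a nil entry, so the caller can decide how to proceed (for example, fall back to basic networking or pass the request through to Azure).

```go
package main

import "fmt"

// skuInfo is a hypothetical stand-in for one entry of the cached
// instance-type list; the real provider type may look different.
type skuInfo struct {
	AcceleratedNetworking bool
}

// cachedSKUs is illustrative data only. It is deliberately incomplete:
// Standard_DC4s_v3 is absent, and one entry is nil.
var cachedSKUs = map[string]*skuInfo{
	"example-size-known": {AcceleratedNetworking: true},
	"example-size-nil":   nil,
}

// acceleratedNetworkingFor reports whether vmSize supports accelerated
// networking according to the cache. known is false when the cache has
// no usable entry, in which case supported is always false.
func acceleratedNetworkingFor(vmSize string) (supported, known bool) {
	info, ok := cachedSKUs[vmSize]
	if !ok || info == nil {
		// Missing or nil entry: do not dereference, report "unknown".
		return false, false
	}
	return info.AcceleratedNetworking, true
}

func main() {
	for _, size := range []string{"example-size-known", "example-size-nil", "Standard_DC4s_v3"} {
		supported, known := acceleratedNetworkingFor(size)
		fmt.Printf("%s: supported=%t known=%t\n", size, supported, known)
	}
}
```

Returning a separate known flag keeps "not supported" distinct from "not in the cache", which is the distinction the cached list cannot express on its own. A dynamic check against Azure, as suggested above, would replace the static map entirely.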
*** Bug 2085443 has been marked as a duplicate of this bug. ***
Verified with registry.ci.openshift.org/ocp/release:4.11.0-0.nightly-2022-05-25-080235.
Worker VM types Standard_E4ads_v5 ('V1,V2'), Standard_NP10s ('V1'), and Standard_DC4s_v3 ('V2') all PASS.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069