Bug 2025868

Summary: [IPI on azure] Pre-check on IPI Azure, should check whether the VM Size’s HyperVGenerations contains ‘V1’ for the sku.
Product: OpenShift Container Platform Reporter: MayXu <maxu>
Component: Installer Assignee: Aditya Narayanaswamy <anarayan>
Installer sub component: openshift-installer QA Contact: MayXu <maxu>
Status: CLOSED ERRATA Docs Contact:
Severity: low    
Priority: medium CC: anarayan, gpei, jialiu, maxu, mstaeble, padillon
Version: 4.9   
Target Milestone: ---   
Target Release: 4.10.z   
Hardware: Unspecified   
OS: Unspecified   
Last Closed: 2022-03-21 12:40:05 UTC Type: Bug

Description MayXu 2021-11-23 09:10:29 UTC
Version:

$ openshift-install version
./openshift-install 4.9.0-0.nightly-2021-11-18-000209
built from commit 1c538b8949f3a0e5b993e1ae33b9cd799806fa93
release image registry.ci.openshift.org/ocp/release@sha256:c2c8cd51afb5d02717881b2af4e8965f03a893c2f04511a3544b8477e3484e16
release architecture amd64


Platform:

Azure

Please specify:
* IPI 


What happened?

Installing an Azure cluster with a VM size whose HyperVGenerations is 'V2' only (it does not contain 'V1') passes the installer pre-check, but the cluster install then ends with an ERROR message.

With the VM size specified for master:
ERROR Error: creating Linux Virtual Machine "maxusizedd-t5bww-master-0" (Resource Group "maxusizedd-t5bww-rg"): compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=0 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_DC4s_v3' cannot boot Hypervisor Generation '1'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '1' VM Size." 
ERROR                                              
ERROR   on ../../../tmp/openshift-install-cluster-736542714/master/master.tf line 84, in resource "azurerm_linux_virtual_machine" "master": 
ERROR   84: resource "azurerm_linux_virtual_machine" "master" { 
ERROR 

With the VM size specified for worker:
$ oc get nodes
 (no workers are listed)
$ oc get event -n openshift-machine-api
…
57m         Warning   FailedCreate        machine/may-sg-rflz7-worker-centralus2-5b75c        InvalidConfiguration: failed to reconcile machine "may-sg-rflz7-worker-centralus2-5b75c": failed to create vm may-sg-rflz7-worker-centralus2-5b75c: failure sending request for machine may-sg-rflz7-worker-centralus2-5b75c: cannot create vm: compute.VirtualMachinesClient#CreateOrUpdate: Failure sending request: StatusCode=400 -- Original Error: Code="BadRequest" Message="The selected VM size 'Standard_DC2ds_v3' cannot boot Hypervisor Generation '1'. If this was a Create operation please check that the Hypervisor Generation of the Image matches the Hypervisor Generation of the selected VM Size. If this was an Update operation please select a Hypervisor Generation '1' VM Size."



What did you expect to happen?

The installer pre-check should report immediately that the VM size is invalid,
 e.g.: The selected VM size 'Standard_DC4s_v3' cannot boot Hypervisor Generation '1'.


How to reproduce it (as minimally and precisely as possible)?

1. Create install-config.yaml
$ openshift-install create install-config --dir <installFolder>

2. Customize the VM size in the generated install-config.yaml, using a VM size whose HyperVGenerations value is only 'V2', such as Standard_DC2s_v3 or Standard_DC4s_v3.
controlPlane:
  name: master
  platform:
    azure:
      type: Standard_DC4s_v3
or
compute:
- name: worker
  platform:
    azure:
      type: Standard_DC2s_v3

3. $ openshift-install create cluster --dir <installFolder>


Anything else we need to know?

Ref: https://bugzilla.redhat.com/show_bug.cgi?id=1954707 (Azure VHD fails to install on gen2 NDv2 instances)

Comment 2 Patrick Dillon 2021-12-17 21:27:20 UTC
Trying to establish some background on this. My take is: Azure supports gen1 & gen2 VMs. Typically you create a gen2 VM by selecting a gen2 compatible instance type (for example a Standard D4s v3 is both gen1 & gen2 compatible) AND a gen2 image. The gen2 image is what tells the instance to be gen2. In particular, it seems to be metadata on the image. 

For our use case, where we create images from VHDs, this is addressed in the FAQ here: https://docs.microsoft.com/en-us/azure/virtual-machines/generation-2#frequently-asked-questions

It looks like a managed disk is required.

Comment 3 MayXu 2021-12-19 14:18:54 UTC
$ az vm list-skus -l centralus --size Standard_DC4s_v3 --query "[].{HyperVGenerations:capabilities[?name=='HyperVGenerations'].value}"
[
  {
    "HyperVGenerations": [
      "V2"
    ]
  }
]

If the VM size's HyperVGenerations value does not include "V1", and we do not support that yet, how about prompting the user early?
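
As a rough illustration of the early check suggested here, the following Python sketch parses the JSON printed by the az query above and reports whether 'V1' is among the supported generations. The function names are hypothetical, and the sketch assumes that a size supporting both generations reports them as a single comma-separated string such as "V1,V2"; it is not the installer's actual implementation.

```python
import json


def supported_generations(az_output):
    """Collect the Hyper-V generations found in the az query's JSON output."""
    generations = set()
    for entry in json.loads(az_output):
        for value in entry.get("HyperVGenerations", []):
            # Assumption: dual-generation sizes report one string like "V1,V2",
            # so each value is split on commas before being recorded.
            generations.update(
                part.strip() for part in value.split(",") if part.strip()
            )
    return generations


def supports_v1(az_output):
    """Return True if the VM size can boot a Generation 1 image."""
    return "V1" in supported_generations(az_output)


# Output shape taken from the Standard_DC4s_v3 query in this comment.
sample = '[{"HyperVGenerations": ["V2"]}]'
print(supports_v1(sample))  # prints False
```

A pre-check built this way could stop before any resources are created whenever supports_v1 returns False.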

Comment 6 MayXu 2022-01-29 08:34:55 UTC
When selecting the Gen2 Marketplace image RedHat:ocp-worker:ocp-worker-a:4.8.2021122100 with Standard_DC4s_v3, the installer still reports:
level=fatal msg=failed to fetch Master Machines: failed to load asset "Install Config": compute[0].platform.azure.type: Invalid value: "Standard_DC4s_v3": only disks with HyperVGeneration V1 are supported

Expected result: the install succeeds without error with Standard_DC4s_v3 based on the Gen2 Marketplace image.

Comment 8 Patrick Dillon 2022-03-11 01:30:46 UTC
For 4.10, we expect V2-only instance types to be rejected when entered in the install config. Marketplace images are only supported through editing the manifests. I have updated the KCS article to reflect this: "If you choose to use an instance type which is only Gen2-compatible, the instance type must be specified when editing the manifests--it cannot be specified in the install config."

I am setting this back to ON_QA. Please let me know if there are more questions. Note, we are hoping to add Gen2 support in 4.11, which would make all of this more straightforward.
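
To make the 4.10 behavior concrete, here is a minimal Python sketch of rejecting V2-only instance types at install-config validation time. The message text mirrors the error quoted in comment 6; the function name, the generations_by_size lookup table, and the controlPlane field path are assumptions for illustration only, and the real installer logic may differ.

```python
def validate_pools(pools, generations_by_size):
    """Return one error string per pool whose VM size cannot boot a V1 image.

    pools: list of (field_path, vm_size) pairs taken from the install config.
    generations_by_size: hypothetical lookup of VM size -> set of generations.
    """
    errors = []
    for field_path, vm_size in pools:
        generations = generations_by_size.get(vm_size)
        if generations is None:
            # Unknown size: this sketch leaves it to other checks.
            continue
        if "V1" not in generations:
            errors.append(
                f'{field_path}: Invalid value: "{vm_size}": '
                "only disks with HyperVGeneration V1 are supported"
            )
    return errors


# Generation data as described in this bug: Standard_DC4s_v3 is V2-only,
# while Standard_D4s_v3 is compatible with both generations.
generations_by_size = {
    "Standard_DC4s_v3": {"V2"},
    "Standard_D4s_v3": {"V1", "V2"},
}
pools = [
    ("controlPlane.platform.azure.type", "Standard_D4s_v3"),
    ("compute[0].platform.azure.type", "Standard_DC4s_v3"),
]
errors = validate_pools(pools, generations_by_size)
for error in errors:
    print(error)
```

With these inputs only the compute pool is reported, since the control plane size also supports V1.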

Comment 9 MayXu 2022-03-11 03:01:31 UTC
(In reply to Patrick Dillon from comment #8)

Thanks

Comment 12 errata-xmlrpc 2022-03-21 12:40:05 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.5 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0928