Bug 2025788 - [IPI on azure]Pre-check on IPI Azure, should check VM Size’s vCPUsAvailable instead of vCPUs for the sku.
Summary: [IPI on azure]Pre-check on IPI Azure, should check VM Size’s vCPUsAvailable i...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
medium
low
Target Milestone: ---
: 4.10.0
Assignee: Aditya Narayanaswamy
QA Contact: MayXu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-11-23 03:24 UTC by MayXu
Modified: 2022-03-10 16:30 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
The installer checked whether the total number of vCPUs for a given instance type in a region (the vCPUs capability) met the minimum resource requirement for deploying the cluster. It should have checked the number of vCPUs available to that instance type (the vCPUsAvailable capability), which is lower for constrained vCPU sizes. The check now uses vCPUsAvailable instead of vCPUs.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:30:15 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift installer pull 5505 0 None open Bug 2025788: Check vCPUsAvailable for given instance type 2021-12-22 15:46:42 UTC
Red Hat Product Errata RHSA-2022:0056 0 None None None 2022-03-10 16:30:26 UTC

Description MayXu 2021-11-23 03:24:00 UTC
Version:

$ openshift-install version
4.9.0-0.nightly-2021-11-18-000209

Platform:

Azure

Please specify:
* IPI (automated install with `openshift-install`. If you don't know, then it's IPI)


What happened?

Install an Azure cluster with a VM size whose vCPUsAvailable is below the minimum CPU requirement while its vCPUs meets it. The installer pre-check passes, but the installation completes with ERROR messages.

$ oc get po -n openshift-kube-apiserver
installer-8-maxusizeei-pkzw6-master-2         0/1     UnexpectedAdmissionError   0          17m
To inspect the failing pods:
$ oc get pods -n openshift-kube-apiserver
$ oc get -oyaml -n "openshift-kube-apiserver" pods <error master pod>
message: 'Pod Unexpected error while attempting to recover from admission failure:
    preemption: error finding a set of pods to preempt: no set of running pods found to reclaim resources: [(res: cpu, q: 105), ]'


What did you expect to happen?

$ openshift-install create cluster --dir <installFolder>
The installer pre-check should immediately report that the VM size does not meet the minimum vCPU requirement:
controlPlane.platform.azure.type: Invalid value: "Standard_E8-2s_v4": instance type does not meet minimum resource requirements of 4 vCPUs,


How to reproduce it (as minimally and precisely as possible)?
1. Create install-config.yaml:
$ openshift-install create install-config --dir <installFolder>

2. In install-config.yaml, set the VM size to a type whose vCPUs meets the minimum (4 for master, 2 for worker) but whose vCPUsAvailable is lower than vCPUs and below that minimum, such as Standard_E8-2s_v4:
controlPlane:
  name: master
  platform:
    azure:
      type: Standard_E8-2s_v4

3. $ openshift-install create cluster --dir <installFolder>
The installation reports errors like the following:
ERROR Cluster operator authentication Degraded is True with OAuthServerDeployment_UnavailablePod::OAuthServerRouteEndpointAccessibleController_SyncError::WellKnownReadyController_SyncError: OAuthServerDeploymentDegraded: 1 of 3 requested instances are unavailable for oauth-openshift.openshift-authentication (container is not ready in oauth-openshift-6b8db4f9bc-lcj2t pod) 
ERROR OAuthServerRouteEndpointAccessibleControllerDegraded: Get "https://oauth-openshift.apps.maxusizee4a.qe.azure.devcluster.openshift.com/healthz": dial tcp: lookup oauth-openshift.apps.maxusizee4a.qe.azure.devcluster.openshift.com on 172.30.0.10:53: no such host 



Anything else we need to know?

https://docs.microsoft.com/en-us/azure/virtual-machines/vm-naming-conventions#example-4-m8-2ms_v2-constrained-vcpu

List the VM sizes whose vCPUsAvailable differs from vCPUs:
az vm list-skus -l centralus --query "[?resourceType=='virtualMachines'&&capabilities[?name=='vCPUs'].value!=capabilities[?name=='vCPUsAvailable'].value].{Name:name, PremiumIO:capabilities[?name=='PremiumIO'].value, vCPUsAvailable:capabilities[?name=='vCPUsAvailable'].value, vCPUs:capabilities[?name=='vCPUs'].value}"

Not all sizes report vCPUsAvailable, for example Standard_B1ls and Standard_M416s_v2.
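The JMESPath query above filters the SKU list inside the Azure CLI. A minimal Python sketch of the same filter, assuming the input is the JSON array printed by `az vm list-skus -l centralus --output json` (each entry carrying `name`, `resourceType`, and a `capabilities` list of name/value pairs), makes the missing-value case explicit: sizes without vCPUsAvailable are skipped rather than reported. The helpers `capability` and `constrained_sizes` are hypothetical, and the sample entries are illustrative values taken from this report, not a real API response.

```python
import json
from typing import List, Optional, Tuple


def capability(sku: dict, name: str) -> Optional[str]:
    """Return the value of the named capability, or None if the SKU omits it."""
    for cap in sku.get("capabilities") or []:
        if cap.get("name") == name:
            return cap.get("value")
    return None


def constrained_sizes(skus: List[dict]) -> List[Tuple[str, int, int]]:
    """List (name, vCPUs, vCPUsAvailable) for VM sizes where the two differ."""
    result = []
    for sku in skus:
        if sku.get("resourceType") != "virtualMachines":
            continue
        total = capability(sku, "vCPUs")
        available = capability(sku, "vCPUsAvailable")
        # Sizes such as Standard_B1ls report no vCPUsAvailable; skip them here.
        if total is None or available is None:
            continue
        if int(total) != int(available):
            result.append((sku["name"], int(total), int(available)))
    return result


# Illustrative input shaped like `az vm list-skus --output json` (values from this report).
sample = json.loads("""
[
  {"name": "Standard_E8-2s_v4", "resourceType": "virtualMachines",
   "capabilities": [{"name": "vCPUs", "value": "8"},
                    {"name": "vCPUsAvailable", "value": "2"}]},
  {"name": "Standard_E8-4ds_v4", "resourceType": "virtualMachines",
   "capabilities": [{"name": "vCPUs", "value": "8"},
                    {"name": "vCPUsAvailable", "value": "4"}]},
  {"name": "Standard_B1ls", "resourceType": "virtualMachines",
   "capabilities": [{"name": "vCPUs", "value": "1"}]}
]
""")

print(constrained_sizes(sample))
# [('Standard_E8-2s_v4', 8, 2), ('Standard_E8-4ds_v4', 8, 4)]
```

Skipping entries without vCPUsAvailable keeps the output limited to genuinely constrained sizes; a caller that needs those entries can inspect `capability(sku, "vCPUs")` separately.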

Comment 3 MayXu 2022-01-05 16:33:59 UTC
with Standard_E8-2s_v4 as control node

FATAL failed to fetch Metadata: failed to load asset "Install Config": controlPlane.platform.azure.type: Invalid value: "Standard_E8-2s_v4": instance type does not meet minimum resource requirements of 4 vCPUsAvailable 

version: 
./openshift-install 4.10.0-0.nightly-2022-01-05-135407
built from commit 22d874c8d0751d5645de95121662e32d17d6eada
release image registry.ci.openshift.org/ocp/release@sha256:592eb8e80ff7d65ee57137b8fb50adc566df066aba532be7779ac009e36f6b59
release architecture amd64

Comment 6 MayXu 2022-02-22 02:20:10 UTC
@anarayan vCPUsAvailable is a property of the VM size (instance type), like vCPUs. For some VM sizes the two values are the same, but for others vCPUsAvailable is less than vCPUs; for example, Standard_E8-4ds_v4 has 8 vCPUs but only 4 vCPUsAvailable.

The installer should check whether vCPUsAvailable, rather than vCPUs, meets our minimum requirement.
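A minimal Python sketch of the corrected pre-check, assuming the SKU's capabilities have already been fetched into a name-to-value dict (the installer itself is written in Go, and this is not its code). `check_vcpus` is a hypothetical helper; the error text mirrors the message in Comment 3, and the fallback to vCPUs for sizes that omit vCPUsAvailable is an assumption, since this report does not say how the fix handles them.

```python
from typing import Dict, Optional


def check_vcpus(field_path: str, instance_type: str,
                capabilities: Dict[str, str], minimum: int) -> Optional[str]:
    """Return a validation error if the instance type is too small, else None."""
    field = "vCPUsAvailable"
    value = capabilities.get(field)
    if value is None:
        # Assumption: fall back to the total vCPUs for sizes that omit
        # vCPUsAvailable (for example Standard_B1ls).
        field = "vCPUs"
        value = capabilities.get(field)
    if value is None or int(value) < minimum:
        return (f'{field_path}: Invalid value: "{instance_type}": '
                f"instance type does not meet minimum resource requirements "
                f"of {minimum} {field}")
    return None


# Minimums from this report: 4 for master (control plane), 2 for worker.
print(check_vcpus("controlPlane.platform.azure.type", "Standard_E8-2s_v4",
                  {"vCPUs": "8", "vCPUsAvailable": "2"}, 4))
# controlPlane.platform.azure.type: Invalid value: "Standard_E8-2s_v4": instance type does not meet minimum resource requirements of 4 vCPUsAvailable
print(check_vcpus("controlPlane.platform.azure.type", "Standard_E8-4ds_v4",
                  {"vCPUs": "8", "vCPUsAvailable": "4"}, 4))
# None
```

Comparing vCPUsAvailable against the minimum is what rejects Standard_E8-2s_v4 (2 < 4), even though its vCPUs value of 8 would have passed the old check.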

Comment 7 Aditya Narayanaswamy 2022-02-22 13:59:37 UTC
Yeah, I understand that. By total number of vCPUs I mean the vCPUs field, and by vCPUs available I mean vCPUsAvailable. I thought that, from a docs perspective, explaining what they actually mean would be more useful.

Comment 9 errata-xmlrpc 2022-03-10 16:30:15 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

