Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1896321

Summary:	MachineSet scaling from 0 is not available or evaluated incorrectly for the new or changed instance types
Product:	OpenShift Container Platform	Reporter:	Danil Grigorev <dgrigore>
Component:	Cloud Compute	Assignee:	Joel Speed <jspeed>
Cloud Compute sub component:	Other Providers	QA Contact:	sunzhaohua <zhsun>
Status:	CLOSED ERRATA	Docs Contact:
Severity:	low
Priority:	low	CC:	mimccune
Version:	4.7
Target Milestone:	---
Target Release:	4.8.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:	Cause: A generated list of AWS instance types was out of date. Consequence: Not all instance types are enabled for scaling from zero when using the Cluster Autoscaler and MachineSets with zero replicas. Fix: Update the list to include newer instance types. Result: More instance types are available to the Cluster Autoscaler for scaling from zero replicas.	Story Points:	---
Clone Of:
Clones:	1917838		Environment:
Last Closed:	2021-07-27 22:34:10 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks:	1917838, 1942966

Description Danil Grigorev 2020-11-10 09:41:09 UTC

Description of problem:

The MachineSet controller uses a set of annotations as the source of truth for autoscaling from 0.

https://github.com/openshift/cluster-api-provider-azure/blob/master/pkg/cloud/azure/actuators/machineset/controller.go#L39-L41

The data for the annotations is gathered from a static list, which becomes outdated over time. As a result, the controller produces incorrect estimates of the values, or returns nothing at all for instance types that are not listed.

The referenced PR regenerates these lists using code taken from the upstream autoscaler. The differences are visible in the updated "pkg/actuators/machineset/ec2_instance_types.go" file: https://github.com/openshift/cluster-api-provider-aws/pull/367/files
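
For illustration only, the lookup described above can be sketched as follows. This is a minimal sketch, not the provider's actual code: the annotation keys, the `instanceType` struct, the `staticInstanceTypes` table, and the `scaleFromZeroAnnotations` helper are all assumptions made for this example. The table deliberately omits p4d.24xlarge to show what happens with an outdated list.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys used in this sketch. They are assumptions for
// illustration; check the provider source for the real keys.
const (
	cpuKey    = "machine.openshift.io/vCPU"
	memoryKey = "machine.openshift.io/memoryMb"
	gpuKey    = "machine.openshift.io/GPU"
)

// instanceType holds the capacity values the autoscaler needs in order
// to estimate a node that does not exist yet.
type instanceType struct {
	VCPU     int64
	MemoryMb int64
	GPU      int64
}

// staticInstanceTypes stands in for the generated list. It is
// intentionally out of date: p4d.24xlarge is missing.
var staticInstanceTypes = map[string]instanceType{
	"m5.large":  {VCPU: 2, MemoryMb: 8192, GPU: 0},
	"m5.xlarge": {VCPU: 4, MemoryMb: 16384, GPU: 0},
}

// scaleFromZeroAnnotations returns the annotations for the named
// instance type, or false when the type is not in the table.
func scaleFromZeroAnnotations(table map[string]instanceType, name string) (map[string]string, bool) {
	it, ok := table[name]
	if !ok {
		return nil, false
	}
	return map[string]string{
		cpuKey:    strconv.FormatInt(it.VCPU, 10),
		memoryKey: strconv.FormatInt(it.MemoryMb, 10),
		gpuKey:    strconv.FormatInt(it.GPU, 10),
	}, true
}

func main() {
	for _, name := range []string{"m5.large", "p4d.24xlarge"} {
		if ann, ok := scaleFromZeroAnnotations(staticInstanceTypes, name); ok {
			fmt.Printf("%s: %v\n", name, ann)
		} else {
			fmt.Printf("%s: not in static list, no annotations set\n", name)
		}
	}
}
```

With a table like this, a MachineSet using an unlisted type ends up without the annotations, so the autoscaler has nothing to base a scale-from-zero decision on.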

Version-Release number of selected component (if applicable):
4.7

How reproducible:

Sometimes

Steps to Reproduce:
1. Create a MachineSet for AWS using p4d.24xlarge instance type
2. Check the annotations on the resource
3. Observe that no annotations are set and that error messages appear in the logs

Actual results:

Scaling from 0 is not available for new instance types, such as p4d.24xlarge (AWS).

Expected results:

MachineSet annotation logic should return correct values for any available instance type.

Additional info:

Comment 1 Joel Speed 2020-11-13 12:05:00 UTC

This is low priority right now as it works for most instance types. We may be able to add a quick fix (including the new types) during the next sprint and look into a proper long-term fix at a later date.

Comment 2 Michael McCune 2020-12-04 18:53:02 UTC

Adding the UpcomingSprint tag; the team should have good bandwidth to address this after feature freeze.

Comment 3 Joel Speed 2021-01-05 17:21:31 UTC

We haven't worked out whether we are going to apply a quick fix or be able to implement a permanent solution (this will depend on whether there is an API for instance types). Setting the target to --- so that we triage this for future releases.

Comment 4 Joel Speed 2021-02-08 09:51:55 UTC

Since we need to implement a permanent solution to this for all providers, I will convert this work to a Jira card and ensure we create a quick fix in the meantime to update the list of instances.

Comment 5 Joel Speed 2021-03-25 12:01:10 UTC

I've created a Jira card to track the dynamic fetching idea longer term. I am going to use this BZ for the temporary AWS list update for now.

If you want to know the progress of a permanent solution, please see https://issues.redhat.com/browse/OCPCLOUD-1131

Comment 6 sunzhaohua 2021-03-30 07:16:20 UTC

verified
clusterversion: 4.8.0-0.nightly-2021-03-26-054333

autoscaler could scale up from 0 with instanceType: p4d.24xlarge
$ oc get machineautoscaler
NAME                  REF KIND     REF NAME                                MIN   MAX   AGE
machineautoscaler-b   MachineSet   xzha0330-4-8-fh26v-worker-us-east-2cc   0     2     35m

$ oc get machine
NAME                                          PHASE     TYPE           REGION      ZONE         AGE
xzha0330-4-8-fh26v-master-0                   Running   m5.xlarge      us-east-2   us-east-2a   6h8m
xzha0330-4-8-fh26v-master-1                   Running   m5.xlarge      us-east-2   us-east-2b   6h8m
xzha0330-4-8-fh26v-master-2                   Running   m5.xlarge      us-east-2   us-east-2c   6h8m
xzha0330-4-8-fh26v-worker-us-east-2a-7nzrq    Running   m5.large       us-east-2   us-east-2a   6h2m
xzha0330-4-8-fh26v-worker-us-east-2b-mfz58    Running   m5.large       us-east-2   us-east-2b   6h2m
xzha0330-4-8-fh26v-worker-us-east-2c-92bb9    Running   m5.large       us-east-2   us-east-2c   6h2m
xzha0330-4-8-fh26v-worker-us-east-2cc-bhgbd   Running   p4d.24xlarge   us-east-2   us-east-2b   33m

Comment 9 errata-xmlrpc 2021-07-27 22:34:10 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438