Bug 1918307 - MachineSet scaling from 0 is not available or evaluated incorrectly for the new or changed instance types
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cloud Compute
Version: 4.7
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: low
Target Milestone: ---
Target Release: 4.6.z
Assignee: Joel Speed
QA Contact: sunzhaohua
URL:
Whiteboard:
Depends On: 1917838
Blocks:
Reported: 2021-01-20 12:19 UTC by OpenShift BugZilla Robot
Modified: 2021-03-30 17:03 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-03-30 17:03:12 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift cluster-api-provider-azure pull 193 | 0 | None | open | [release-4.6] Bug 1918307: Updating Azure VMSize list from autoscaler. | 2021-02-18 01:35:20 UTC
Red Hat Product Errata | RHBA-2021:0952 | 0 | None | None | None | 2021-03-30 17:03:25 UTC

Description OpenShift BugZilla Robot 2021-01-20 12:19:16 UTC
+++ This bug was initially created as a clone of Bug #1917838 +++

Using this bug to update the generated list for Azure specifically. A longer-term solution is still required.

+++ This bug was initially created as a clone of Bug #1896321 +++

Description of problem:

MachineSet uses a set of annotations to provide a source of truth for autoscaling from 0.

https://github.com/openshift/cluster-api-provider-azure/blob/master/pkg/cloud/azure/actuators/machineset/controller.go#L39-L41

The data for the annotations is gathered from a static list, which becomes outdated over time. As a result, the values are estimated incorrectly for changed instance types, and nothing is returned for instance types that are not listed.

The referenced PR regenerates these lists using code taken from the upstream autoscaler. The differences are visible in the updated "pkg/actuators/machineset/ec2_instance_types.go" file: https://github.com/openshift/cluster-api-provider-aws/pull/367/files
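
To illustrate the failure mode described above, here is a minimal Go sketch of a static-list lookup. It is not the provider's actual code: the annotation key names, the instanceType struct, the staticInstanceTypes map, and the setScaleFromZeroAnnotations helper are assumptions made for illustration, and the keys should be checked against the controller.go lines linked above.

```go
package main

import (
	"fmt"
	"strconv"
)

// Annotation keys assumed for illustration; confirm them against the
// linked controller.go before relying on them.
const (
	cpuKey    = "machine.openshift.io/vCPU"
	memoryKey = "machine.openshift.io/memoryMb"
	gpuKey    = "machine.openshift.io/GPU"
)

// instanceType (hypothetical) holds the capacity the autoscaler needs
// in order to simulate a node that does not exist yet.
type instanceType struct {
	VCPU     int64
	MemoryMb int64
	GPU      int64
}

// staticInstanceTypes (hypothetical) stands in for the generated list.
// Standard_D2as_v4 is deliberately absent to mimic an outdated list.
var staticInstanceTypes = map[string]instanceType{
	"Standard_D2s_v3": {VCPU: 2, MemoryMb: 8192, GPU: 0},
	"Standard_D8s_v3": {VCPU: 8, MemoryMb: 32768, GPU: 0},
}

// setScaleFromZeroAnnotations writes capacity annotations for a known
// type and reports whether the type was found in the static list.
func setScaleFromZeroAnnotations(annotations map[string]string, vmSize string) bool {
	info, ok := staticInstanceTypes[vmSize]
	if !ok {
		// Unknown type: nothing is written, so scaling from 0 cannot work.
		return false
	}
	annotations[cpuKey] = strconv.FormatInt(info.VCPU, 10)
	annotations[memoryKey] = strconv.FormatInt(info.MemoryMb, 10)
	annotations[gpuKey] = strconv.FormatInt(info.GPU, 10)
	return true
}

func main() {
	for _, vmSize := range []string{"Standard_D2s_v3", "Standard_D2as_v4"} {
		annotations := map[string]string{}
		if setScaleFromZeroAnnotations(annotations, vmSize) {
			fmt.Printf("%s: %v\n", vmSize, annotations)
		} else {
			fmt.Printf("%s: not in static list, no annotations set\n", vmSize)
		}
	}
}
```

In this sketch, any instance type released after the list was generated falls into the "not in static list" branch, which is why regenerating the list only fixes the problem until the next new type appears.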

Version-Release number of selected component (if applicable):
4.7

How reproducible:

Sometimes

Steps to Reproduce:
1. Create a MachineSet for AWS using p4d.24xlarge instance type
2. Check the annotations on the resource
3. Observe that no annotations are set and that error messages appear in the logs
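
The check in step 2 could be automated along these lines. This is a sketch only: the annotation keys and the missingScaleFromZeroAnnotations helper are assumptions for illustration, and the annotations map stands in for the metadata.annotations of a MachineSet fetched by whatever client is in use.

```go
package main

import "fmt"

// Keys assumed for illustration; confirm them against the linked controller.go.
var scaleFromZeroKeys = []string{
	"machine.openshift.io/vCPU",
	"machine.openshift.io/memoryMb",
	"machine.openshift.io/GPU",
}

// missingScaleFromZeroAnnotations (hypothetical helper) returns the
// scale-from-zero keys that are absent from a MachineSet's annotations.
func missingScaleFromZeroAnnotations(annotations map[string]string) []string {
	var missing []string
	for _, key := range scaleFromZeroKeys {
		if _, ok := annotations[key]; !ok {
			missing = append(missing, key)
		}
	}
	return missing
}

func main() {
	// Annotations as they might look when the instance type is not listed.
	annotations := map[string]string{}
	missing := missingScaleFromZeroAnnotations(annotations)
	fmt.Printf("%d missing annotation(s): %v\n", len(missing), missing)
}
```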

Actual results:

Scaling from 0 is not available for new instance types, such as p4d.24xlarge (AWS).

Expected results:

MachineSet annotation logic should return correct values for any available instance type.

Additional info:

--- Additional comment from Joel Speed on 2020-11-13 12:05:00 UTC ---

This is low priority right now as it works for most instance types. We may be able to add a quick fix (including the new types) during the next sprint and look into a proper long-term fix at a later date.

--- Additional comment from Michael McCune on 2020-12-04 18:53:02 UTC ---

Adding the UpcomingSprint tag. The team should have good bandwidth to address this after feature freeze.

--- Additional comment from Joel Speed on 2021-01-05 17:21:31 UTC ---

We haven't worked out whether we are going to quick-fix this or implement a permanent solution (this will depend on whether there is an API for instance types). Setting the target to --- so that we triage this for future releases.

Comment 5 sunzhaohua 2021-03-23 06:23:23 UTC
Verified.
clusterversion: 4.6.0-0.nightly-2021-03-21-131139
Scaled up a MachineSet with instance type Standard_D2as_v4 from replicas=0; the machines scaled up successfully.

$ oc get machine
NAME                                         PHASE     TYPE               REGION           ZONE   AGE
zhsun46-jjddw-master-0                       Running   Standard_D8s_v3    northcentralus          5h15m
zhsun46-jjddw-master-1                       Running   Standard_D8s_v3    northcentralus          5h15m
zhsun46-jjddw-master-2                       Running   Standard_D8s_v3    northcentralus          5h15m
zhsun46-jjddw-worker-northcentralus-549v8    Running   Standard_D2s_v3    northcentralus          5h10m
zhsun46-jjddw-worker-northcentralus-69s55    Running   Standard_D2s_v3    northcentralus          5h10m
zhsun46-jjddw-worker-northcentralus-6dqm8    Running   Standard_D2s_v3    northcentralus          5h10m
zhsun46-jjddw-worker-northcentralus1-lcb4z   Running   Standard_D2as_v4   northcentralus          110m
zhsun46-jjddw-worker-northcentralus1-mbjpq   Running   Standard_D2as_v4   northcentralus          116m
zhsun46-jjddw-worker-northcentralus1-wwnd7   Running   Standard_D2as_v4   northcentralus          116m

Comment 7 errata-xmlrpc 2021-03-30 17:03:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.23 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0952

