Description of problem:
MachineSet uses a set of annotations as the source of truth for autoscaling from 0: https://github.com/openshift/cluster-api-provider-azure/blob/master/pkg/cloud/azure/actuators/machineset/controller.go#L39-L41
The data for these annotations comes from a static list, which becomes outdated over time. As a result, the values can be estimated incorrectly, and nothing is returned for instance types that are not on the list.
The referenced PR regenerates these lists using code taken from the upstream autoscaler. The differences are visible in the updated "pkg/actuators/machineset/ec2_instance_types.go" file: https://github.com/openshift/cluster-api-provider-aws/pull/367/files

Version-Release number of selected component (if applicable):
4.7

How reproducible:
Sometimes

Steps to Reproduce:
1. Create a MachineSet for AWS using the p4d.24xlarge instance type.
2. Check the annotations on the resource.
3. See that there are none, and that error messages appear in the logs.

Actual results:
Scaling from 0 is not available for new instance types such as p4d.24xlarge (AWS).

Expected results:
The MachineSet annotation logic should return correct values for any available instance type.

Additional info:
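The failure mode described above can be illustrated with a minimal Go sketch. It assumes a simplified static table and assumes the annotation keys `machine.openshift.io/vCPU`, `machine.openshift.io/memoryMb` and `machine.openshift.io/GPU`. Check the linked controller.go for the actual constants. The names `staticInstanceTypes` and `annotationsFor` are hypothetical. A listed type yields annotations. An unlisted type such as p4d.24xlarge yields only an error, so the MachineSet ends up with no annotations and cannot be scaled from 0.

```go
package main

import (
	"fmt"
	"strconv"
)

// Assumed annotation keys (see the provider's machineset controller for the real constants).
const (
	cpuKey    = "machine.openshift.io/vCPU"
	memoryKey = "machine.openshift.io/memoryMb"
	gpuKey    = "machine.openshift.io/GPU"
)

// instanceType holds the capacity figures that the annotations are derived from.
type instanceType struct {
	VCPU     int64
	MemoryMb int64
	GPU      int64
}

// staticInstanceTypes is a hypothetical, deliberately tiny stand-in for a generated
// table such as ec2_instance_types.go. Any type missing here cannot be annotated.
var staticInstanceTypes = map[string]instanceType{
	"m5.large":  {VCPU: 2, MemoryMb: 8192, GPU: 0},
	"m5.xlarge": {VCPU: 4, MemoryMb: 16384, GPU: 0},
}

// annotationsFor returns the scale-from-zero annotations for an instance type,
// or an error when the type is not present in the table.
func annotationsFor(table map[string]instanceType, name string) (map[string]string, error) {
	it, ok := table[name]
	if !ok {
		return nil, fmt.Errorf("unknown instance type %q, cannot set scale-from-zero annotations", name)
	}
	return map[string]string{
		cpuKey:    strconv.FormatInt(it.VCPU, 10),
		memoryKey: strconv.FormatInt(it.MemoryMb, 10),
		gpuKey:    strconv.FormatInt(it.GPU, 10),
	}, nil
}

func main() {
	// p4d.24xlarge is deliberately absent from the table, mirroring the bug.
	for _, name := range []string{"m5.large", "p4d.24xlarge"} {
		ann, err := annotationsFor(staticInstanceTypes, name)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(name, ann)
	}
}
```

Running it prints the three annotations for m5.large and an error for p4d.24xlarge. Regenerating or extending the table fixes the missing entries. It does not stop the table from going stale again.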
This is low priority right now, as it works for most instance types. We may be able to add a quick fix (including the new types) during the next sprint and look into a proper long-term fix at a later date.
Adding the UpcomingSprint tag; the team should have good bandwidth to address this after feature freeze.
We haven't yet worked out whether we will apply a quick fix or implement a permanent solution (this depends on whether there is an API for instance types). Setting the target release to "---" so that we triage this for future releases.
Since we need to implement a permanent solution to this for all providers, I will convert this work to a Jira card and make sure we create a quick fix in the meantime to update the list of instance types.
I've created a JIRA card to track the longer-term idea of fetching instance types dynamically. For now, this BZ will be used for the temporary update of the AWS instance type list. For progress on a permanent solution, see https://issues.redhat.com/browse/OCPCLOUD-1131
Verified on clusterversion: 4.8.0-0.nightly-2021-03-26-054333
The autoscaler could scale up from 0 with instanceType: p4d.24xlarge.

$ oc get machineautoscaler
NAME                  REF KIND     REF NAME                                MIN   MAX   AGE
machineautoscaler-b   MachineSet   xzha0330-4-8-fh26v-worker-us-east-2cc   0     2     35m

$ oc get machine
NAME                                          PHASE     TYPE           REGION      ZONE         AGE
xzha0330-4-8-fh26v-master-0                   Running   m5.xlarge      us-east-2   us-east-2a   6h8m
xzha0330-4-8-fh26v-master-1                   Running   m5.xlarge      us-east-2   us-east-2b   6h8m
xzha0330-4-8-fh26v-master-2                   Running   m5.xlarge      us-east-2   us-east-2c   6h8m
xzha0330-4-8-fh26v-worker-us-east-2a-7nzrq    Running   m5.large       us-east-2   us-east-2a   6h2m
xzha0330-4-8-fh26v-worker-us-east-2b-mfz58    Running   m5.large       us-east-2   us-east-2b   6h2m
xzha0330-4-8-fh26v-worker-us-east-2c-92bb9    Running   m5.large       us-east-2   us-east-2c   6h2m
xzha0330-4-8-fh26v-worker-us-east-2cc-bhgbd   Running   p4d.24xlarge   us-east-2   us-east-2b   33m
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438