Bug 1713157

Summary: Cannot scale default machineset us-west-2d provisioned by installer
Product: OpenShift Container Platform Reporter: Michael Gugino <mgugino>
Component: Installer Assignee: W. Trevor King <wking>
Installer sub component: openshift-installer QA Contact: sheng.lao <shlao>
Status: CLOSED ERRATA Docs Contact:
Severity: unspecified    
Priority: high CC: bleanhar, sponnaga, wking
Version: 4.1.0   
Target Milestone: ---   
Target Release: 4.1.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: 4.1.3
Last Closed: 2019-06-26 08:50:22 UTC Type: Bug

Description Michael Gugino 2019-05-23 02:41:49 UTC
Description:
Cannot scale the default us-west-2d machineset provisioned by the installer. It appears m4.large is not a valid instance type in us-west-2d.

If we're going to create a machineset by default, we should make sure it's a valid one.

Version:
4.1.rc5

Output from machine-controller:

Failed to create machine "...": error launching instance: error creating EC2 instance: Unsupported: Your requested instance type (m4.large) is not supported in your requested Availability Zone (us-west-2d).

Comment 1 W. Trevor King 2019-05-23 03:17:23 UTC
> Your requested instance type (m4.large) is not supported in your requested Availability Zone (us-west-2d).

AWS docs for this error [1].  This is usually a temporary shortage of the target type in the target zone, in which case it comes out as:

  level=error msg="\t* aws_instance.master.0: Error launching source instance: Unsupported: Your requested instance type (m5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a). Please retry your request by not specifying an Availability Zone or choosing ap-southeast-2b, ap-southeast-2c."

The solution to that is for us to grow machine-API support for "I don't care what zone these land in [please choose from any of $ZONES]", and then for us to somehow mimic that in Terraform.
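
As a rough illustration of the "choose from any of $ZONES" idea, here is a minimal sketch that pulls the fallback zones out of an error message like the one above. The `suggested_zones` helper is hypothetical (it is not part of the installer or the machine API), and it assumes the message keeps the "or choosing zone, zone." wording shown in the log line:

```shell
#!/bin/sh
# suggested_zones MESSAGE
# Prints the fallback zones named after "or choosing" in an EC2
# "Unsupported" error message, one per line. Prints nothing when the
# message names no alternatives. Hypothetical helper for illustration.
suggested_zones() {
  printf '%s\n' "$1" |
    sed -n 's/.*or choosing \([a-z0-9, -]*\)\..*/\1/p' |
    tr -d ' ' |
    tr ',' '\n'
}

# Message text copied from the log line above.
msg='Unsupported: Your requested instance type (m5.xlarge) is not supported in your requested Availability Zone (ap-southeast-2a). Please retry your request by not specifying an Availability Zone or choosing ap-southeast-2b, ap-southeast-2c.'
suggested_zones "$msg"
```

A caller could retry the launch against each printed zone in turn. For the us-west-2d message in the description, which names no alternatives, the helper prints nothing.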

In this case, it appears that one of the four us-west-2 zones (us-west-2d) legitimately lacks m4.large:

  $ AWS_PROFILE=ci aws --region us-west-2 ec2 describe-reserved-instances-offerings --filters 'Name=scope,Values=Availability Zone' --no-include-marketplace --instance-type m4.large | jq -r '.ReservedInstancesOfferings[].AvailabilityZone' | sort | uniq
  us-west-2a
  us-west-2b
  us-west-2c

We can add support for per-zone defaults, but I'm going to guess that would be a 4.2 thing. Alternatively, we can switch the us-west-2 defaults to m5, because both m5.large and m5.xlarge are offered in all four zones:

  $ AWS_PROFILE=ci aws --region us-west-2 ec2 describe-reserved-instances-offerings --filters 'Name=scope,Values=Availability Zone' --no-include-marketplace --instance-type m5.large | jq -r '.ReservedInstancesOfferings[].AvailabilityZone' | sort | uniq
  us-west-2a
  us-west-2b
  us-west-2c
  us-west-2d
  $ AWS_PROFILE=ci aws --region us-west-2 ec2 describe-reserved-instances-offerings --filters 'Name=scope,Values=Availability Zone' --no-include-marketplace --instance-type m5.xlarge | jq -r '.ReservedInstancesOfferings[].AvailabilityZone' | sort | uniq
  us-west-2a
  us-west-2b
  us-west-2c
  us-west-2d

If we go that route, we'd want to check for other regions where this issue exists and switch their defaults as well.
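
A minimal sketch of such a check, assuming the full zone list for a region and the list of zones offering a type have already been collected (for example with the `describe-reserved-instances-offerings` pipeline above). The `zones_lacking_type` function name is hypothetical:

```shell
#!/bin/sh
# zones_lacking_type ALL_ZONES OFFERED_ZONES
# Prints each zone from ALL_ZONES that is absent from OFFERED_ZONES,
# one per line. Both arguments are whitespace-separated zone lists.
# Hypothetical helper for illustration.
zones_lacking_type() {
  # Normalize the offered list to " a b c " for whole-word matching.
  offered=" $(printf '%s ' $2)"
  for zone in $1; do
    case "$offered" in
      *" $zone "*) ;;                  # zone offers the type
      *) printf '%s\n' "$zone" ;;      # zone lacks the type
    esac
  done
}

# us-west-2 data from this comment: all four zones vs. zones with m4.large.
zones_lacking_type "us-west-2a us-west-2b us-west-2c us-west-2d" \
                   "us-west-2a us-west-2b us-west-2c"
# prints: us-west-2d
```

Running this per region and per default instance type would flag every region whose default needs switching. How the full zone list is obtained is left open here; `aws ec2 describe-availability-zones` is one option.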

[1]: https://docs.aws.amazon.com/autoscaling/ec2/userguide/ts-as-instancelaunchfailure.html#ts-as-instancelaunchfailure-6

Comment 2 W. Trevor King 2019-05-23 19:24:27 UTC
Master PR for this: https://github.com/openshift/installer/pull/1786  Bot will cherry-pick back to 4.1 when that lands.

Comment 3 W. Trevor King 2019-05-29 13:00:42 UTC
4.1 backport: https://github.com/openshift/installer/pull/1787

Comment 4 W. Trevor King 2019-06-19 11:28:18 UTC
Backport merged.

Comment 9 errata-xmlrpc 2019-06-26 08:50:22 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:1589