Seems like AWS isn't yet ready for us to default to m5 in us-east-1 or ap-southeast-2:
$ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique'
^ missing us-east-1e.
$ AWS_PROFILE=ci aws --region ap-southeast-2 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique'
^ missing ap-southeast-2a. Among the regions the CI account can access (which excludes us-gov-east-1 and the like), all other regions have m5 in every zone. More on per-zone availability in bug 1713157.
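A minimal sketch of the "which zone is missing" check done by eye above, assuming you have saved the JSON from describe-availability-zones and describe-reserved-instances-offerings for the same region. The function name and the sample data are made up for illustration (the region "xx-example-1" does not exist); only the field names ZoneName, AvailabilityZone, AvailabilityZones and ReservedInstancesOfferings come from the CLI output.

```python
def zones_missing_offering(zones_doc, offerings_doc):
    """Return the zone names in the region that have no offering listed.

    zones_doc:     parsed output of `aws ec2 describe-availability-zones`
    offerings_doc: parsed output of `aws ec2 describe-reserved-instances-offerings`
    """
    all_zones = {z["ZoneName"] for z in zones_doc["AvailabilityZones"]}
    offered = {
        o["AvailabilityZone"]
        for o in offerings_doc["ReservedInstancesOfferings"]
        if "AvailabilityZone" in o  # regional-scope offerings carry no zone
    }
    return sorted(all_zones - offered)


# Hand-written sample data shaped like the CLI output; NOT real results.
sample_zones = {
    "AvailabilityZones": [
        {"ZoneName": "xx-example-1a"},
        {"ZoneName": "xx-example-1b"},
        {"ZoneName": "xx-example-1c"},
    ]
}
sample_offerings = {
    "ReservedInstancesOfferings": [
        {"AvailabilityZone": "xx-example-1a"},
        {"AvailabilityZone": "xx-example-1b"},
        {"AvailabilityZone": "xx-example-1a"},  # duplicates are expected
    ]
}

print(zones_missing_offering(sample_zones, sample_offerings))
```

As with the commands above, this only infers on-demand availability from reserved-instance offerings, so treat an empty result as a hint rather than a guarantee.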
There were 2 issues with m5 that made us (consciously) choose to stick with m4.
1. They have a lower EBS attach limit per node (m5 is 28, m4 is ~40)
2. Kubernetes, at least originally, had trouble reliably getting the name of the device inside the instance correct.
While (1) looks easy to overcome by adding more nodes, doing so actually costs more, since you pay for the CPU and memory consumed by the infrastructure pieces (kernel, kubelet, crio, sdn, etc.) on every node.
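To make the cost argument in (1) concrete, here is a rough back-of-the-envelope sketch. It assumes the workload is bound by volume attachments rather than CPU or memory, uses the attach limits quoted above (28 for m5, and 40 as a stand-in for the "~40" on m4), and picks an entirely hypothetical per-node infrastructure overhead of 500 millicores. The function names are illustrative only.

```python
def nodes_for_volumes(volumes, attach_limit):
    """Minimum node count needed to attach `volumes` EBS volumes (integer ceiling)."""
    return (volumes + attach_limit - 1) // attach_limit


def infra_overhead(volumes, attach_limit, overhead_per_node):
    """Total per-node overhead paid across the nodes needed for the volumes."""
    return nodes_for_volumes(volumes, attach_limit) * overhead_per_node


M5_ATTACH_LIMIT = 28          # from the comment above
M4_ATTACH_LIMIT = 40          # "~40" in the comment above; approximate
OVERHEAD_MILLICORES = 500     # hypothetical cost of kernel, kubelet, crio, sdn, etc.

volumes = 280
for name, limit in (("m5", M5_ATTACH_LIMIT), ("m4", M4_ATTACH_LIMIT)):
    nodes = nodes_for_volumes(volumes, limit)
    overhead = infra_overhead(volumes, limit, OVERHEAD_MILLICORES)
    print(f"{name}: {nodes} nodes, {overhead} millicores of infrastructure overhead")
```

With these assumed numbers, 280 volumes need 10 m5 nodes but only 7 m4 nodes, so the fixed per-node overhead is paid three more times on m5.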
If (2) has been solved, we can probably switch the default once the issues in comment #6 are resolved...
Deferring this until M5 is more widely supported.
Nothing seems to have changed with regard to availability in us-east-1e or ap-southeast-2a. We should revisit item #2 from comment 7 and the concerns from comment 6.
> missing us-east-1e.
One thing to remember is that in AWS the mapping from zone names to physical zones is randomized per account: https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html
I suspect that "your" us-east-1e is one of the older AWS zones, possibly the very first, with old hardware.
Some discussion also in https://www.reddit.com/r/aws/comments/9oy2iy/your_requested_instance_type_m5large_is_not/
I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?
> I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?
Vs. just sticking with an instance type that is supported across all zones in a region? I guess that's not future-proof against "AWS adds a new zone with only new hardware". But as far as I know, there is no API for "what on-demand instance types are available in $ZONE?", which is why we're leaning on reserved-instance queries above.
> One thing to remember is that in AWS the zones are randomized per-account...
$ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-availability-zones --zone-names us-east-1e | jq -r '.AvailabilityZones[].ZoneId'
So if we wanted to hard-code choices by zone ID, we could do something like that. I'm not wildly excited about that, though ;).
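For illustration only, hard-coding by zone ID could look roughly like the sketch below. It assumes the parsed JSON from describe-availability-zones as input; the function name, the exclusion set, and the zone IDs ("xx1-az3" and friends) are hypothetical placeholders, not IDs anyone has confirmed as lacking m5.

```python
def usable_zone_names(zones_doc, excluded_zone_ids):
    """Return this account's zone names whose zone ID is not on the exclusion list.

    Zone IDs are stable across accounts, while zone names are not, so the
    exclusion list is keyed on ZoneId and the result is reported as ZoneName.
    """
    return sorted(
        z["ZoneName"]
        for z in zones_doc["AvailabilityZones"]
        if z["ZoneId"] not in excluded_zone_ids
    )


# Hypothetical exclusion list and hand-written sample data; NOT real results.
EXCLUDED_ZONE_IDS = {"xx1-az3"}
sample_zones = {
    "AvailabilityZones": [
        {"ZoneName": "xx-example-1a", "ZoneId": "xx1-az1"},
        {"ZoneName": "xx-example-1b", "ZoneId": "xx1-az2"},
        {"ZoneName": "xx-example-1c", "ZoneId": "xx1-az3"},
    ]
}

print(usable_zone_names(sample_zones, EXCLUDED_ZONE_IDS))
```

The downside is the one already noted: the exclusion list has to be maintained by hand and goes stale whenever AWS changes what a zone supports.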
The Canadian region (ca-central-1) *finally* added a 3rd AZ a few weeks ago (ca-central-1d). This AZ does *not* have "m4" instances. Anyone who attempts a stock IPI install in ca-central-1 gets the following error:
ERROR Error: Error launching source instance: Unsupported: Your requested instance type (m4.xlarge) is not supported in your requested Availability Zone (ca-central-1d). Please retry your request by not specifying an Availability Zone or choosing ca-central-1a, ca-central-1b.
Of course, the workaround is to edit install-config.yaml and specify m5 instead, but for anyone wanting to install OpenShift 4 with IPI in the Canada region, the error above is likely to be their first installer experience.
Still waiting for the depend_on bz to be fixed.
(In reply to Abhinav Dahiya from comment #15)
> Still waiting for the depend_on bz to be fixed.