Bug 1710981 - m4 instances are old (2015), OpenShift should default to m5 instances for IPI and UPI installs on AWS
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Russell Teague
QA Contact: Yunfei Jiang
URL:
Whiteboard:
Depends On: 1769322
Blocks:
Reported: 2019-05-16 17:15 UTC by Mike Fiedler
Modified: 2023-10-06 18:19 UTC (History)

Fixed In Version:
Doc Type: Enhancement
Doc Text:
Feature: Switched the default preferred AWS instance class from m4 to m5.
Reason: Preference for newer hardware.
Result: New clusters deployed on AWS will use the m5 instance class by default. If m5 is not available, the installer will fall back to m4.
Clone Of:
Environment:
Last Closed: 2020-10-27 15:54:19 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift installer issues 2919 0 None closed aws: Use modern instance types by default 2021-02-03 11:20:14 UTC
Github openshift installer pull 3853 0 None closed Bug 1710981: Default AWS instance type to 'm5' 2021-02-03 11:20:14 UTC
Red Hat Knowledge Base (Solution) 3314661 0 None None None 2019-09-05 20:18:42 UTC
Red Hat Product Errata RHBA-2020:4196 0 None None None 2020-10-27 15:54:36 UTC

Comment 6 W. Trevor King 2019-05-23 19:40:19 UTC
Seems like AWS isn't yet ready for us to default to m5 in us-east-1 or ap-southeast-2:

  $ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique[]'
  us-east-1a
  us-east-1b
  us-east-1c
  us-east-1d
  us-east-1f

^ missing us-east-1e.

  $ AWS_PROFILE=ci aws --region ap-southeast-2 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique[]'
  ap-southeast-2b
  ap-southeast-2c

^ missing ap-southeast-2a.  As far as the CI account has access (e.g. not to us-gov-east-1, etc.), the other regions have m5 in all their zones.  More on per-zone availability in bug 1713157.
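
A rough way to repeat the check above for any region and instance type is to diff the region's full zone list against the zones that report a reserved-instance offering. This is only a sketch: the function names are made up for illustration, it assumes `aws` and `jq` are installed with credentials configured, and reserved-instance offerings are only a proxy for on-demand availability.

```shell
# Zones in region $1, one per line.
zones_in_region() {
  aws --region "$1" ec2 describe-availability-zones |
    jq -r '.AvailabilityZones[].ZoneName'
}

# Zones in region $1 that report a reserved-instance offering for
# instance type $2 (same query as the commands above).
zones_with_offering() {
  aws --region "$1" ec2 describe-reserved-instances-offerings \
    --instance-tenancy default --instance-type "$2" \
    --product-description 'Linux/UNIX' \
    --filters Name=scope,Values='Availability Zone' |
    jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique[]'
}

# Print every zone from list $1 that does not appear in list $2.
# Both arguments are newline-separated zone names.
missing_zones() {
  printf '%s\n' "$1" | while IFS= read -r zone; do
    [ -n "$zone" ] || continue
    printf '%s\n' "$2" | grep -Fxq -- "$zone" || printf '%s\n' "$zone"
  done
}

# Usage (needs AWS access):
#   missing_zones "$(zones_in_region us-east-1)" \
#                 "$(zones_with_offering us-east-1 m5.large)"
```

An empty result would mean every zone visible to the account reports an offering for that type.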

Comment 7 Eric Paris 2019-07-08 19:36:38 UTC
There were 2 issues with m5 that made us (consciously) choose to stick with m4.
1. They have a lower EBS attach limit per node (m5 is 28, m4 is ~40)
2. Kube, at least originally, had trouble reliably getting the name of the device inside the instance correct.


While (1) seems like it can easily be overcome by adding more nodes, that actually costs more, since you have to pay for the CPU/memory used by the infrastructure pieces (kernel, kubelet, crio, sdn, etc.) on every node.

If (2) was solved we probably can switch the default after the issues in comment #6 are resolved...
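
To put rough numbers on (1), here is a small sketch of the node count needed for a volume-heavy workload under each attach limit. The 280-volume workload is hypothetical, 40 stands in for the approximate m4 limit quoted above, and the helper name is made up for illustration.

```shell
# Nodes needed to attach a given number of EBS volumes, rounding up.
nodes_needed() {
  volumes=$1
  limit=$2
  echo $(( (volumes + limit - 1) / limit ))
}

nodes_needed 280 28   # m5 limit from the list above: prints 10
nodes_needed 280 40   # approximate m4 limit: prints 7
```

Each of the extra nodes carries its own copy of the per-node infrastructure overhead, which is where the added cost comes from.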

Comment 8 Scott Dodson 2019-09-30 17:19:46 UTC
Deferring this until M5 is more widely supported.

Comment 9 Scott Dodson 2020-01-14 20:34:24 UTC
With regard to availability in us-east-1e or ap-southeast-2, nothing seems to have changed. We should revisit item #2 from comment 7 and the concerns from comment 6.

Comment 10 Colin Walters 2020-01-15 00:20:10 UTC
> missing us-east-1e.

One thing to remember is that in AWS the zones are randomized per-account https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html

I suspect that "your" us-east-1e is one of the older AWS zones, possibly the very first, with old hardware.

Some discussion also in https://www.reddit.com/r/aws/comments/9oy2iy/your_requested_instance_type_m5large_is_not/

I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?

Comment 11 W. Trevor King 2020-01-15 00:25:37 UTC
> I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?

Vs. just sticking with an instance type that is supported across all zones in a region?  I guess that's not future-proof against "AWS adds a new zone with only new hardware".  But as far as I know, there is no API for "what on-demand instance types are available in $ZONE?", which is why we're leaning on reserved-instance queries above.

Comment 12 W. Trevor King 2020-01-15 00:31:33 UTC
> One thing to remember is that in AWS the zones are randomized per-account...

$ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-availability-zones --zone-names us-east-1e | jq -r '.AvailabilityZones[].ZoneId'
use1-az3

So if we wanted to hard-code choices by zone ID, we could do something like that.  I'm not wildly excited about that, though ;).
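
For illustration only, hard-coding by zone ID might look something like the sketch below. The exclusion list and function names are hypothetical; the only ID taken from this bug is use1-az3, which is what the CI account's us-east-1e resolved to above. It assumes `aws` and `jq` are available.

```shell
# Hypothetical, space-separated list of zone IDs to avoid.
EXCLUDED_ZONE_IDS="use1-az3"

# Print "ZoneName ZoneId" pairs for region $1, one pair per line.
zone_names_and_ids() {
  aws --region "$1" ec2 describe-availability-zones |
    jq -r '.AvailabilityZones[] | "\(.ZoneName) \(.ZoneId)"'
}

# Read "name id" pairs on stdin and print the names whose ID is not excluded.
filter_zones() {
  while read -r name id; do
    [ -n "$name" ] || continue
    case " $EXCLUDED_ZONE_IDS " in
      *" $id "*) ;;                      # excluded zone ID: skip it
      *) printf '%s\n' "$name" ;;
    esac
  done
}

# Usage (needs AWS access):
#   zone_names_and_ids us-east-1 | filter_zones
```

Because zone IDs are stable across accounts while zone names are not, the list would apply to every account, but it would still need manual upkeep as AWS changes hardware in each zone.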

Comment 14 Andrew Pitt 2020-04-19 12:56:57 UTC
The Canadian region (ca-central-1) *finally* added a 3rd AZ a few weeks ago (ca-central-1d).  This AZ does *not* have "m4" instances.  Anyone who attempts a stock IPI install in ca-central-1 gets the following error:

ERROR Error: Error launching source instance: Unsupported: Your requested instance type (m4.xlarge) is not supported in your requested Availability Zone (ca-central-1d). Please retry your request by not specifying an Availability Zone or choosing ca-central-1a, ca-central-1b. 

Of course, the workaround is to edit install-config.yaml and specify m5 instead, but for anyone wanting to install OpenShift 4 with IPI in the Canada region, the error above is likely to be their first installer experience.
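
For reference, a sketch of what that override might look like. The helper name is hypothetical and only prints the stanzas to merge by hand into install-config.yaml; the `platform.aws.type` fields are assumed to follow the installer's machine-pool schema, and the m5 sizes are assumptions chosen to mirror the m4 defaults.

```shell
# Print machine-pool overrides that select m5 instance types.
# Merge the output into install-config.yaml before creating the cluster.
print_m5_overrides() {
  cat <<'EOF'
controlPlane:
  name: master
  platform:
    aws:
      type: m5.xlarge   # assumed size, adjust as needed
compute:
- name: worker
  platform:
    aws:
      type: m5.large    # assumed size, adjust as needed
EOF
}

print_m5_overrides
```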

Comment 15 Abhinav Dahiya 2020-05-11 17:10:03 UTC
Still waiting for the depend_on bz to be fixed.

Comment 16 Abhinav Dahiya 2020-06-01 19:25:44 UTC
(In reply to Abhinav Dahiya from comment #15)
> Still waiting for the depend_on bz to be fixed.

Still same.

Comment 20 Yunfei Jiang 2020-07-15 06:29:32 UTC
verified. PASS
build: 4.6.0-0.nightly-2020-07-14-035247

Comment 22 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

