1710981 – m4 instances are old (2015), OpenShift should default to m5 instances for IPI and UPI installs on AWS

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Installer
Sub Component:
Version:	4.1.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Russell Teague
QA Contact:	Yunfei Jiang
Docs Contact:
URL:
Whiteboard:
Depends On:	1769322
Blocks:

Reported:	2019-05-16 17:15 UTC by Mike Fiedler
Modified:	2023-10-06 18:19 UTC
CC List:	22 users
Fixed In Version:
Doc Type:	Enhancement
Doc Text:	Feature: Switched the default preferred AWS instance class from m4 to m5. Reason: Preference for newer hardware. Result: New clusters deployed on AWS will use the m5 AWS instance class by default. If m5 is not available, the installer will fall back to m4.
Clone Of:
Environment:
Last Closed:	2020-10-27 15:54:19 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift installer issues 2919	None	closed	aws: Use modern instance types by default	2021-02-03 11:20:14 UTC
Github	openshift installer pull 3853	None	closed	Bug 1710981: Default AWS instance type to 'm5'	2021-02-03 11:20:14 UTC
Red Hat Knowledge Base (Solution)	3314661	None	None	None	2019-09-05 20:18:42 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 15:54:36 UTC

Comment 6 W. Trevor King 2019-05-23 19:40:19 UTC

Seems like AWS isn't yet ready for us to default to m5 in us-east-1 or ap-southeast-2:

  $ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique[]'
  us-east-1a
  us-east-1b
  us-east-1c
  us-east-1d
  us-east-1f

^ missing us-east-1e.

  $ AWS_PROFILE=ci aws --region ap-southeast-2 ec2 describe-reserved-instances-offerings --instance-tenancy default --instance-type m5.large --product-description 'Linux/UNIX' --filters Name=scope,Values='Availability Zone' | jq -r '[.ReservedInstancesOfferings[].AvailabilityZone] | sort | unique[]'
  ap-southeast-2b
  ap-southeast-2c

^ missing ap-southeast-2a.  As far as the CI account has access (e.g. not to us-gov-east-1, etc.), the other regions have m5 in all their zones.  More on per-zone availability in bug 1713157.
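
A minimal sketch of how the "missing" zone can be computed instead of spotted by eye, assuming you already have two newline-separated lists: every zone in the region, and the zones returned by the reserved-instance query above. The function name missing_zones is made up for this example; it only does a set difference and does not call AWS itself.

```shell
# missing_zones ALL OFFERED
# Hypothetical helper: prints the zones listed in ALL that do not appear
# in OFFERED, one per line. Both arguments are newline-separated lists.
missing_zones() {
  # grep -F: fixed strings, -x: whole-line match, -v: invert the match.
  # Each line of "$2" is treated as a separate pattern.
  printf '%s\n' "$1" | grep -Fxv -e "$2" | sort -u
}

# Zone lists taken from the us-east-1 output in this comment.
all_zones='us-east-1a
us-east-1b
us-east-1c
us-east-1d
us-east-1e
us-east-1f'
m5_zones='us-east-1a
us-east-1b
us-east-1c
us-east-1d
us-east-1f'

missing_zones "$all_zones" "$m5_zones"   # prints: us-east-1e
```

In practice the first list could presumably come from `aws ec2 describe-availability-zones` piped through `jq -r '.AvailabilityZones[].ZoneName'`, but that wiring is an assumption and is not tested here.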

Comment 7 Eric Paris 2019-07-08 19:36:38 UTC

There were 2 issues with m5 that made us (consciously) choose to stick with m4.
1. They have a lower EBS attach limit per node (m5 is 28, m4 is ~40).
2. Kube, at least originally, had trouble reliably getting the name of the device inside the instance correct.

While (1) seems like it can easily be overcome by adding more nodes, that actually costs more, since you have to pay for CPU/memory for the infrastructure pieces (kernel, kubelet, crio, sdn, etc.) on every node.

If (2) has been solved, we can probably switch the default once the issues in comment #6 are resolved.
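
As a rough illustration of point (1), assuming a hypothetical workload of 280 EBS volumes and one attachment slot per volume, the number of nodes needed just to attach them is a ceiling division by the per-node limit. The function name and the workload size are made up for this sketch; the limits are the ones quoted in this comment, and the m4 figure is approximate.

```shell
# nodes_for_volumes VOLUMES LIMIT
# Hypothetical helper: smallest node count whose combined attach limit
# covers VOLUMES, i.e. ceil(VOLUMES / LIMIT) using integer arithmetic.
nodes_for_volumes() {
  echo $(( ($1 + $2 - 1) / $2 ))
}

nodes_for_volumes 280 28   # m5 limit of 28   -> prints 10
nodes_for_volumes 280 40   # m4 limit of ~40  -> prints 7
```

Under these assumptions the m5 cluster needs three more nodes, and each of those extra nodes carries its own kernel, kubelet, crio, and sdn overhead.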

Comment 8 Scott Dodson 2019-09-30 17:19:46 UTC

Deferring this until M5 is more widely supported.

Comment 9 Scott Dodson 2020-01-14 20:34:24 UTC

With regard to availability in us-east-1e or ap-southeast-2 nothing seems to have changed there. We should revisit item #2 from comment 7 and the concerns from comment 6.

Comment 10 Colin Walters 2020-01-15 00:20:10 UTC

> missing us-east-1e.

One thing to remember is that in AWS the mapping from zone names to physical zones is randomized per account: https://docs.aws.amazon.com/ram/latest/userguide/working-with-az-ids.html

I suspect that "your" us-east-1e is one of the older AWS zones, possibly the very first, with old hardware.

Some discussion also in https://www.reddit.com/r/aws/comments/9oy2iy/your_requested_instance_type_m5large_is_not/

I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?

Comment 11 W. Trevor King 2020-01-15 00:25:37 UTC

> I think we could try to carry whatever hacks are necessary to identify that AZ and avoid it?

Vs. just sticking with an instance type that is supported across all zones in a region?  I guess that's not future-proof against "AWS adds a new zone with only new hardware".  But as far as I know, there is no API for "what on-demand instance types are available in $ZONE?", which is why we're leaning on reserved-instance queries above.

Comment 12 W. Trevor King 2020-01-15 00:31:33 UTC

> One thing to remember is that in AWS the zones are randomized per-account...

$ AWS_PROFILE=ci aws --region us-east-1 ec2 describe-availability-zones --zone-names us-east-1e | jq -r '.AvailabilityZones[].ZoneId'
use1-az3

So if we wanted to hard-code choices by zone ID, we could do something like that.  I'm not wildly excited about that, though ;).
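
For what it's worth, a sketch of what "hard-code choices by zone ID" could look like, assuming input lines of the form "ZoneName ZoneId". The denylist variable and the function name are hypothetical; use1-az3 is the only ID taken from the output above, and whether it really lacks m5 for every account is itself an assumption.

```shell
# Hypothetical denylist of zone IDs assumed to lack m5 (space-separated).
DENIED_ZONE_IDS='use1-az3'

# allowed_zones: reads "ZoneName ZoneId" pairs on stdin and prints the
# zone names whose zone ID is not on the denylist.
allowed_zones() {
  awk -v denied="$DENIED_ZONE_IDS" '
    BEGIN {
      n = split(denied, ids, " ")
      for (i = 1; i <= n; i++) skip[ids[i]] = 1
    }
    !($2 in skip) { print $1 }
  '
}

# Example input; the use1-az1 mapping is invented for illustration.
printf 'us-east-1a use1-az1\nus-east-1e use1-az3\n' | allowed_zones
# prints: us-east-1a
```

The name/ID pairs could presumably be produced with `aws ec2 describe-availability-zones | jq -r '.AvailabilityZones[] | "\(.ZoneName) \(.ZoneId)"'`, which is not exercised here.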

Comment 14 Andrew Pitt 2020-04-19 12:56:57 UTC

The Canadian region (ca-central-1) *finally* added a 3rd AZ a few weeks ago (ca-central-1d).  This AZ does *not* have "m4" instances.  Anyone who attempts a stock IPI install in ca-central-1 gets the following error:

ERROR Error: Error launching source instance: Unsupported: Your requested instance type (m4.xlarge) is not supported in your requested Availability Zone (ca-central-1d). Please retry your request by not specifying an Availability Zone or choosing ca-central-1a, ca-central-1b. 

Of course, the workaround is to edit install-config.yaml and specify m5 instead, but for anyone wanting to install OpenShift 4 with IPI in the Canada region, the error above is likely to be their first installer experience.
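
A minimal sketch of that workaround, assuming the usual install-config.yaml layout in which each machine pool sets its instance type under platform.aws.type. The specific sizes (m5.xlarge for the control plane, m5.large for workers) are assumptions chosen to mirror the m4 sizes, not values taken from this report, and all other fields of the file are omitted.

```yaml
# Fragment of install-config.yaml; only the instance-type overrides are shown.
controlPlane:
  name: master
  platform:
    aws:
      type: m5.xlarge   # assumed size, replaces the default m4.xlarge
compute:
- name: worker
  platform:
    aws:
      type: m5.large    # assumed size for worker nodes
```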

Comment 15 Abhinav Dahiya 2020-05-11 17:10:03 UTC

Still waiting for the depend_on bz to be fixed.

Comment 16 Abhinav Dahiya 2020-06-01 19:25:44 UTC

(In reply to Abhinav Dahiya from comment #15)
> Still waiting for the depend_on bz to be fixed.

Still same.

Comment 20 Yunfei Jiang 2020-07-15 06:29:32 UTC

verified. PASS
build: 4.6.0-0.nightly-2020-07-14-035247

Comment 22 errata-xmlrpc 2020-10-27 15:54:19 UTC

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
