Bug 1878374 - [OpenShift][AWS] Provide list of officially supported instance types, especially considering AMD based instances
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Documentation
Version: 4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.z
Assignee: Kathryn Alexander
QA Contact: Yunfei Jiang
Docs Contact: Vikram Goyal
URL:
Whiteboard:
Depends On:
Blocks: 1974738 1974775
Reported: 2020-09-12 14:47 UTC by Andreas Karis
Modified: 2023-12-15 19:18 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1974738
Environment:
Last Closed: 2021-06-10 12:42:00 UTC
Target Upstream Version:
Embargoed:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 4327 | 0 | None | closed | Bug 1878374: Adding more nitro and the AMD instance types (AWS, UPI) | 2021-02-19 23:03:14 UTC
Github | openshift openshift-docs pull 27162 | 0 | None | closed | BZ1878374 specifying supported instance types | 2021-06-07 14:49:39 UTC
Red Hat Knowledge Base (Solution) | 5393561 | 0 | None | None | None | 2020-09-12 15:02:09 UTC

Description Andreas Karis 2020-09-12 14:47:42 UTC
Document URL: 
https://docs.openshift.com/container-platform/4.5/installing/installing_aws/installing-aws-user-infra.html#installation-aws-user-infra-cluster-machines_installing-aws-user-infra

Section Number and Name: 

Describe the issue: 

Suggestions for improvement: 

Additional information: 

Please provide a list of officially supported instance types for both CloudFormation template-based installations and IPI installations.

Please clarify whether the AMD-based instance types m5a and r5a are supported:
https://aws.amazon.com/blogs/aws/new-lower-cost-amd-powered-ec2-instances/

At the moment, the IPI documentation gives no indication of whether instance types other than the default are supported.

The CloudFormation templates guide lists only Intel-based instance types.

Comment 1 Andreas Karis 2020-09-12 14:53:06 UTC
These machines can definitely be provisioned and, based on a quick verification, appear to run fine:
~~~
[akaris@linux ipi-eu-central-1]$ oc get machineset -A | egrep 'm5a|r5a'
openshift-machine-api   akaris3-cpwvz-worker-eu-central-m5a-1a   1         1         1       1           63m
openshift-machine-api   akaris3-cpwvz-worker-eu-central-r5a-1a   1         1         1       1           62m

[akaris@linux ipi-eu-central-1]$ oc describe machineset -n openshift-machine-api akaris3-cpwvz-worker-eu-central-m5a-1a | grep m5a
Name:         akaris3-cpwvz-worker-eu-central-m5a-1a
  Self Link:         /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/akaris3-cpwvz-worker-eu-central-m5a-1a
      machine.openshift.io/cluster-api-machineset:  akaris3-cpwvz-worker-eu-central-m5a-1a
        machine.openshift.io/cluster-api-machineset:    akaris3-cpwvz-worker-eu-central-m5a-1a
          Instance Type:  m5a.large
[akaris@linux ipi-eu-central-1]$ oc describe machineset -n openshift-machine-api akaris3-cpwvz-worker-eu-central-r5a-1a | grep r5a
Name:         akaris3-cpwvz-worker-eu-central-r5a-1a
  Self Link:         /apis/machine.openshift.io/v1beta1/namespaces/openshift-machine-api/machinesets/akaris3-cpwvz-worker-eu-central-r5a-1a
      machine.openshift.io/cluster-api-machineset:  akaris3-cpwvz-worker-eu-central-r5a-1a
        machine.openshift.io/cluster-api-machineset:    akaris3-cpwvz-worker-eu-central-r5a-1a
          Instance Type:  r5a.large
~~~

~~~
[akaris@linux ipi-eu-central-1]$ oc get nodes
NAME                                            STATUS   ROLES    AGE   VERSION
ip-10-0-132-145.eu-central-1.compute.internal   Ready    worker   25h   v1.18.3+6c42de8
ip-10-0-133-148.eu-central-1.compute.internal   Ready    worker   58m   v1.18.3+6c42de8
ip-10-0-135-35.eu-central-1.compute.internal    Ready    worker   57m   v1.18.3+6c42de8
ip-10-0-141-123.eu-central-1.compute.internal   Ready    master   26h   v1.18.3+6c42de8
ip-10-0-169-101.eu-central-1.compute.internal   Ready    worker   25h   v1.18.3+6c42de8
ip-10-0-184-159.eu-central-1.compute.internal   Ready    master   26h   v1.18.3+6c42de8
ip-10-0-209-54.eu-central-1.compute.internal    Ready    master   26h   v1.18.3+6c42de8
ip-10-0-219-97.eu-central-1.compute.internal    Ready    worker   25h   v1.18.3+6c42de8
[akaris@linux ipi-eu-central-1]$ oc debug node/ip-10-0-135-35.eu-central-1.compute.internal
(...)
sh-4.4# lscpu
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7571
Stepping:            2
CPU MHz:             2546.307
BogoMIPS:            4399.76
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
sh-4.4# free -g
              total        used        free      shared  buff/cache   available
Mem:             15           0          11           0           2          14
Swap:             0           0           0
~~~

~~~
[akaris@linux ipi-eu-central-1]$ oc debug node/ip-10-0-133-148.eu-central-1.compute.internal
Starting pod/ip-10-0-133-148eu-central-1computeinternal-debug ...
To use host binaries, run `chroot /host`
chroot /host
lscpu 
free -g
Pod IP: 10.0.133.148
If you don't see a command prompt, try pressing enter.
chroot /host
sh-4.4# lscpu 
Architecture:        x86_64
CPU op-mode(s):      32-bit, 64-bit
Byte Order:          Little Endian
CPU(s):              2
On-line CPU(s) list: 0,1
Thread(s) per core:  2
Core(s) per socket:  1
Socket(s):           1
NUMA node(s):        1
Vendor ID:           AuthenticAMD
CPU family:          23
Model:               1
Model name:          AMD EPYC 7571
Stepping:            2
CPU MHz:             2544.890
BogoMIPS:            4399.70
Hypervisor vendor:   KVM
Virtualization type: full
L1d cache:           32K
L1i cache:           64K
L2 cache:            512K
L3 cache:            8192K
NUMA node0 CPU(s):   0,1
Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm cmp_legacy cr8_legacy abm sse4a misalignsse 3dnowprefetch topoext vmmcall fsgsbase bmi1 avx2 smep bmi2 rdseed adx smap clflushopt sha_ni xsaveopt xsavec xgetbv1 clzero xsaveerptr arat npt nrip_save
sh-4.4# free -g
              total        used        free      shared  buff/cache   available
Mem:              7           0           4           0           2           6
Swap:             0           0           0
~~~
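
The vendor check in the transcripts above could be scripted along these lines. This is a minimal sketch that assumes `lscpu`-style text on stdin; `cpu_vendor` is a hypothetical helper name used for illustration and is not part of `oc` or `lscpu`.

```shell
#!/bin/sh
# Sketch: print the CPU vendor found in lscpu-style text read from stdin.
# cpu_vendor is a hypothetical helper name, not an oc or lscpu feature.
cpu_vendor() {
    # lscpu prints "Vendor ID:" followed by padding and the vendor string.
    # Split on ":" and strip the leading padding from the second field.
    awk -F: '/^Vendor ID:/ { sub(/^[ \t]+/, "", $2); print $2; exit }'
}

# Example using lines copied from the lscpu output above.
printf '%s\n' \
    'Architecture:        x86_64' \
    'Vendor ID:           AuthenticAMD' \
    'Model name:          AMD EPYC 7571' | cpu_vendor
# prints: AuthenticAMD
```

Inside an `oc debug node/<node>` session, after `chroot /host`, the same function could presumably be fed with `lscpu | cpu_vendor`. It only confirms which CPU the node runs on and says nothing about whether the instance type is supported.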

The question is whether Red Hat supports these instance types.

Thanks!

Comment 2 David Hernández Fernández 2020-09-21 09:47:00 UTC
For reference, I created a generic KCS article covering instance type support and general issues: https://access.redhat.com/solutions/5418671

Comment 3 Andreas Karis 2020-09-29 08:29:23 UTC
Hi,

Thanks. While https://access.redhat.com/solutions/5418671 contains some explanations, it doesn't answer whether we specifically allow installation on AMD-based systems, as asked and verified in https://access.redhat.com/solutions/5393561.

From your KCS:
>> Those are only specifying the minimum requirements and mostly all instance types are supported to be used. You might be able to use other instance types that meet the specifications of these instance types.

That's not a clear support statement, and we might shoot ourselves in the foot with it. Shouldn't all, or at least most, supported systems have gone through our QA/QE process?

In the meantime, I'm raising the priority of this bug because I need a clear support statement for the AMD-based m5a and r5a instance types.

- Andreas

Comment 5 Andreas Karis 2020-09-29 13:53:41 UTC
Red Hat does not document, test, or explicitly support instance types other than the ones listed in the documentation.

Technically, some other instance types may work out of the box. However, if users run into issues and Red Hat Technical Support suspects that an issue is specific to these instance types, Red Hat Technical Support can ask the user to reproduce the issue on one of the supported instance types.

For further details, see:

    https://docs.openshift.com/container-platform/4.5/installing/installing_aws/installing-aws-user-infra.html#installation-aws-user-infra-cluster-machines_installing-aws-user-infra
    https://access.redhat.com/solutions/5418671
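
To illustrate the policy above, here is a rough sketch of how a cluster's instance types could be compared against the documented list. It assumes the list has been copied by hand into a plain-text file, one type per line. `unlisted_types` is a hypothetical helper, and the entries in the example file are placeholders, not the official list.

```shell
#!/bin/sh
# Sketch: print instance types from stdin that are missing from an allow list.
# unlisted_types is a hypothetical helper; the allow list file is supplied by
# the caller and must be filled in from the documentation.
unlisted_types() {
    allowlist=$1
    # -v: invert match, -x: whole-line match, -F: fixed strings,
    # -f: read patterns from the file. grep exits 1 when every type is
    # listed, so treat that case as success.
    sort -u | grep -vxF -f "$allowlist" || true
}

allow=$(mktemp)
# Placeholder entries only; copy the real list from the installation guide
# for your OpenShift version.
printf '%s\n' 'm5.large' 'm5.xlarge' > "$allow"

# Check three example types against the placeholder list.
printf '%s\n' 'm5.large' 'm5a.large' 'r5a.large' | unlisted_types "$allow"
# prints:
# m5a.large
# r5a.large
rm -f "$allow"
```

On a live cluster, the input could probably come from `oc get machineset -n openshift-machine-api -o jsonpath='{range .items[*]}{.spec.template.spec.providerSpec.value.instanceType}{"\n"}{end}'`, which assumes the AWS provider spec layout. Any type the function prints is one that Red Hat Technical Support may ask to have swapped for a listed type when reproducing an issue.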

Comment 9 Andreas Karis 2020-10-02 12:59:23 UTC
Let me rephrase this bug report / request then:

Can we make sure that information about supported instance types is in a central location within the OpenShift on AWS docs, so that it can be accessed from both the IPI and UPI guides? Can we also provide a clear statement that unlisted instance types are not supported?

Thanks,

Andreas

Comment 16 Kathryn Alexander 2020-11-06 19:38:34 UTC
The PR is here: https://github.com/openshift/openshift-docs/pull/27162

Yunfei, will you PTAL?

Comment 22 Yunfei Jiang 2021-05-14 08:05:51 UTC
Hello Kathryn, I added some comments in the PR. PTAL, thanks.

Comment 23 Kathryn Alexander 2021-05-14 12:36:58 UTC
I've made the updates. Will you PTAL?

Comment 24 Yunfei Jiang 2021-05-20 03:00:56 UTC
LGTM.

Comment 25 Kathryn Alexander 2021-05-20 17:44:51 UTC
I've merged this change and am waiting for it to go live. Thanks!

Comment 28 Kathryn Alexander 2021-06-09 17:48:16 UTC
Yunfei, will you please confirm this PR? https://github.com/openshift/openshift-docs/pull/33261

