Bug 2091433

Summary: Update AWS instance types
Product: OpenShift Container Platform
Reporter: Yaakov Selkowitz <yselkowi>
Component: Cloud Compute
Assignee: Yaakov Selkowitz <yselkowi>
Cloud Compute sub component: Cluster Autoscaler
QA Contact: Huali Liu <huliu>
Status: CLOSED ERRATA
Docs Contact:
Severity: medium
Priority: medium
CC: huliu, lwan
Version: 4.11
Flags: huliu: needinfo-
Target Milestone: ---
Target Release: 4.11.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-08-10 11:14:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Yaakov Selkowitz 2022-05-29 20:41:04 UTC
The kube 1.24 rebase did not include some new AWS instance types, namely c7g, i4i, and x2i(e)dn.  At the kernel level, these have already been successfully tested with RHEL 8 (bug 2021621, bug 2059264, bug 2047324).

Comment 2 Huali Liu 2022-06-09 12:42:58 UTC
Hi @yselkowi, I tried to verify the bug on 4.11.0-0.nightly-2022-06-06-201913.

The i4i and x2i(e)dn instance types work well.

For c7g, however, the machine was created but only reached the Provisioned phase, and no node was linked to it. This seems related to the arm64 architecture: I tried other instance types whose architecture is arm64 (for example c6g.xlarge), and they all behaved the same way, reaching only the Provisioned phase with no node linked. Can you please take a look?

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE         TYPE             REGION      ZONE         AGE
huliu-aws127n-8kp2h-1-457rr                   Provisioned   c7g.2xlarge      us-east-1   us-east-1b   39m
huliu-aws127n-8kp2h-2-hk265                   Running       i4i.xlarge       us-east-1   us-east-1b   36m
huliu-aws127n-8kp2h-3-nrck2                   Running       x2iedn.2xlarge   us-east-1   us-east-1b   35m
huliu-aws127n-8kp2h-4-blcp9                   Running       x2idn.16xlarge   us-east-1   us-east-1b   34m
huliu-aws127n-8kp2h-5-l2h5m                   Provisioned   c6g.xlarge       us-east-1   us-east-1b   13m
huliu-aws127n-8kp2h-master-0                  Running       m6i.xlarge       us-east-1   us-east-1a   68m
huliu-aws127n-8kp2h-master-1                  Running       m6i.xlarge       us-east-1   us-east-1b   68m
huliu-aws127n-8kp2h-master-2                  Running       m6i.xlarge       us-east-1   us-east-1c   68m
huliu-aws127n-8kp2h-worker-us-east-1a-dh725   Running       m6i.xlarge       us-east-1   us-east-1a   66m
huliu-aws127n-8kp2h-worker-us-east-1b-sz2ct   Running       m6i.xlarge       us-east-1   us-east-1b   66m
huliu-aws127n-8kp2h-worker-us-east-1c-lqfb2   Running       m6i.xlarge       us-east-1   us-east-1c   66m
liuhuali@Lius-MacBook-Pro huali-test % 
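
The stuck machines can be picked out of a listing like the one above mechanically. The following is a minimal Python sketch, assuming the column layout of `oc get machine` shown here (NAME, PHASE, TYPE, REGION, ZONE, AGE). The function name `machines_not_running` and the embedded sample rows are illustrative only and are not part of any OpenShift tooling.

```python
# Sample rows copied from the listing above (abbreviated).
SAMPLE = """\
NAME                                          PHASE         TYPE             REGION      ZONE         AGE
huliu-aws127n-8kp2h-1-457rr                   Provisioned   c7g.2xlarge      us-east-1   us-east-1b   39m
huliu-aws127n-8kp2h-2-hk265                   Running       i4i.xlarge       us-east-1   us-east-1b   36m
huliu-aws127n-8kp2h-5-l2h5m                   Provisioned   c6g.xlarge       us-east-1   us-east-1b   13m
"""


def machines_not_running(table_text):
    """Return (name, phase, instance_type) for rows whose PHASE is not Running."""
    lines = [ln for ln in table_text.splitlines() if ln.strip()]
    if not lines:
        return []
    header = lines[0].split()
    name_i = header.index("NAME")
    phase_i = header.index("PHASE")
    type_i = header.index("TYPE")

    stuck = []
    for line in lines[1:]:
        cols = line.split()
        # Whitespace splitting misaligns rows with an empty column
        # (assumption: such rows are rare), so skip them instead of guessing.
        if len(cols) < len(header):
            continue
        if cols[phase_i] != "Running":
            stuck.append((cols[name_i], cols[phase_i], cols[type_i]))
    return stuck


if __name__ == "__main__":
    for name, phase, itype in machines_not_running(SAMPLE):
        print(f"{name}: {phase} ({itype})")
```

For the sample rows, this reports the c7g.2xlarge and c6g.xlarge machines, both of which are arm64 instance types.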

liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-7f4547f7c4-j2md2 -c machine-controller
...
I0609 12:37:39.170552       1 controller.go:175] huliu-aws127n-8kp2h-1-457rr: reconciling Machine
I0609 12:37:39.170575       1 actuator.go:107] huliu-aws127n-8kp2h-1-457rr: actuator checking if machine exists
I0609 12:37:39.234418       1 reconciler.go:479] huliu-aws127n-8kp2h-1-457rr: Found instance by id: i-099ae65705d5d98dc
I0609 12:37:39.234436       1 controller.go:319] huliu-aws127n-8kp2h-1-457rr: reconciling machine triggers idempotent update
I0609 12:37:39.234440       1 actuator.go:124] huliu-aws127n-8kp2h-1-457rr: actuator updating machine
I0609 12:37:39.234755       1 reconciler.go:176] huliu-aws127n-8kp2h-1-457rr: updating machine
I0609 12:37:39.572938       1 reconciler.go:479] huliu-aws127n-8kp2h-1-457rr: Found instance by id: i-099ae65705d5d98dc
I0609 12:37:39.572979       1 reconciler.go:233] huliu-aws127n-8kp2h-1-457rr: found 1 running instances for machine
I0609 12:37:39.572986       1 reconciler.go:403] huliu-aws127n-8kp2h-1-457rr: ProviderID already set in the machine Spec with value:aws:///us-east-1b/i-099ae65705d5d98dc
I0609 12:37:39.573068       1 reconciler.go:263] Updated machine huliu-aws127n-8kp2h-1-457rr
I0609 12:37:39.573078       1 machine_scope.go:167] huliu-aws127n-8kp2h-1-457rr: Updating status
I0609 12:37:39.754624       1 machine_scope.go:193] huliu-aws127n-8kp2h-1-457rr: finished calculating AWS status
I0609 12:37:39.754642       1 machine_scope.go:90] huliu-aws127n-8kp2h-1-457rr: patching machine
I0609 12:37:39.771505       1 controller.go:347] huliu-aws127n-8kp2h-1-457rr: has no node yet, requeuing
I0609 12:37:57.069402       1 controller.go:175] huliu-aws127n-8kp2h-5-l2h5m: reconciling Machine
I0609 12:37:57.069424       1 actuator.go:107] huliu-aws127n-8kp2h-5-l2h5m: actuator checking if machine exists
I0609 12:37:57.365953       1 reconciler.go:479] huliu-aws127n-8kp2h-5-l2h5m: Found instance by id: i-0e8e3107a9a3ec7cb
I0609 12:37:57.365974       1 controller.go:319] huliu-aws127n-8kp2h-5-l2h5m: reconciling machine triggers idempotent update
I0609 12:37:57.365978       1 actuator.go:124] huliu-aws127n-8kp2h-5-l2h5m: actuator updating machine
I0609 12:37:57.366383       1 reconciler.go:176] huliu-aws127n-8kp2h-5-l2h5m: updating machine
I0609 12:37:57.566008       1 reconciler.go:479] huliu-aws127n-8kp2h-5-l2h5m: Found instance by id: i-0e8e3107a9a3ec7cb
I0609 12:37:57.566066       1 reconciler.go:233] huliu-aws127n-8kp2h-5-l2h5m: found 1 running instances for machine
I0609 12:37:57.566075       1 reconciler.go:403] huliu-aws127n-8kp2h-5-l2h5m: ProviderID already set in the machine Spec with value:aws:///us-east-1b/i-0e8e3107a9a3ec7cb
I0609 12:37:57.566144       1 reconciler.go:263] Updated machine huliu-aws127n-8kp2h-5-l2h5m
I0609 12:37:57.566154       1 machine_scope.go:167] huliu-aws127n-8kp2h-5-l2h5m: Updating status
I0609 12:37:57.681175       1 machine_scope.go:193] huliu-aws127n-8kp2h-5-l2h5m: finished calculating AWS status
I0609 12:37:57.681197       1 machine_scope.go:90] huliu-aws127n-8kp2h-5-l2h5m: patching machine
I0609 12:37:57.697727       1 controller.go:347] huliu-aws127n-8kp2h-5-l2h5m: has no node yet, requeuing
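
In a longer log, the machines that keep requeuing without a node can be extracted with a small filter. This is a rough Python sketch that assumes the log line format shown above (severity and date, time, thread ID, `file:line]`, then `<machine>: <message>`). The helper name `machines_without_node` is hypothetical.

```python
import re

# Matches lines such as:
# I0609 12:37:39.771505       1 controller.go:347] <machine>: has no node yet, requeuing
NO_NODE_RE = re.compile(
    r"^[IWEF]\d{4} \S+\s+\d+ \S+:\d+\] "
    r"(?P<machine>\S+): has no node yet, requeuing\s*$"
)

# Two lines copied from the log above, plus one unrelated line.
SAMPLE_LOG = """\
I0609 12:37:39.754642       1 machine_scope.go:90] huliu-aws127n-8kp2h-1-457rr: patching machine
I0609 12:37:39.771505       1 controller.go:347] huliu-aws127n-8kp2h-1-457rr: has no node yet, requeuing
I0609 12:37:57.697727       1 controller.go:347] huliu-aws127n-8kp2h-5-l2h5m: has no node yet, requeuing
"""


def machines_without_node(log_text):
    """Return machine names, in first-seen order, that logged 'has no node yet'."""
    seen = []
    for line in log_text.splitlines():
        match = NO_NODE_RE.match(line)
        if match and match.group("machine") not in seen:
            seen.append(match.group("machine"))
    return seen


if __name__ == "__main__":
    for machine in machines_without_node(SAMPLE_LOG):
        print(machine)
```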

Must Gather - https://drive.google.com/file/d/1-ZtY3T_IG5RfzzEC0V7wEVySG9rJeSaG/view?usp=sharing

Comment 3 Yaakov Selkowitz 2022-06-09 13:56:03 UTC
@huliu Each architecture currently has its own payload. Therefore, an ARM nightly (published on the arm64 release controller) must be used for testing on ARM infrastructure (such as c7g). We have successfully deployed such a nightly on c7g instances.
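
The mismatch described here can be checked before creating a machine. The sketch below is illustrative Python and rests on two assumptions that are not confirmed anywhere in this bug: that arm64 (Graviton) instance families have a `g` immediately after the generation digit (c6g, c7g, m6g), and that a nightly name without an `-arm64-` token is an amd64 payload. All function names are hypothetical. The EC2 DescribeInstanceTypes API reports supported architectures authoritatively and should be preferred in real tooling.

```python
import re

# Assumption: family letters, generation digit(s), then "g" means Graviton/arm64.
# This is a naming heuristic, not an official rule.
_GRAVITON_FAMILY_RE = re.compile(r"^[a-z]+\d+g")


def instance_arch(instance_type):
    """Guess the CPU architecture from an instance type such as 'c7g.2xlarge'."""
    family = instance_type.split(".", 1)[0].lower()
    return "arm64" if _GRAVITON_FAMILY_RE.match(family) else "amd64"


def payload_arch(release_name):
    """Guess the payload architecture from a nightly release name.

    Assumption: arm64 nightlies carry an '-arm64-' token and names without
    it are amd64, as with the two nightlies mentioned in this bug.
    """
    return "arm64" if "-arm64-" in release_name else "amd64"


def compatible(instance_type, release_name):
    """True when the guessed instance and payload architectures agree."""
    return instance_arch(instance_type) == payload_arch(release_name)


if __name__ == "__main__":
    for release in (
        "4.11.0-0.nightly-2022-06-06-201913",
        "4.11.0-0.nightly-arm64-2022-06-09-060907",
    ):
        print(release, "c7g.2xlarge compatible:", compatible("c7g.2xlarge", release))
```

Under these assumptions, c7g.2xlarge is flagged as incompatible with the default nightly from comment 2 and compatible with an arm64 nightly.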

Comment 4 wang lin 2022-06-10 01:24:26 UTC
Hello huliu, you can use our flexy job with this template [1] to launch an ARM cluster for testing.

[1] https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_11/ipi-on-aws/versioned-installer-ovn-arm-ci

Comment 5 Huali Liu 2022-06-10 01:34:32 UTC
Yes, got it, and I am doing that now. I will upload the test results when the test finishes. Thanks, @yselkowi and @lwan.

Comment 6 Huali Liu 2022-06-10 01:52:33 UTC
For the c7g instance type, verified on 4.11.0-0.nightly-arm64-2022-06-09-060907:
liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE         TYPE          REGION      ZONE         AGE
huliu-aws129n-928wx-1-4p925                   Running       c7g.2xlarge   us-east-1   us-east-1b   6m55s
huliu-aws129n-928wx-master-0                  Running       m6g.xlarge    us-east-1   us-east-1a   41m
huliu-aws129n-928wx-master-1                  Running       m6g.xlarge    us-east-1   us-east-1b   41m
huliu-aws129n-928wx-master-2                  Running       m6g.xlarge    us-east-1   us-east-1c   41m
huliu-aws129n-928wx-worker-us-east-1a-5nhp6   Running       m6g.xlarge    us-east-1   us-east-1a   38m
huliu-aws129n-928wx-worker-us-east-1b-k9mzf   Running       m6g.xlarge    us-east-1   us-east-1b   38m
huliu-aws129n-928wx-worker-us-east-1c-vhtsj   Running       m6g.xlarge    us-east-1   us-east-1c   38m

For i4i and x2i(e)dn, already verified in https://bugzilla.redhat.com/show_bug.cgi?id=2091433#c2

Comment 8 errata-xmlrpc 2022-08-10 11:14:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069