The kube 1.24 rebase did not include some new AWS instance types, namely c7g, i4i, and x2i(e)dn. At the kernel level, these have already been successfully tested with RHEL 8 (bug 2021621, bug 2059264, bug 2047324).
Hi @yselkowi, I tried to verify the bug on 4.11.0-0.nightly-2022-06-06-201913. The i4i and x2i(e)dn instance types work well. But for c7g, the machine is created but only reaches the Provisioned phase, and no node is linked to it. This seems related to Architecture: "arm64"; I tried some other instance types whose architecture is "arm64", and they behave the same way: they only reach the Provisioned phase with no node linked. Can you please take a look?

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE         TYPE             REGION      ZONE         AGE
huliu-aws127n-8kp2h-1-457rr                   Provisioned   c7g.2xlarge      us-east-1   us-east-1b   39m
huliu-aws127n-8kp2h-2-hk265                   Running       i4i.xlarge       us-east-1   us-east-1b   36m
huliu-aws127n-8kp2h-3-nrck2                   Running       x2iedn.2xlarge   us-east-1   us-east-1b   35m
huliu-aws127n-8kp2h-4-blcp9                   Running       x2idn.16xlarge   us-east-1   us-east-1b   34m
huliu-aws127n-8kp2h-5-l2h5m                   Provisioned   c6g.xlarge       us-east-1   us-east-1b   13m
huliu-aws127n-8kp2h-master-0                  Running       m6i.xlarge       us-east-1   us-east-1a   68m
huliu-aws127n-8kp2h-master-1                  Running       m6i.xlarge       us-east-1   us-east-1b   68m
huliu-aws127n-8kp2h-master-2                  Running       m6i.xlarge       us-east-1   us-east-1c   68m
huliu-aws127n-8kp2h-worker-us-east-1a-dh725   Running       m6i.xlarge       us-east-1   us-east-1a   66m
huliu-aws127n-8kp2h-worker-us-east-1b-sz2ct   Running       m6i.xlarge       us-east-1   us-east-1b   66m
huliu-aws127n-8kp2h-worker-us-east-1c-lqfb2   Running       m6i.xlarge       us-east-1   us-east-1c   66m
liuhuali@Lius-MacBook-Pro huali-test %
liuhuali@Lius-MacBook-Pro huali-test % oc logs machine-api-controllers-7f4547f7c4-j2md2 -c machine-controller
...
I0609 12:37:39.170552       1 controller.go:175] huliu-aws127n-8kp2h-1-457rr: reconciling Machine
I0609 12:37:39.170575       1 actuator.go:107] huliu-aws127n-8kp2h-1-457rr: actuator checking if machine exists
I0609 12:37:39.234418       1 reconciler.go:479] huliu-aws127n-8kp2h-1-457rr: Found instance by id: i-099ae65705d5d98dc
I0609 12:37:39.234436       1 controller.go:319] huliu-aws127n-8kp2h-1-457rr: reconciling machine triggers idempotent update
I0609 12:37:39.234440       1 actuator.go:124] huliu-aws127n-8kp2h-1-457rr: actuator updating machine
I0609 12:37:39.234755       1 reconciler.go:176] huliu-aws127n-8kp2h-1-457rr: updating machine
I0609 12:37:39.572938       1 reconciler.go:479] huliu-aws127n-8kp2h-1-457rr: Found instance by id: i-099ae65705d5d98dc
I0609 12:37:39.572979       1 reconciler.go:233] huliu-aws127n-8kp2h-1-457rr: found 1 running instances for machine
I0609 12:37:39.572986       1 reconciler.go:403] huliu-aws127n-8kp2h-1-457rr: ProviderID already set in the machine Spec with value:aws:///us-east-1b/i-099ae65705d5d98dc
I0609 12:37:39.573068       1 reconciler.go:263] Updated machine huliu-aws127n-8kp2h-1-457rr
I0609 12:37:39.573078       1 machine_scope.go:167] huliu-aws127n-8kp2h-1-457rr: Updating status
I0609 12:37:39.754624       1 machine_scope.go:193] huliu-aws127n-8kp2h-1-457rr: finished calculating AWS status
I0609 12:37:39.754642       1 machine_scope.go:90] huliu-aws127n-8kp2h-1-457rr: patching machine
I0609 12:37:39.771505       1 controller.go:347] huliu-aws127n-8kp2h-1-457rr: has no node yet, requeuing
I0609 12:37:57.069402       1 controller.go:175] huliu-aws127n-8kp2h-5-l2h5m: reconciling Machine
I0609 12:37:57.069424       1 actuator.go:107] huliu-aws127n-8kp2h-5-l2h5m: actuator checking if machine exists
I0609 12:37:57.365953       1 reconciler.go:479] huliu-aws127n-8kp2h-5-l2h5m: Found instance by id: i-0e8e3107a9a3ec7cb
I0609 12:37:57.365974       1 controller.go:319] huliu-aws127n-8kp2h-5-l2h5m: reconciling machine triggers idempotent update
I0609 12:37:57.365978       1 actuator.go:124] huliu-aws127n-8kp2h-5-l2h5m: actuator updating machine
I0609 12:37:57.366383       1 reconciler.go:176] huliu-aws127n-8kp2h-5-l2h5m: updating machine
I0609 12:37:57.566008       1 reconciler.go:479] huliu-aws127n-8kp2h-5-l2h5m: Found instance by id: i-0e8e3107a9a3ec7cb
I0609 12:37:57.566066       1 reconciler.go:233] huliu-aws127n-8kp2h-5-l2h5m: found 1 running instances for machine
I0609 12:37:57.566075       1 reconciler.go:403] huliu-aws127n-8kp2h-5-l2h5m: ProviderID already set in the machine Spec with value:aws:///us-east-1b/i-0e8e3107a9a3ec7cb
I0609 12:37:57.566144       1 reconciler.go:263] Updated machine huliu-aws127n-8kp2h-5-l2h5m
I0609 12:37:57.566154       1 machine_scope.go:167] huliu-aws127n-8kp2h-5-l2h5m: Updating status
I0609 12:37:57.681175       1 machine_scope.go:193] huliu-aws127n-8kp2h-5-l2h5m: finished calculating AWS status
I0609 12:37:57.681197       1 machine_scope.go:90] huliu-aws127n-8kp2h-5-l2h5m: patching machine
I0609 12:37:57.697727       1 controller.go:347] huliu-aws127n-8kp2h-5-l2h5m: has no node yet, requeuing

Must Gather - https://drive.google.com/file/d/1-ZtY3T_IG5RfzzEC0V7wEVySG9rJeSaG/view?usp=sharing
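The pattern in the output above (c7g and c6g stuck in Provisioned, the Intel-based families Running) lines up with the instance types' CPU architecture. As a rough illustration only, the suspect types can be sorted by architecture from their family names: AWS Graviton (arm64) families carry a "g" processor suffix after the generation digit (c6g, c7g, m6g), while i4i, x2idn, x2iedn, and m6i are x86_64. The `instance_architecture` helper below is a hypothetical sketch based on that naming convention, not anything the machine-api controller actually does:

```python
# Illustrative sketch: infer the CPU architecture of an AWS instance type
# from its family name. Assumption: Graviton families put a "g" in the
# modifier letters after the generation digit (c7g, m6g, im4gn, ...).
# This is a naming-convention heuristic, NOT the controller's real logic.
import re

def instance_architecture(instance_type: str) -> str:
    """Return "arm64" for Graviton-style families, else "x86_64"."""
    family = instance_type.split(".", 1)[0]  # e.g. "c7g" from "c7g.2xlarge"
    # Family pattern: prefix letters + generation digit(s) + modifier letters.
    m = re.match(r"^[a-z]+\d+([a-z-]*)$", family)
    modifiers = m.group(1) if m else ""
    return "arm64" if "g" in modifiers else "x86_64"

for t in ["c7g.2xlarge", "c6g.xlarge", "i4i.xlarge", "x2iedn.2xlarge", "m6i.xlarge"]:
    print(t, instance_architecture(t))  # c7g/c6g -> arm64, the rest -> x86_64
```

Every machine reported stuck above maps to arm64 under this heuristic, and every Running one to x86_64, which is consistent with an architecture mismatch rather than an instance-type regression.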
@huliu each architecture currently has its own payload. Therefore, an ARM nightly (published on the arm64 release controller) must be used for testing on ARM infrastructure (such as c7g). We have successfully deployed such a nightly on c7g instances.
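The two nightlies named in this thread make the payload split visible in the release name itself: the amd64 nightly is "4.11.0-0.nightly-2022-06-06-201913" while the arm64 one is "4.11.0-0.nightly-arm64-2022-06-09-060907". Assuming that convention (arch token after "nightly-" for non-amd64 streams, omitted for amd64), a quick sanity check can be sketched as:

```python
# Sketch, assuming the nightly naming seen in this thread: arch-specific
# nightlies embed the architecture after "nightly-"
# (4.11.0-0.nightly-arm64-2022-06-09-060907), while amd64 nightlies omit it
# (4.11.0-0.nightly-2022-06-06-201913). `nightly_architecture` is a
# hypothetical helper for illustration only.
import re

def nightly_architecture(release: str) -> str:
    m = re.search(r"nightly-([a-z0-9]+)-\d{4}-", release)
    return m.group(1) if m else "amd64"

print(nightly_architecture("4.11.0-0.nightly-2022-06-06-201913"))        # amd64
print(nightly_architecture("4.11.0-0.nightly-arm64-2022-06-09-060907"))  # arm64
```

This is why the first verification attempt stalled: an amd64 payload was deployed onto arm64 (c7g/c6g) instances.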
Hello huliu, you can use our flexy job with this template [1] to launch an ARM cluster for testing.
[1] https://gitlab.cee.redhat.com/aosqe/flexy-templates/-/blob/master/functionality-testing/aos-4_11/ipi-on-aws/versioned-installer-ovn-arm-ci
Yes, I got it, and I am doing that. When the test finishes, I will upload the results. Thanks @yselkowi and @lwan
For the c7g instance type, verified on 4.11.0-0.nightly-arm64-2022-06-09-060907:

liuhuali@Lius-MacBook-Pro huali-test % oc get machine
NAME                                          PHASE     TYPE          REGION      ZONE         AGE
huliu-aws129n-928wx-1-4p925                   Running   c7g.2xlarge   us-east-1   us-east-1b   6m55s
huliu-aws129n-928wx-master-0                  Running   m6g.xlarge    us-east-1   us-east-1a   41m
huliu-aws129n-928wx-master-1                  Running   m6g.xlarge    us-east-1   us-east-1b   41m
huliu-aws129n-928wx-master-2                  Running   m6g.xlarge    us-east-1   us-east-1c   41m
huliu-aws129n-928wx-worker-us-east-1a-5nhp6   Running   m6g.xlarge    us-east-1   us-east-1a   38m
huliu-aws129n-928wx-worker-us-east-1b-k9mzf   Running   m6g.xlarge    us-east-1   us-east-1b   38m
huliu-aws129n-928wx-worker-us-east-1c-vhtsj   Running   m6g.xlarge    us-east-1   us-east-1c   38m

For i4i and x2i(e)dn, already verified in https://bugzilla.redhat.com/show_bug.cgi?id=2091433#c2
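A check like the one above (every machine in the Running phase) could be automated by parsing the captured `oc get machine` table. This is only a sketch; `machine_phases` is a hypothetical helper, and the sample rows are taken from the output above:

```python
# Sketch of automating the verification: parse captured `oc get machine`
# tabular output and flag any machine not in the Running phase.
# Sample rows are from the verification output in this comment.
sample = """\
NAME PHASE TYPE REGION ZONE AGE
huliu-aws129n-928wx-1-4p925 Running c7g.2xlarge us-east-1 us-east-1b 6m55s
huliu-aws129n-928wx-master-0 Running m6g.xlarge us-east-1 us-east-1a 41m
"""

def machine_phases(table: str) -> dict:
    """Map machine name -> phase, skipping the header row."""
    rows = [line.split() for line in table.splitlines()[1:] if line.strip()]
    return {name: phase for name, phase, *_ in rows}

phases = machine_phases(sample)
stuck = [name for name, phase in phases.items() if phase != "Running"]
print("all Running" if not stuck else f"stuck: {stuck}")  # prints "all Running"
```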
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069