Created attachment 1814560 [details]
.openshift_install.log

Version:

$ openshift-install version
openshift-install 4.9.0-0.nightly-2021-08-16-082143
built from commit c7d810f497d0c6c3ad22e5c14f873a70b0586231
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:89aa37cfa85591440b3b099fed4cde52329308425cc5803c864eefbf7ce9e265

Platform: aws ARM64

What happened?

openshift-install failed when trying to install on AWS ARM64 nodes. The AWS interface shows all 3 master nodes running but no worker. Looking at the debug output, the master nodes got created but were unreachable on port 6443.

What did you expect to happen?

Install to create all nodes and finish successfully.

How to reproduce it (as minimally and precisely as possible)?

$ mkdir ocp
$ cat > ocp/install-config.yaml <<EOF
apiVersion: v1
baseDomain: devcluster.openshift.com
compute:
- architecture: arm64
  hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m6g.xlarge
  replicas: 3
controlPlane:
  architecture: arm64
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m6g.xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: jed
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-east-1
publish: External
pullSecret: <REDACTED>
EOF
$ openshift-install --log-level debug --dir ocp create cluster
Reproducing the error with the same installer version gives the following in the release-image systemd service log:

Pulling quay.io/openshift-release-dev/ocp-release-nightly@sha256:89aa37cfa85591440b3b099fed4cde52329308425cc5803c864eefbf7ce9e265...
Aug 17 05:37:26 ip-10-0-10-205 release-image-download.sh[1533]: 90e82a591baa01bf736d167f9e18f39c0148aa736f8e618ddb7be478131de674
Aug 17 05:37:27 ip-10-0-10-205 release-image-download.sh[1533]: ERROR: release image arch amd64 does not match host arch arm64
Aug 17 05:37:27 ip-10-0-10-205 systemd[1]: release-image.service: Main process exited, code=exited, status=1/FAILURE
Aug 17 05:37:27 ip-10-0-10-205 systemd[1]: release-image.service: Failed with result 'exit-code'.
Aug 17 05:37:27 ip-10-0-10-205 systemd[1]: Failed to start Download the OpenShift Release Image.

So the image you are deploying, quay.io/openshift-release-dev/ocp-release-nightly@sha256:89aa37cfa85591440b3b099fed4cde52329308425cc5803c864eefbf7ce9e265, is for amd64 platforms:

$ podman image inspect quay.io/openshift-release-dev/ocp-release-nightly@sha256:89aa37cfa85591440b3b099fed4cde52329308425cc5803c864eefbf7ce9e265 | grep Architecture
"Architecture": "amd64",

To deploy on ARM64 with an installer that is not linked to an arm64 release image, you can set OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE to the arm64 release image you want to install. As an example:

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=quay.io/openshift-release-dev/ocp-release-nightly:4.9.0-0.nightly-arm64-2021-08-16-154214
$ podman image inspect quay.io/openshift-release-dev/ocp-release-nightly:4.9.0-0.nightly-arm64-2021-08-16-154214 | grep Architecture
"Architecture": "arm64",
$ ./openshift-install create cluster --dir ocp

Reproducing your steps and environment with the arm64 image set, everything works fine.
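The manual podman check above could be wrapped in a small guard that runs before the installer. This is only a sketch: image_arch and require_image_arch are hypothetical helper names, not part of openshift-install, and it assumes the image has already been pulled locally (for example with podman pull) so that podman image inspect can read it.

```shell
# Hypothetical helpers, not part of openshift-install.

# Print the architecture recorded in a locally available image.
image_arch() {
    podman image inspect --format '{{.Architecture}}' "$1"
}

# Return 0 when image $1 has architecture $2.
# Otherwise print an error to stderr and return 1.
require_image_arch() {
    local image="$1" expected="$2" actual
    actual="$(image_arch "$image")" || return 1
    if [ "$actual" != "$expected" ]; then
        echo "ERROR: release image arch $actual does not match target arch $expected" >&2
        return 1
    fi
}

# Possible use, with the override variable from the comment above:
#   require_image_arch "$OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE" arm64 \
#       && ./openshift-install create cluster --dir ocp
```

With a guard like this, the mismatch would be reported on the workstation before any AWS resources are created, instead of in the bootstrap node's systemd log.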
Finally, in order to install on ARM64, one can use installer binaries from https://mirror.openshift.com/pub/openshift-v4/aarch64/clients/ocp-dev-preview/

If you're on amd64 and want to install for arm64, you can download one of the openshift-install-linux-amd64-4.9.0-0.nightly-arm64-.*.tar.gz archives. Your installer will then be (1) built for the amd64 platform and (2) linked to arm64 images by default:

$ ./openshift-install version
./openshift-install 4.9.0-0.nightly-arm64-2021-08-16-154214
built from commit c7d810f497d0c6c3ad22e5c14f873a70b0586231
release image quay.io/openshift-release-dev/ocp-release-nightly@sha256:f5507d0e00a653c4e4a1333ca8649d52609d85c47270ace3dbc824f6a4d6de1b
$ podman image inspect quay.io/openshift-release-dev/ocp-release-nightly@sha256:f5507d0e00a653c4e4a1333ca8649d52609d85c47270ace3dbc824f6a4d6de1b | grep Architecture
"Architecture": "arm64"

However:

1. openshift-install could also validate, before the deployment, that the target platform and the images being used are compatible (this has to be tracked in another issue).
2. We could provide a single installer that selects the correct image based on the controlPlane.architecture and compute.architecture fields in install-config.yaml. This might require changes to the current CI architecture and to the registry artifacts, image streams, and imagestreamtags in use.

@psundararaman I saw some of your code for the installer on GitHub and tasks related to libvirt on Jira. Do you have any information about point 2?
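To illustrate what point 2 would need as input, here is a rough sketch of reading the architecture fields out of an install-config.yaml such as the one in the original report. config_arches and config_matches_arch are hypothetical names; the scan is line-based and is not a real YAML parser, so it assumes each architecture key sits on its own line as in that file.

```shell
# Hypothetical helpers, not part of openshift-install.

# List the distinct values of every "architecture:" key in the file $1,
# one per line. Covers both "architecture: x" and "- architecture: x".
config_arches() {
    awk '
        { sub(/^[ \t]*-[ \t]*/, "") }      # drop a leading list marker "- "
        $1 == "architecture:" { print $2 }
    ' "$1" | sort -u
}

# Succeed only when the config names exactly one architecture, equal to $2.
config_matches_arch() {
    [ "$(config_arches "$1")" = "$2" ]
}

# Possible use:
#   config_matches_arch ocp/install-config.yaml arm64 || echo "unexpected arch"
```

A config that mixes architectures between compute and controlPlane yields two lines from config_arches, so config_matches_arch fails for it; whether such a mix should be allowed is outside this sketch.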
Ah, user error! Thanks a lot for the detailed explanation. I agree with point 1, getting a meaningful error would be nice. Point 2 would be amazing!
Yes, the openshift-installer with the ARM payload can be downloaded here: https://console.redhat.com/openshift/install/aws/arm

The payload does not vary dynamically based on the architecture specified in install-config.yaml. Instead, each openshift-install build is tied to a specific payload, and such builds are provided for all arches. So you can use the x86 openshift-installer binary from the above link and it will give you an ARM payload. This is useful if you want to provision an ARM AWS cluster from an x86 laptop.

To point 1: the openshift-installer cannot validate that the arch specified in the install-config matches the arch of the payload. For that to happen, the payload would need to be downloaded and inspected, and its arch compared. That is what happens at the bootstrap phase, and the error you see in the systemd unit clearly indicates the problem. This route was already investigated and was not pursued, because people deploying OCP typically do not build their own installer. They download it from the tile page.
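For reference, the bootstrap-phase comparison described above can be pictured roughly as follows. The real release-image-download.sh is not shown in this thread, so this is a guess at its shape rather than the actual script. to_image_arch and check_host_arch are hypothetical names, and the mapping assumes that uname -m reports x86_64 or aarch64 where image metadata says amd64 or arm64.

```shell
# Hypothetical helpers, not taken from release-image-download.sh.

# Translate a `uname -m` machine name into the architecture name used in
# container image metadata. Unknown names pass through unchanged.
to_image_arch() {
    case "$1" in
        x86_64)  echo amd64 ;;
        aarch64) echo arm64 ;;
        *)       echo "$1" ;;
    esac
}

# Compare an image architecture ($1) with the architecture of this host.
check_host_arch() {
    local release_arch="$1" host_arch
    host_arch="$(to_image_arch "$(uname -m)")"
    if [ "$release_arch" != "$host_arch" ]; then
        echo "ERROR: release image arch $release_arch does not match host arch $host_arch" >&2
        return 1
    fi
}
```

Because the comparison uses the host's own uname output, it only makes sense on the node that will run the payload, which matches the explanation that the check happens at bootstrap and not in the installer.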