Version: 4.11

Platform: ibmcloud

Please specify: IPI

What happened?

IPI deployments on IBM Cloud (x86_64) fail because the bootstrap VSI does not start (it is stuck in Starting or Failed). This appears to be due to the recent change to the Operating System tag for the IBM Cloud VPC Custom Image used for deploying VPC VSIs.
https://github.com/openshift/installer/commit/9f339a3a6f34c0498bb137693f4941669945b7e9

From what I can tell, IBM Cloud VPC does not properly support RHCOS Custom Images, even though that support was supposed to have been added recently.

DEBUG ibm_is_instance.bootstrap_node: Still creating... [12m40s elapsed]
DEBUG ibm_is_instance.bootstrap_node: Still creating... [12m50s elapsed]
ERROR
ERROR Error: Instance (0787_175bbfd5-1dc9-4695-ac19-4ffc7090a415) went into failed state during the operation
ERROR ([
ERROR   {
ERROR     "code": "cannot_start_compute",
ERROR     "message": "Can't start instance because provisioning failed.",
ERROR     "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute"
ERROR   },
ERROR   {
ERROR     "code": "cannot_start_compute",
ERROR     "message": "Can't start instance because provisioning failed.",
ERROR     "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute"
ERROR   }
ERROR ])

What did you expect to happen?

Successful IPI deployment

How to reproduce it (as minimally and precisely as possible)?

100%

Create a new IPI 4.11 cluster on IBM Cloud
1. openshift-install create cluster --dir my-ibm-cluster

Anything else we need to know?

IBM has already created a PR to revert the change that caused this and is working with IBM VPC development to determine why RHCOS Custom Images appear not to work properly.
PR to revert the change that is causing this issue.
A similar issue has also been reported on 4.10, which does not have the OS patch:
https://github.com/openshift/installer/commit/9f339a3a6f34c0498bb137693f4941669945b7e9
I will have to continue investigating, in case the 100% failure rate with the patch above versus the 100% success rate without it merely coincides with an IBM Cloud VPC issue.
Local testing has confirmed that the issue affects 4.11 CI/nightly builds, which include the OS patch mentioned above (RHEL vs. Fedora CoreOS tag). I have also confirmed that the latest release-4.10 build is not affected by this bug, as it does not have the OS patch. So I believe this affects only 4.11, due to this OS patch, and that the PR to revert that change addresses it.
https://github.com/openshift/installer/pull/5869
Pre-merge test done.
registry.ci.openshift.org/ocp/release:4.11.0-0.ci-2022-05-10-210344: IPI install succeeded.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2022:5069