Bug 2082604

Summary: [IBMCloud][x86_64] IBM VPC does not properly support RHCOS Custom Image tagging
Product: OpenShift Container Platform Reporter: Christopher J Schaefer <cschaefe>
Component: Installer Assignee: Nobody <nobody>
Installer sub component: openshift-installer QA Contact: MayXu <maxu>
Status: CLOSED ERRATA Docs Contact:
Severity: high    
Priority: unspecified CC: gpei, maxu
Version: 4.11 Keywords: TestBlocker
Target Milestone: ---   
Target Release: 4.11.0   
Hardware: x86_64   
OS: Unspecified   
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-08-10 11:10:37 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:

Description Christopher J Schaefer 2022-05-06 14:20:57 UTC


Please specify:

What happened?
IPI deployments on IBM Cloud (x86_64) fail because the bootstrap VSI does not start (it stays stuck in Starting or goes to Failed).

This appears to be caused by the recent change to the Operating System tag on the IBM Cloud VPC Custom Image used to deploy VPC VSIs.

From what I can tell, IBM Cloud VPC does not yet properly support RHCOS Custom Images, although that support was supposed to have been added recently.

DEBUG ibm_is_instance.bootstrap_node: Still creating... [12m40s elapsed] 
DEBUG ibm_is_instance.bootstrap_node: Still creating... [12m50s elapsed] 
ERROR Error: Instance (0787_175bbfd5-1dc9-4695-ac19-4ffc7090a415) went into failed state during the operation  
ERROR  ([                                          
ERROR     {                                        
ERROR         "code": "cannot_start_compute",      
ERROR         "message": "Can't start instance because provisioning failed.", 
ERROR         "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute" 
ERROR     },                                       
ERROR     {                                        
ERROR         "code": "cannot_start_compute",      
ERROR         "message": "Can't start instance because provisioning failed.", 
ERROR         "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute" 
ERROR     }                                        
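
For triage, the status reasons in the excerpt above can be pulled out of the installer output programmatically. The following is only a minimal sketch, not part of the installer: the function name extract_status_reasons is hypothetical, and the closing "]" line in the sample is an assumption added so the array parses, since the pasted log is cut off before it ends.

```python
import json
import re

# Excerpt copied from the installer output above. The final "ERROR  ]" line is
# an assumption: the pasted log is truncated before the array is closed.
LOG_EXCERPT = '''\
ERROR  ([
ERROR     {
ERROR         "code": "cannot_start_compute",
ERROR         "message": "Can't start instance because provisioning failed.",
ERROR         "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute"
ERROR     },
ERROR     {
ERROR         "code": "cannot_start_compute",
ERROR         "message": "Can't start instance because provisioning failed.",
ERROR         "more_info": "https://cloud.ibm.com/docs/vpc?topic=vpc-instance-status-messages#cannot-start-compute"
ERROR     }
ERROR  ]
'''


def extract_status_reasons(log_text):
    """Return the unique (code, message) pairs found in ERROR-prefixed JSON lines."""
    # Drop the leading "ERROR" level tag from every line.
    body = "\n".join(re.sub(r"^ERROR\s?", "", line) for line in log_text.splitlines())
    # Keep only the JSON array between the first "[" and the last "]".
    start, end = body.index("["), body.rindex("]")
    reasons = json.loads(body[start:end + 1])
    # De-duplicate while preserving the order in which reasons appear.
    unique = []
    for reason in reasons:
        pair = (reason["code"], reason["message"])
        if pair not in unique:
            unique.append(pair)
    return unique


if __name__ == "__main__":
    for code, message in extract_status_reasons(LOG_EXCERPT):
        print(f"{code}: {message}")
```

Both entries in the excerpt carry the same code, so the sketch reports a single reason, cannot_start_compute.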

What did you expect to happen?
Successful IPI deployment

How to reproduce it (as minimally and precisely as possible)?

1. Create a new IPI 4.11 cluster on IBM Cloud:
   openshift-install create cluster --dir my-ibm-cluster

Anything else we need to know?
IBM has already created a PR to revert the change that caused this and is working with IBM VPC development to determine the reason why RHCOS Custom Images appear not to work properly.

Comment 1 Christopher J Schaefer 2022-05-06 14:21:50 UTC
PR to revert the change that is causing this issue.

Comment 2 Christopher J Schaefer 2022-05-06 14:47:09 UTC
A similar issue has also been reported on 4.10, which does not have the OS patch: https://github.com/openshift/installer/commit/9f339a3a6f34c0498bb137693f4941669945b7e9

I will have to continue investigating, in case the 100% failure rate with the patch above and the 100% success rate without it merely coincide with an IBM Cloud VPC issue.

Comment 3 Christopher J Schaefer 2022-05-06 18:40:03 UTC
Local testing has confirmed that the issue affects 4.11 CI/nightly builds, which include the OS patch mentioned above (RHEL vs. Fedora CoreOS tag).

I have also confirmed that the latest release-4.10 build is not affected by this bug, as it does not have the OS patch.

So I believe this affects only 4.11, because of this OS patch, and that the PR reverting that change addresses it.

Comment 4 MayXu 2022-05-10 02:05:53 UTC
pre-merge test done

Comment 6 MayXu 2022-05-11 02:45:55 UTC
IPI install succeeded with registry.ci.openshift.org/ocp/release:4.11.0-0.ci-2022-05-10-210344

Comment 9 errata-xmlrpc 2022-08-10 11:10:37 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.