Bug 2004313

Summary: [RHOCP 4.9.0-rc.0] Failing to deploy Azure cluster from the macOS installer - ignition_bootstrap.ign: no such file or directory
Product: OpenShift Container Platform Reporter: Vincent Lours <vlours>
Component: Installer Assignee: aos-install
Installer sub component: openshift-installer QA Contact: Amogh Rameshappa Devapura <aramesha>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: medium CC: aramesha, mstaeble, staebler, tsze
Version: 4.9   
Target Milestone: ---   
Target Release: 4.10.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Cause: There was a race between when terraform would create the local file for the bootstrap ignition config and when terraform would try to upload that file to the Azure storage blob.
Consequence: If the upload started before the file was created, then the installation would fail.
Fix: Stop having terraform create a local file and upload the bootstrap ignition config file created by the installer manifests directly.
Result: Successful installations.
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-03-10 16:10:42 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Comment 2 To Hung Sze 2021-09-22 17:39:12 UTC
I was trying to assign qe to myself and messed up the dev assignment. Sorry.
Please reassign if it isn't set correctly.

Comment 3 To Hung Sze 2021-09-22 23:29:04 UTC
Is it always reproducible and is the error the same every time?
We had seen something similar but can't reproduce it at will.

Comment 4 Matthew Staebler 2021-09-23 15:17:12 UTC
There is a race in the terraform between creating the bootstrap_ignition.ign file and consuming the file to create the Azure storage blob. Since the name of the file is known at plan-creation time, terraform does not determine that the storage blob resource has a dependency on the local file. In the reported case, the local file had not been created by the time it was needed to create the storage blob, which was 10+ seconds after starting to create the file.

Comment 5 Matthew Staebler 2021-09-23 18:42:04 UTC
I am marking this as not a blocker. There is a possibility that a user may encounter this issue, but it is not likely to affect most users. If a user does encounter this, then the user can retry their installation. The installation will fail relatively early, so the user will not be left with what they think is a correctly functioning cluster that actually has issues.

Comment 6 Vincent Lours 2021-09-25 01:32:38 UTC
Hi @mstaeble 

Would that mean that running the `openshift-install wait-for bootstrap --dir <path> --log-level debug` command should be enough to let the install process continue deploying the cluster?
Or would I instead need to restart with `openshift-install create cluster --dir <path> --log-level debug`?

In my case, I think I tried to run `create cluster` again and it failed.
If one of the commands should be enough to continue the installation, I can create a KCS to describe the process.

Otherwise, would it be possible to add a test for the file being present and, if it is not, allow Terraform ~30 sec (in 10-sec segments) to ensure the file has been created?
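
The kind of check suggested above could be sketched in shell as follows. This is only an illustration of the polling idea: `wait_for_file` and its arguments are hypothetical names, not something the installer or Terraform provides, and the defaults (3 waits of 10 seconds) simply mirror the ~30 sec figure mentioned here.

```shell
#!/bin/sh
# Hypothetical helper (not part of openshift-install or Terraform):
# poll until a file exists, sleeping between checks.
# Usage: wait_for_file PATH [MAX_WAITS] [INTERVAL_SECONDS]
# Note: plain POSIX sh has no local variables, so these names are global.
wait_for_file() {
    wf_path=$1
    wf_max_waits=${2:-3}      # default: 3 waits ...
    wf_interval=${3:-10}      # ... of 10 seconds each, about 30 seconds in total
    wf_waited=0

    while [ ! -f "$wf_path" ]; do
        # Give up once the allowed number of waits has been used.
        if [ "$wf_waited" -ge "$wf_max_waits" ]; then
            return 1
        fi
        sleep "$wf_interval"
        wf_waited=$((wf_waited + 1))
    done
    return 0
}
```

For example, `wait_for_file "<path>/ignition_bootstrap.ign" || echo "file never appeared"` would wait up to about 30 seconds before reporting a failure. A timeout like this only narrows the window; it does not remove the underlying ordering problem.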

Comment 7 Matthew Staebler 2021-09-27 14:05:40 UTC
@Vincent, No, you would need to redo the entire installation: Clean your install directory and run `create cluster` again.
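
A minimal shell sketch of that retry, under some assumptions that are not stated in this bug: `redo_install`, its two arguments, and the `OPENSHIFT_INSTALL` override are hypothetical names; a copy of `install-config.yaml` is assumed to have been saved outside the install directory (the installer consumes the file when it generates its assets); and the `destroy cluster` step is an added precaution to remove any Azure resources left by the failed attempt.

```shell
#!/bin/sh
# Sketch only. Hypothetical wrapper around the retry described above.
# Usage: redo_install INSTALL_DIR CONFIG_BACKUP
#   INSTALL_DIR   - the directory passed to --dir
#   CONFIG_BACKUP - a saved install-config.yaml kept OUTSIDE INSTALL_DIR
redo_install() {
    install_dir=$1
    config_backup=$2
    # Allow pointing at a specific installer binary (assumed variable name).
    installer=${OPENSHIFT_INSTALL:-openshift-install}

    # Refuse to run with missing arguments so rm -rf never sees an empty path.
    if [ -z "$install_dir" ] || [ ! -f "$config_backup" ]; then
        echo "usage: redo_install INSTALL_DIR CONFIG_BACKUP" >&2
        return 2
    fi

    # Assumption, not from this bug: tear down resources from the failed
    # attempt first, if the installer got far enough to write metadata.json.
    if [ -f "$install_dir/metadata.json" ]; then
        "$installer" destroy cluster --dir "$install_dir" --log-level debug || return 1
    fi

    # Clean the install directory and restore the saved configuration.
    rm -rf "$install_dir" || return 1
    mkdir -p "$install_dir" || return 1
    cp "$config_backup" "$install_dir/install-config.yaml" || return 1

    # Run the installation again from scratch.
    "$installer" create cluster --dir "$install_dir" --log-level debug
}
```

It could be invoked as `redo_install <path> <saved install-config.yaml>`, with both paths supplied by the user.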

The change that we need to make to the terraform is to create an explicit dependency between the local file and the storage blob. This will tell terraform to wait until the local file is created before attempting to create the storage blob.

Comment 8 Vincent Lours 2021-09-27 22:58:50 UTC
(In reply to Matthew Staebler from comment #7)
> @Vincent, No, you would need to redo the entire installation: Clean your
> install directory and run `create cluster` again.
> 
> The change that we need to make to the terraform is to create an explicit
> dependency between the local file and the storage blob. This will tell
> terraform to wait until the local file is created before attempting to
> create the storage blob.

Thanks, Matthew, for the clarification. I thought I had missed something during the Testathon.
That is what I tried several times (more than 6 tries in total) with the 4.9.0-rc.0 and -rc.1 installers. Unfortunately, it never worked.

That's actually why I raised this BZ.

Good luck with the fix implementation, looking forward to checking the code modifications to understand the issue & solution :)

Comment 15 errata-xmlrpc 2022-03-10 16:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056