Bug 2004313 - [RHOCP 4.9.0-rc.0] Failing to deploy Azure cluster from the macOS installer - ignition_bootstrap.ign: no such file or directory
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Installer
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.10.0
Assignee: aos-install
QA Contact: Amogh Rameshappa Devapura
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-09-15 01:03 UTC by Vincent Lours
Modified: 2022-03-10 16:11 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: There was a race between when terraform would create the local file for the bootstrap ignition config and when terraform would try to upload that file to the Azure storage blob.
Consequence: If the upload started before the file was created, then the installation would fail.
Fix: Stop having terraform create a local file and upload the bootstrap ignition config file created by the installer manifests directly.
Result: Successful installations.
Clone Of:
Environment:
Last Closed: 2022-03-10 16:10:42 UTC
Target Upstream Version:


Links
System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift installer pull 5296 | 0 | None | open | Bug 2004313: Add explicit dependency for ignition file | 2021-10-19 14:01:11 UTC
Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:11:15 UTC

Comment 2 To Hung Sze 2021-09-22 17:39:12 UTC
I was trying to assign QE to myself and messed up the dev assignment. Sorry.
Please reassign if it isn't set correctly.

Comment 3 To Hung Sze 2021-09-22 23:29:04 UTC
Is it always reproducible and is the error the same every time?
We had seen something similar but can't reproduce it at will.

Comment 4 Matthew Staebler 2021-09-23 15:17:12 UTC
There is a race in the terraform between creating the ignition_bootstrap.ign file and consuming that file to create the Azure storage blob. Since the name of the file is known at plan-creation time, terraform does not determine that the storage blob resource has a dependency on the local file. In the reported case, the local file had still not been created when it was needed for the storage blob, more than 10 seconds after terraform started creating the file.
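
To illustrate why no dependency is detected, here is a minimal Python sketch. It is a toy model, not Terraform's implementation, and it assumes only that dependencies are inferred from references between resources rather than from two resources naming the same literal path. The resource names, paths, and the `infer_dependencies` helper are hypothetical and are not taken from the installer.

```python
import re

# Hypothetical, simplified resource declarations (not the installer's config).
RESOURCES = {
    "local_file.ignition": {"filename": "./ignition_bootstrap.ign"},
    # This blob names the same path as a literal string, so nothing in its
    # attributes points at local_file.ignition.
    "storage_blob.literal": {"source": "./ignition_bootstrap.ign"},
    # This blob reads the path through a reference to the other resource.
    "storage_blob.referenced": {"source": "${local_file.ignition.filename}"},
}

# Matches references of the form ${type.name.attribute}.
REFERENCE = re.compile(r"\$\{(\w+\.\w+)\.\w+\}")


def infer_dependencies(resources):
    """Map each resource to the set of resources its attributes reference."""
    deps = {}
    for name, attrs in resources.items():
        found = set()
        for value in attrs.values():
            for target in REFERENCE.findall(value):
                if target in resources:
                    found.add(target)
        deps[name] = found
    return deps


deps = infer_dependencies(RESOURCES)
for name, targets in deps.items():
    print(name, "depends on", sorted(targets))
```

In this model the blob that uses the literal path ends up with no dependencies even though it needs the file on disk, which is the situation described above.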

Comment 5 Matthew Staebler 2021-09-23 18:42:04 UTC
I am marking this as not a blocker. There is a possibility that a user may encounter this issue, but it is not likely to affect most users. If a user does encounter this, then the user can retry their installation. The installation will fail relatively early. The user will not be left with what they think is a correctly functioning cluster that actually has issues.

Comment 6 Vincent Lours 2021-09-25 01:32:38 UTC
Hi @mstaeble 

Would that mean that running `openshift-install wait-for bootstrap-complete --dir <path> --log-level debug` should be enough for the installation to continue deploying the cluster?
Or would I need to rerun `openshift-install create cluster --dir <path> --log-level debug` instead?

In my case, I think I ran `create cluster` again and it failed.
If one of these commands is enough to continue the installation, I can create a KCS article describing the process.

Otherwise, would it be possible to add a check that the file is present and, if it is not, give Terraform about 30 seconds (in 10-second intervals) to make sure the file has been created?
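
As a rough illustration of the polling idea proposed here (only a proposal in this comment, not something the installer is known to do), a minimal Python sketch follows. The `wait_for_file` helper, its parameters, and the example path are hypothetical.

```python
import os
import time


def wait_for_file(path, segments=3, interval=10.0, sleep=time.sleep):
    """Wait up to segments * interval seconds for path to exist.

    Returns True as soon as the file is present, False if it is still
    missing after the last wait.
    """
    for _ in range(segments):
        if os.path.exists(path):
            return True
        sleep(interval)
    # Final check after the last wait.
    return os.path.exists(path)


if __name__ == "__main__":
    # Hypothetical path, for illustration only.
    print(wait_for_file("./ignition_bootstrap.ign", segments=3, interval=10.0))
```

A timeout like this only narrows the race window. It does not guarantee ordering, because the file could still appear after the last check.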

Comment 7 Matthew Staebler 2021-09-27 14:05:40 UTC
@Vincent, No, you would need to redo the entire installation: Clean your install directory and run `create cluster` again.

The change that we need to make to the terraform is to create an explicit dependency between the local file and the storage blob. This will tell terraform to wait until the local file is created before attempting to create the storage blob.
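
The effect of an explicit dependency on ordering can be sketched with a small Python model. This is an analogy using the standard library's `graphlib` (Python 3.9+), not Terraform's scheduler. The node names and the `apply_batches` helper are hypothetical.

```python
from graphlib import TopologicalSorter


def apply_batches(dependencies):
    """Group resources into batches; members of a batch may run concurrently."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        batches.append(ready)
        sorter.done(*ready)
    return batches


# No declared edge: both resources are ready at once, so they can race.
without_edge = apply_batches({"local_file": set(), "storage_blob": set()})

# Explicit edge: the blob is only scheduled after the file is done.
with_edge = apply_batches({"local_file": set(), "storage_blob": {"local_file"}})

print(without_edge)
print(with_edge)
```

Without the edge, both resources land in the same batch. With it, the storage blob is scheduled only after the local file has been created.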

Comment 8 Vincent Lours 2021-09-27 22:58:50 UTC
(In reply to Matthew Staebler from comment #7)
> @Vincent, No, you would need to redo the entire installation: Clean your
> install directory and run `create cluster` again.
> 
> The change that we need to make to the terraform is to create an explicit
> dependency between the local file and the storage blob. This will tell
> terraform to wait until the local file is created before attempting to
> create the storage blob.

Thanks, Matthew, for the clarification. I thought I had missed something during the Testathon.
That is what I tried several times (more than 6 attempts in total) with the 4.9.0-rc.0 and 4.9.0-rc.1 installers, but unfortunately it never worked.

That's actually why I raised this BZ.

Good luck with the fix. I look forward to checking the code changes to understand the issue and the solution :)

Comment 15 errata-xmlrpc 2022-03-10 16:10:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

