Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1966623

Summary: [master] Cluster install failed due to timeout while "Waiting for control plane"
Product: OpenShift Container Platform Reporter: Dominik Holler <dholler>
Component: assisted-installerAssignee: Eran Cohen <ercohen>
assisted-installer sub component: assisted-service QA Contact: Yuri Obshansky <yobshans>
Status: CLOSED ERRATA Docs Contact:
Severity: medium    
Priority: high CC: alazar, aos-bugs, ercohen, trwest
Version: 4.7Keywords: Triaged
Target Milestone: ---   
Target Release: 4.8.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: AI-Team-Core
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1967631 (view as bug list) Environment:
Last Closed: 2021-08-31 16:17:10 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1967631    
Attachments:
Description Flags
Installation log files none

Description Dominik Holler 2021-06-01 14:16:37 UTC
Created attachment 1788513 [details]
Installation log files

Description of problem:
Cluster installation fails with message:
Cluster has hosts in error.
Reset the installation process to return to the configuration and try again. Some hosts may need to be re-registered by rebooting into the Discovery ISO.



Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install a cluster with 2 workers and 3 masters.


Actual results:
Installation fails.

Expected results:
Installation succeeds.

Additional info:

Comment 1 Eran Cohen 2021-06-01 14:22:47 UTC
This is the event that triggered the failure:

 Host dev-dholler-master-0: updated status from "installing-in-progress" to "error" (Host failed to install because its installation stage Waiting for control plane took longer than expected 1h0m0s)

It was caused (in part) by this issue:

Jun 01 11:16:32 random-hostname-f6cd965e-2ed9-41b1-8ed1-affe1bec1363 installer[13540]: time="2021-06-01T11:16:32Z" level=error msg="Failed to update node installation stage, AssistedServiceError Code: 500 Href:  ID: 500 Kind: Error Reason: Stages Waiting for bootkube isn't available for host role master bootstrap true" request_id=9e45d888-6dfc-42d4-895a-05d9210a4154


Seems that the "Waiting for Bootkube" stage wasn't added to the bootstrap list of stages.

Comment 3 Trey West 2021-06-04 14:20:47 UTC
Verified.

Cluster installations for 4.7.11-x86_64 and 4.8.0-fc.3-x86_64 are succeeding.

Comment 6 errata-xmlrpc 2021-08-31 16:17:10 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.8.9 bug fix), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3247