Created attachment 1787358 [details]
example

Description of problem:
I was installing a cluster with 3 masters and 3 workers, and also chose to deploy the OCS and CNV operators. Installation finished for all nodes except worker-0-1, which hung at stage 7/8. While worker-0-1 was still installing, both the OCS and CNV operator deployments failed, meaning the operator deployments started before all workers had fully joined the cluster. OCS must have 3 worker nodes, so we need to make sure all workers are available before deploying operators.
Must-gather and cluster logs are attached.

Version-Release number of selected component (if applicable):
Staging v1.0.20.3
OCS 4.7

Steps to Reproduce:
1. Install an OCS + CNV cluster with 3 masters and 3 workers
2. During installation, simulate a failure on 1 worker node (e.g. kill the installer)
3. Wait for the cluster to complete

Actual results:
Operator deployment kicked off and failed before all workers finished installation

Expected results:
Wait for all workers to be installed before deploying OLM operators

Additional info:
1. No insights regarding why this worker failed to deploy
2. It seems the console and CVO operators kicked off before the entire cluster was done.
2 minutes after they were installed successfully, OCS and CNV failed:
CVO status_updated_at: 2021-05-26T13:32:08.455Z
OCS status_updated_at: 2021-05-26T13:34:40.289Z
CNV status_updated_at: 2021-05-26T13:34:40.041Z

5/26/2021, 5:16:42 PM error Host worker-0-1: updated status from "installing-in-progress" to "error" (Host failed to install because its installation stage Joined took longer than expected 1h0m0s)
5/26/2021, 4:34:47 PM Successfully finished installing cluster edge34-cluster-cnv-ocs-0
5/26/2021, 4:32:08 PM Cluster version status: available message: Done applying 4.7.9
5/26/2021, 4:24:08 PM Cluster version status: progressing message: Unable to apply 4.7.9: the cluster operator authentication has not yet successfully rolled out
5/26/2021, 4:21:08 PM Cluster version status: progressing message: Unable to apply 4.7.9: some cluster operators have not yet rolled out
5/26/2021, 4:19:47 PM Updated status of cluster edge34-cluster-cnv-ocs-0 to finalizing
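To illustrate the expected behavior, here is a rough sketch of the kind of gate I have in mind: do not start OLM operator deployment until every worker host has finished installing and the worker count meets the operator's minimum (3 for OCS, per the description above). This is not the installer's actual code. The `Host` type, the function names, and the "installed" status value are assumptions made up for the example; only "installing-in-progress" and "error" appear in the events above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Host:
    # Hypothetical host record; field names are assumptions for this sketch.
    name: str
    role: str    # "master" or "worker"
    status: str  # e.g. "installed" (assumed), "installing-in-progress", "error"


def pending_workers(hosts: List[Host]) -> List[str]:
    """Names of worker hosts that have not reached the assumed "installed" status."""
    return [h.name for h in hosts if h.role == "worker" and h.status != "installed"]


def can_deploy_olm_operators(hosts: List[Host], min_workers: int = 3) -> bool:
    """True only when all workers are installed and there are at least min_workers."""
    workers = [h for h in hosts if h.role == "worker"]
    return len(workers) >= min_workers and not pending_workers(hosts)


# The situation from this report: worker-0-1 is still installing.
hosts = [
    Host("master-0-0", "master", "installed"),
    Host("master-0-1", "master", "installed"),
    Host("master-0-2", "master", "installed"),
    Host("worker-0-0", "worker", "installed"),
    Host("worker-0-1", "worker", "installing-in-progress"),
    Host("worker-0-2", "worker", "installed"),
]
print(can_deploy_olm_operators(hosts), pending_workers(hosts))
```

With a check like this, the operator deployment in this cluster would have been held back while worker-0-1 was stuck, instead of starting and failing. What should happen when a worker ends up in "error" (fail the operators, or proceed with fewer workers) is a separate decision that this sketch does not cover.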
Created attachment 1787360 [details] cluster_logs
Created attachment 1787373 [details] must-gather
This bug should be fixed by the changes made for https://issues.redhat.com/browse/MGMT-4668
The epic mentioned in comment #3 is Done. Please retest to make sure it works now.
Will be verified. Please move the bug to ON_QA and add fix_in_version.
Verified in Staging, v1.0.21.3
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:2438