1919040 – [Assisted][Staging] Cluster deployment failed by Reason: Timeout while waiting for cluster version to be available


Keywords:
Status:	CLOSED INSUFFICIENT_DATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	assisted-installer
Sub Component:
Version:	4.7
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	high
Target Milestone:	---
Target Release:	---
Assignee:	Igal Tsoiref
QA Contact:	Udi Kalifon
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1889813
Depends On:
Blocks:

Reported:	2021-01-22 00:48 UTC by Yuri Obshansky
Modified:	2023-09-15 00:58 UTC
CC List:	7 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2021-02-16 19:19:14 UTC
Target Upstream Version:
Embargoed:

Attachments
installation logs (95.50 KB, application/x-tar) 2021-01-22 00:48 UTC, Yuri Obshansky	no flags
installation logs (85.50 KB, application/x-tar) 2021-01-22 13:16 UTC, Yuri Obshansky	no flags

Description Yuri Obshansky 2021-01-22 00:48:30 UTC

Created attachment 1749596
installation logs

Description of problem:
Deployment of a cluster with the OCP 4.7 image failed. Event log (newest first):
1/21/2021, 7:40:06 PM	error Host worker-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM	error Host worker-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM	error Host master-0-2: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM	error Host master-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM	error Host master-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:39:16 PM	critical Failed installing cluster ocp-cluster-f20-h22-0. Reason: Timeout while waiting for cluster version to be available
1/21/2021, 7:38:16 PM	Update cluster installation progress: Cluster version is available: false , message: Unable to apply 4.7.0-fc.0: the cluster operator console is degraded
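
When triaging an event log like the one above, it can help to pull the host status transitions out of the message text. The following is a minimal sketch, not part of assisted-installer: the regular expression is inferred only from the messages quoted in this report, and the names `HOST_STATUS_RE` and `parse_host_transitions` are hypothetical.

```python
import re

# Assumption: the pattern is derived from the messages quoted in this report,
# not from any documented assisted-installer event format.
HOST_STATUS_RE = re.compile(
    r'Host (?P<host>[\w.-]+): updated status from "(?P<old>[^"]+)" '
    r'to "(?P<new>[^"]+)" \((?P<reason>.*)\)'
)


def parse_host_transitions(lines):
    """Return (host, old, new, reason) for each line that holds a host status change."""
    transitions = []
    for line in lines:
        match = HOST_STATUS_RE.search(line)
        if match:
            transitions.append(match.group("host", "old", "new", "reason"))
    return transitions


events = [
    'error Host worker-0-1: updated status from "installed" to "error" '
    "(Host is part of a cluster that failed to install)",
    "critical Failed installing cluster ocp-cluster-f20-h22-0. Reason: "
    "Timeout while waiting for cluster version to be available",
]

for host, old, new, reason in parse_host_transitions(events):
    print(f"{host}: {old} -> {new} ({reason})")
```

Lines that do not describe a host transition, such as the cluster-level "critical" message, are skipped rather than treated as errors.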

Version-Release number of selected component (if applicable):
v1.0.15.1
Assisted-ui-lib version:  1.5.4

How reproducible:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/f584f16f-0199-48ed-9a50-1116b5a71c41
user:nshidlin-aiqe1-u1
password:L7uzs7oUcRJ/SgY4qi9Aupk7u425cFa2

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 1 Yuri Obshansky 2021-01-22 13:16:01 UTC

Created attachment 1749775
installation logs

Comment 2 Yuri Obshansky 2021-01-22 13:16:14 UTC

Reproduced with OCP image 4.6 as well
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/fd380a0c-3a14-4b2a-94b8-7ab0e899ed46
Log files attached.

Comment 3 Ronnie Lazar 2021-01-24 17:02:31 UTC

@yobshans Did this happen only on the scale test machines, or also in non-scale scenarios?

Comment 5 Igal Tsoiref 2021-01-24 17:28:29 UTC

The current timeout is 2 hours, and it looks like, due to a lack of resources, the installation just takes much longer than that.
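
The failure mode described here is a poll-until-deadline loop giving up before the cluster version becomes available. A minimal sketch of that pattern follows. It is not the assisted-installer implementation; `wait_until`, `check`, and the parameter names are hypothetical, and only the 2-hour figure comes from this comment.

```python
import time

# The 2-hour limit mentioned in this comment, expressed in seconds.
TWO_HOURS_S = 2 * 60 * 60


def wait_until(check, timeout_s, interval_s, clock=time.monotonic, sleep=time.sleep):
    """Call check() until it returns True or timeout_s elapses.

    Returns True on success and False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if check():
            return True
        if clock() >= deadline:
            return False
        sleep(interval_s)


# Demo with a stand-in check that succeeds on the third poll.
polls = iter([False, False, True])
print(wait_until(lambda: next(polls), timeout_s=1.0, interval_s=0.01))
```

In a real caller, `check` would query whether the cluster version is available and `timeout_s` would be `TWO_HOURS_S`. On a resource-starved host the check can legitimately stay false past the deadline, which matches the behavior reported here.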

Comment 6 Omri Hochman 2021-01-25 14:14:54 UTC

(In reply to Igal Tsoiref from comment #5)
> The current timeout is 2 hours, and it looks like, due to a lack of resources,
> the installation just takes much longer than that.

We are running the tests on a virtual environment according to the specified minimum requirements:
Boot the Discovery ISO on hardware that should become part of this bare metal cluster. Hosts connected to the internet will be inspected and automatically appear below. Three master hosts are required with at least 4 CPU cores, 16 GB of RAM, and 20 GB of filesystem storage each. Two or more additional worker hosts are recommended with at least 2 CPU cores, 8 GB of RAM, and 20GB of filesystem storage each.
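
The minimums quoted above can be written down as data and checked per host. This is a sketch only: `HostSpec`, `MINIMUMS`, and `shortfalls` are hypothetical names, and the numbers are taken solely from the requirement text quoted in this comment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostSpec:
    cpu_cores: int
    ram_gb: int
    disk_gb: int


# Minimums as quoted in the requirement text above.
MINIMUMS = {
    "master": HostSpec(cpu_cores=4, ram_gb=16, disk_gb=20),
    "worker": HostSpec(cpu_cores=2, ram_gb=8, disk_gb=20),
}


def shortfalls(role, spec):
    """Return the names of the resources where spec is below the minimum for role."""
    minimum = MINIMUMS[role]
    return [
        name
        for name in ("cpu_cores", "ram_gb", "disk_gb")
        if getattr(spec, name) < getattr(minimum, name)
    ]


print(shortfalls("master", HostSpec(cpu_cores=4, ram_gb=16, disk_gb=20)))  # []
print(shortfalls("worker", HostSpec(cpu_cores=2, ram_gb=4, disk_gb=20)))   # ['ram_gb']
```

A check like this only validates what each VM is configured with. It says nothing about whether the physical host can actually deliver those resources to all VMs at once.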

Comment 7 Igal Tsoiref 2021-01-25 19:03:55 UTC

@ohochman The problem is mainly not the VMs but the host that those VMs are running on. If you set 4 vCPUs per VM but the host has only 8 CPUs, then under high load some VMs will not get any CPU time at all.
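
The arithmetic behind this point can be sketched as a vCPU overcommit ratio. The VM mix below is an assumption (three masters at 4 vCPUs and two workers at 2 vCPUs, matching the five hosts in the event log and the quoted minimums), the 8-CPU host is the illustrative figure from this comment, and `vcpu_overcommit_ratio` is a hypothetical helper.

```python
def vcpu_overcommit_ratio(vm_vcpus, host_cpus):
    """Return total configured vCPUs divided by the physical CPUs on the host."""
    if host_cpus <= 0:
        raise ValueError("host_cpus must be positive")
    return sum(vm_vcpus) / host_cpus


# Assumed VM mix: 3 masters x 4 vCPUs and 2 workers x 2 vCPUs.
vms = [4, 4, 4, 2, 2]

# 16 vCPUs on an 8-CPU host.
print(vcpu_overcommit_ratio(vms, host_cpus=8))  # 2.0
```

A ratio above 1.0 means the VMs together are promised more CPU than the host has, so under sustained load some of them have to wait for CPU time.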

Comment 8 Omri Hochman 2021-01-26 15:21:27 UTC

Note:
- The issue reproduced with 1 cluster deployed on the same physical host.
- QE should attempt to reproduce the issue without CVO, as it has been disabled.

Maybe we need to adjust the requirements; this should be discussed with PM.

Comment 9 Eran Cohen 2021-01-27 08:23:11 UTC

As @itsoiref said, the issue isn't with the VM specs; it is with the host running the VMs.

Comment 10 Igal Tsoiref 2021-01-31 20:33:56 UTC

*** Bug 1889813 has been marked as a duplicate of this bug. ***

Comment 11 Red Hat Bugzilla 2023-09-15 00:58:53 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days
