Created attachment 1749596 [details]
installation logs

Description of problem:
Cluster with OCP 4.7 image deployment failed

1/21/2021, 7:40:06 PM error Host worker-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM error Host worker-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM error Host master-0-2: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM error Host master-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:40:05 PM error Host master-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
1/21/2021, 7:39:16 PM critical Failed installing cluster ocp-cluster-f20-h22-0. Reason: Timeout while waiting for cluster version to be available
1/21/2021, 7:38:16 PM Update cluster installation progress: Cluster version is available: false , message: Unable to apply 4.7.0-fc.0: the cluster operator console is degraded

Version-Release number of selected component (if applicable):
v1.0.15.1
Assisted-ui-lib version: 1.5.4

How reproducible:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/f584f16f-0199-48ed-9a50-1116b5a71c41
user: nshidlin-aiqe1-u1
password: [redacted]

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1749775 [details] installation logs
Reproduced with OCP image 4.6 as well:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/fd380a0c-3a14-4b2a-94b8-7ab0e899ed46

Log files attached.
@yobshans Did this happen only on the scale test machines, or also in non-scale scenarios?
The current timeout is 2 hours, and it looks like, due to a lack of resources, the installation just takes much longer than that.
(In reply to Igal Tsoiref from comment #5)
> current timeout is 2 hours and looks due to lack of resources it just takes
> much more

We are running the tests on a virtual environment according to the minimal requirements specified:

"Boot the Discovery ISO on hardware that should become part of this bare metal cluster. Hosts connected to the internet will be inspected and automatically appear below. Three master hosts are required with at least 4 CPU cores, 16 GB of RAM, and 20 GB of filesystem storage each. Two or more additional worker hosts are recommended with at least 2 CPU cores, 8 GB of RAM, and 20GB of filesystem storage each."
@ohochman the problem is mainly not the VMs but the host those VMs are running on. If you set 4 vCPUs per VM but the host has only 8 CPUs, then under high load some VMs will not get any CPU time at all.
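To make the overcommit point concrete, here is a minimal sketch of the arithmetic. The VM sizes are the minimum spec quoted earlier in this thread (three masters with 4 vCPUs, two workers with 2 vCPUs); the 8-CPU host is only the hypothetical figure from the comment above, not the measured size of the test host, and the function name is made up for illustration.

```python
def vcpu_overcommit_ratio(vm_vcpus, host_cpus):
    """Return total vCPUs requested by the VMs divided by physical host CPUs."""
    if host_cpus <= 0:
        raise ValueError("host_cpus must be positive")
    return sum(vm_vcpus) / host_cpus


# Minimum spec quoted in this thread: 3 masters x 4 vCPUs, 2 workers x 2 vCPUs.
cluster_vcpus = [4, 4, 4, 2, 2]

# Hypothetical 8-CPU host (the example figure from the comment, not a measurement).
ratio = vcpu_overcommit_ratio(cluster_vcpus, host_cpus=8)

print(f"vCPUs requested: {sum(cluster_vcpus)}, overcommit ratio: {ratio:.1f}")
```

A ratio above 1.0 means that under full load not every vCPU can be scheduled on a physical CPU at the same time, which would be consistent with the installation running past the 2-hour timeout.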
Note:
- The issue reproduced with 1 cluster deployed on the same physical host.
- QE should attempt to reproduce the issue without CVO, as it's been disabled.

Maybe we need to adjust the requirements; this should be discussed with PM.
As @itsoiref said, the issue isn't with the VM spec; the issue is with the host running the VMs.
*** Bug 1889813 has been marked as a duplicate of this bug. ***
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days