Created attachment 1760681 [details]
installation logs

Description of problem:
Assisted Service on Staging env

Cluster Events:
3/3/2021, 6:04:35 PM error Host worker-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/3/2021, 6:04:35 PM error Host master-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/3/2021, 6:04:35 PM error Host master-0-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/3/2021, 6:04:35 PM error Host worker-0-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/3/2021, 6:04:35 PM error Host master-0-2: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/3/2021, 6:03:09 PM critical Failed installing cluster ocp-cluster-f13-h05-0. Reason: Timeout while waiting for console to become available
3/3/2021, 4:52:07 PM Updated status of cluster ocp-cluster-f13-h05-0 to finalizing

Version-Release number of selected component (if applicable):
v1.0.17.1

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
@itsoiref @ronnie.lazar

Another cluster deployment failed with the same error:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/64c67213-46ea-4dcb-b675-8fd6faed3148

3/11/2021, 2:30:53 PM error Host master-2-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/11/2021, 2:30:52 PM error Host master-2-2: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/11/2021, 2:30:52 PM error Host master-2-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/11/2021, 2:30:52 PM error Host worker-2-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/11/2021, 2:30:52 PM error Host worker-2-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/11/2021, 2:30:48 PM critical Failed installing cluster ocp-cluster-f13-h06-2. Reason: Timeout while waiting for console to become available

This time I was able to run must-gather and collect sos reports for the nodes. See attachments.
Created attachment 1762827 [details] must-gather
Created attachment 1762829 [details] sos-report master 0
Created attachment 1762830 [details] sos-report master 1
Created attachment 1762831 [details] sos-report master 2
Created attachment 1762832 [details] sos-report worker 0
Created attachment 1762833 [details] sos-report worker 1
Created attachment 1762835 [details] installation logs
@yobshans The sos-reports are from a different installation: the VMs in them have a different IP. Is there a way to get sos-reports for this run?
On master-2-0 the console pod didn't start because of:

E0311 19:31:38.279726 1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-f13-h06-2.rdu2.scalelab.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.ocp-cluster-f13-h06-2.rdu2.scalelab.redhat.com": dial tcp 192.168.125.10:443: connect: connection refused

On the 2 other masters everything looks OK. I didn't find any errors relevant to it.
192.168.125.10 is the ingress IP. Console is stuck in Progressing:

{Progressing True 2021-03-11 18:24:32 +0000 UTC SyncLoopRefresh_InProgress SyncLoopRefreshProgressing: Working toward version 4.7.0}
{Available False 2021-03-11 18:14:39 +0000 UTC Deployment_FailedUpdate DeploymentAvailable: 2 replicas ready at version 4.7.0}
{Upgradeable True 2021-03-11 18:10:49 +0000 UTC AsExpected All is well}

We can see that the deployment has 2 ready replicas out of 2. Why is there a third replica? Is a rollout in progress?
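For anyone triaging similar output, here is a minimal sketch of how the conditions above can be checked against the usual "healthy operator" expectation (Available=True, Progressing=False, Degraded=False). The function name `blocking_conditions` and the hand-typed input are hypothetical; the dictionary keys only mirror the `type`/`status`/`reason` fields of a condition, and nothing here queries a real cluster.

```python
# Hypothetical helper: lists the condition types whose status deviates
# from what a settled operator is normally expected to report.
HEALTHY = {"Available": "True", "Progressing": "False", "Degraded": "False"}

def blocking_conditions(conditions):
    """Return the types of conditions that do not match HEALTHY.

    Condition types that are not listed in HEALTHY (for example
    Upgradeable) are ignored.
    """
    return [
        c["type"]
        for c in conditions
        if c["type"] in HEALTHY and c["status"] != HEALTHY[c["type"]]
    ]

# Conditions typed in by hand from the comment above (assumed shape).
console_conditions = [
    {"type": "Progressing", "status": "True", "reason": "SyncLoopRefresh_InProgress"},
    {"type": "Available", "status": "False", "reason": "Deployment_FailedUpdate"},
    {"type": "Upgradeable", "status": "True", "reason": "AsExpected"},
]

print(blocking_conditions(console_conditions))
```

With the conditions from this comment, both Progressing and Available are flagged, which matches the installer timing out while waiting for the console.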
@itsoiref @ronnie.lazar I'm adding new logs from the last failure https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/59167322-0675-41ff-b60b-4a8f7bbcef1d The environment is running. Ping me on slack to get access.
Created attachment 1765899 [details] NEW must-gather
Created attachment 1765900 [details] NEW sos-report master 0
Created attachment 1765901 [details] NEW sos-report master 1
Created attachment 1765902 [details] NEW sos-report master 2
Created attachment 1765903 [details] NEW installation logs
To tell the truth, it looks like an environment issue. Everything looks up and running, but there is a problem in the console and in the ingress operator canary check: they get connection refused when trying to reach ingress_vip:443. It sounds like something is dropping their calls. Either the VIP is somehow not configured on the worker (though it looks like it is), or there is some external LB that is doing something wrong.

@yobshans I would be very glad to connect to the setup if possible. Regarding the sos reports, it would be good to use ones that include the networking commands, and sos reports from the workers are nice to have too, because ingress always runs on a worker.
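A minimal sketch of a probe that could be run from a node or from the provisioning host to tell "connection refused" apart from a timeout when reaching the ingress VIP. The function name `probe_tcp` and the 3-second timeout are assumptions for illustration; this is not part of the installer or of any OpenShift tooling.

```python
import errno
import socket

def probe_tcp(host, port, timeout=3.0):
    """Try a plain TCP connect and classify the outcome.

    Returns "open", "refused", "timeout", or "error: <reason>".
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The connection attempt was actively rejected.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout.
        return "timeout"
    except OSError as exc:
        # Anything else, e.g. no route to host or a name resolution failure.
        return f"error: {errno.errorcode.get(exc.errno, exc)}"

# Example call, using the ingress VIP from this bug:
#   probe_tcp("192.168.125.10", 443)
```

In general, "refused" means the connection attempt was actively rejected (typically nothing is listening on that port on whichever host currently answers for the address, or a firewall rule rejects it), while "timeout" is what silently dropped packets usually look like. Running the probe from several nodes may help narrow down whether the VIP or something in front of it is at fault.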
We found the reason for the current failure: it is caused by https://bugzilla.redhat.com/show_bug.cgi?id=1931505
*** This bug has been marked as a duplicate of bug 1931505 ***