Created attachment 1768248 [details]
installation logs

Description of problem:

The issue was detected while running a performance test against the Staging service; 6 cluster deployments failed.

3/31/2021, 10:13:05 PM  error  Host worker-2-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/31/2021, 10:13:05 PM  error  Host master-2-1: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/31/2021, 10:13:05 PM  error  Host master-2-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/31/2021, 10:13:05 PM  error  Host master-2-2: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/31/2021, 10:13:05 PM  error  Host worker-2-0: updated status from "installed" to "error" (Host is part of a cluster that failed to install)
3/31/2021, 10:10:58 PM  critical  Failed installing cluster ocp-cluster-f34-h18-2. Reason: Timeout while waiting for cluster version to be available
3/31/2021, 9:10:58 PM  Update cluster installation progress: Cluster version is available: false, message: Unable to apply 4.7.2: the cluster operator console has not yet successfully rolled out
3/31/2021, 9:07:58 PM  Update cluster installation progress: Cluster version is available: false, message: Working towards 4.7.2: 654 of 668 done (97% complete)
3/31/2021, 9:01:59 PM  Update cluster installation progress: Cluster version is available: false, message: Unable to apply 4.7.2: some cluster operators have not yet rolled out

Version-Release number of selected component (if applicable):
v1.0.18.1

How reproducible:
https://qaprodauth.cloud.redhat.com/openshift/assisted-installer/clusters/5e46337e-d0c3-4f11-8600-45e8f41671d3

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Created attachment 1768249 [details]
must-gather
The console failed to contact OAuth:

2021-04-01T02:08:33.467259692Z E0401 02:08:33.467186       1 auth.go:235] error contacting auth provider (retrying in 10s): request to OAuth issuer endpoint https://oauth-openshift.apps.ocp-cluster-f34-h18-2.rdu2.scalelab.redhat.com/oauth/token failed: Head "https://oauth-openshift.apps.ocp-cluster-f34-h18-2.rdu2.scalelab.redhat.com": dial tcp 192.168.125.10:443: connect: connection refused

Beyond that, I didn't find anything else to point to.
@slaznick maybe you have an idea why we got connection refused?
The authentication operator is reporting healthy and the pods are running, which means there exists a route through which the connection can succeed. The console was most likely routed improperly; I'd propose starting by checking that DNS is correct.
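To illustrate the DNS check suggested above, here is a rough sketch of how one might verify, from a host that can reach the cluster, that the OAuth route hostname resolves to the expected ingress VIP. The `check_route_dns` helper is hypothetical (not part of any tooling mentioned in this bug); the hostname and the 192.168.125.10 address are taken from the console log above.

```shell
# Hypothetical helper: compare what a route hostname resolves to
# against the ingress VIP we expect the console to be dialing.
check_route_dns() {
  local host="$1" expected_ip="$2"
  local resolved
  # getent consults the same resolver path (nsswitch) the node would use
  resolved=$(getent hosts "$host" | awk '{print $1; exit}')
  if [ "$resolved" = "$expected_ip" ]; then
    echo "OK: $host -> $resolved"
  else
    echo "MISMATCH: $host resolved to '$resolved', expected $expected_ip"
  fi
}

# Values from the logs in this bug; the console was dialing 192.168.125.10:
check_route_dns "oauth-openshift.apps.ocp-cluster-f34-h18-2.rdu2.scalelab.redhat.com" "192.168.125.10"
```

If this reports a mismatch (or no resolution at all), the next step would be to look at where the *.apps wildcard record points and whether the ingress VIP moved between masters.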
@bnemec could it be a problem with the ingress IP not being freed from the first master? To tell the truth, it looks like that's the case, but I'm not 100% sure.
Moving this to routing, as they might be more helpful than me when it comes to ingresses/DNS. This seems like a platform-dependent bug, though.
Please include a must-gather as well as more details about the cluster's platform. Has this been reproduced on a later 4.7.z?
This does sound like https://bugzilla.redhat.com/show_bug.cgi?id=1931505. We haven't been seeing that on 4.7, but I'm not aware of any reason it couldn't happen. We had already planned to backport that fix to 4.7 anyway, so I'm going to mark this as a duplicate. Feel free to reopen if it continues to happen after the backport merges.

*** This bug has been marked as a duplicate of bug 1957015 ***