Bug 1794755
| Summary: | Timeouts are too short for openshift-baremetal-installer and not adjustable. | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Alexander Chuzhoy <sasha> | |
| Component: | Installer | Assignee: | Stephen Benjamin <stbenjam> | |
| Installer sub component: | OpenShift on Bare Metal IPI | QA Contact: | Raviv Bar-Tal <rbartal> | |
| Status: | CLOSED ERRATA | Docs Contact: | ||
| Severity: | high | |||
| Priority: | high | CC: | agurenko, augol, cpaquin, dtrainor, sgordon, william.caban | |
| Version: | 4.4 | |||
| Target Milestone: | --- | |||
| Target Release: | 4.4.0 | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | | Doc Type: | If docs needed, set a value | |
| Doc Text: | | Story Points: | --- | |
| Clone Of: | ||||
| : | 1803805 (view as bug list) | Environment: | ||
| Last Closed: | 2020-05-04 11:26:40 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1771572, 1803805 | |||
*** Bug 1796204 has been marked as a duplicate of this bug. ***

The timeout was updated to 60 minutes:

```
time="2020-02-11T13:45:18Z" level=info msg="Waiting up to 1h0m0s for the cluster at https://api.ocp-edge-cluster.qe.lab.redhat.com:6443 to initialize..."
```

However, the timeout is still not configurable. I'm closing this BZ; if required, a new one will be opened for timeout configuration.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581
Timeouts are too short for openshift-baremetal-installer and not adjustable.

Version: 4.4.0-0.nightly-2020-01-23-054055

Running openshift-baremetal-installer on bare metal often times out. We need either longer timeouts or the ability to adjust the timeouts as needed.

One deployment I tried failed with the following output in the log:

```
time="2020-01-23T18:23:27-05:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.nightly-2020-01-23-054055: 99% complete"
time="2020-01-23T18:26:12-05:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, ingress, machine-api, monitoring"
time="2020-01-23T18:29:57-05:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.nightly-2020-01-23-054055: 99% complete"
time="2020-01-23T18:32:27-05:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.nightly-2020-01-23-054055: 99% complete, waiting on authentication, console, ingress, monitoring"
time="2020-01-23T18:35:42-05:00" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.4.0-0.nightly-2020-01-23-054055: 99% complete"
time="2020-01-23T18:38:27-05:00" level=debug msg="Still waiting for the cluster to initialize: Some cluster operators are still updating: authentication, console, ingress, monitoring"
time="2020-01-23T18:41:37-05:00" level=error msg="Cluster operator authentication Degraded is True with IngressStateEndpoints_MissingSubsets::RouteStatus_FailedHost: IngressStateEndpointsDegraded: No subsets found for the endpoints of oauth-server\nRouteStatusDegraded: route is not available at canonical host oauth-openshift.apps.qe1.kni.lab.eng.bos.redhat.com: []"
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator authentication Progressing is Unknown with NoData: "
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator authentication Available is Unknown with NoData: "
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator console Progressing is True with RouteSyncProgressingFailedHost: RouteSyncProgressing: route is not available at canonical host []"
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator console Available is Unknown with NoData: "
time="2020-01-23T18:41:37-05:00" level=error msg="Cluster operator ingress Degraded is True with IngressControllersDegraded: Some ingresscontrollers are degraded: default"
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator ingress Progressing is True with Reconciling: Not all ingress controllers are available.\nMoving to release version \"4.4.0-0.nightly-2020-01-23-054055\".\nMoving to ingress-controller image version \"quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0fe7520a269c7dcd6a1ed69670f5b1796b58117216192bd8a45470bb758e9e5b\"."
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator ingress Available is False with IngressUnavailable: Not all ingress controllers are available."
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator insights Disabled is False with : "
time="2020-01-23T18:41:37-05:00" level=error msg="Cluster operator monitoring Degraded is True with UpdatingAlertmanagerFailed: Failed to rollout the stack. Error: running task Updating Alertmanager failed: waiting for Alertmanager Route to become ready failed: waiting for RouteReady of alertmanager-main: no status available for alertmanager-main"
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator monitoring Available is False with : "
time="2020-01-23T18:41:37-05:00" level=info msg="Cluster operator monitoring Progressing is True with RollOutInProgress: Rolling out the stack."
time="2020-01-23T18:41:37-05:00" level=fatal msg="failed to initialize the cluster: Some cluster operators are still updating: authentication, console, ingress, monitoring"
```

When I came back to check the status of the cluster, it had actually deployed successfully.

Another deployment on the same setup failed with the following error in the log:

```
time="2020-01-24T01:52:12-05:00" level=debug msg="module.masters.ironic_node_v1.openshift-master-host[1]: Still creating... [26m20s elapsed]"
time="2020-01-24T01:52:22-05:00" level=debug msg="module.masters.ironic_node_v1.openshift-master-host[0]: Still creating... [26m30s elapsed]"
time="2020-01-24T01:52:22-05:00" level=debug msg="module.masters.ironic_node_v1.openshift-master-host[2]: Still creating... [26m30s elapsed]"
time="2020-01-24T01:52:22-05:00" level=debug msg="module.masters.ironic_node_v1.openshift-master-host[1]: Still creating... [26m30s elapsed]"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg="Error: could not contact API: timeout reached"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg=" on ../../tmp/openshift-install-696469293/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2020-01-24T01:52:26-05:00" level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg="Error: could not contact API: timeout reached"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg=" on ../../tmp/openshift-install-696469293/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2020-01-24T01:52:26-05:00" level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg="Error: could not contact API: timeout reached"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error msg=" on ../../tmp/openshift-install-696469293/masters/main.tf line 1, in resource \"ironic_node_v1\" \"openshift-master-host\":"
time="2020-01-24T01:52:26-05:00" level=error msg=" 1: resource \"ironic_node_v1\" \"openshift-master-host\" {"
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=error
time="2020-01-24T01:52:26-05:00" level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
```

This deployment really did fail. I see that the bootstrap VM is still running, but there are no errors related to starting containers. I can only assume it took a long time to pull the container images.