Description of problem:
I am getting back-to-back failures trying to install an OCP 4.0 cluster on AWS with payload "4.0.0-0.nightly-2019-02-20-194410", which is marked "Accepted" on "https://openshift-release.svc.ci.openshift.org/".

The 3 master and 3 worker nodes get created, but the installer does not complete and bails with an error:

time="2019-02-21T16:09:38Z" level=debug msg="Destroy complete! Resources: 11 destroyed."
time="2019-02-21T16:09:38Z" level=info msg="Waiting up to 30m0s for the cluster to initialize..."
time="2019-02-21T16:09:38Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:09:51Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:10:06Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:10:21Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:11:36Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:12:36Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:12:51Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:13:06Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:13:21Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:15:34Z" level=debug msg="Still waiting for the cluster to initialize..."
time="2019-02-21T16:16:06Z" level=debug msg="Still waiting for the cluster to initialize: Cluster operator monitoring is reporting a failure: Failed to rollout the stack. Error: running task Updating Grafana failed: waiting for Grafana Route to become ready failed: waiting for RouteReady of grafana: an error on the server (\"Internal Server Error: \\\"/apis/route.openshift.io/v1/namespaces/openshift-monitoring/routes/grafana\\\": Post https://172.30.0.1:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: dial tcp 172.30.0.1:443: connect: connection refused\") has prevented the request from succeeding (get routes.route.openshift.io grafana)"

Version-Release number of the following components:
# ./openshift-install version
./openshift-install v4.0.0-0.177.0.1-dirty
payload: 4.0.0-0.nightly-2019-02-20-194410

How reproducible:
Twice in a row, with a destroy cluster after each failed install.

Steps to Reproduce:
1. Create a new OCP 4.0 cluster with nightly payload:
2. export BUILD_VERSION=4.0.0-0.nightly-2019-02-20-194410
3. oc adm release info --pullspecs registry.svc.ci.openshift.org/ocp/release:${BUILD_VERSION} | grep installer
4. export CONTAINER_ID=$(docker create quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:5ccc80b91aad3a03440d784a09f21ea89d7007a73b9f01210d6c5925340b2650)
5. mkdir installer_4.0.0-0.nightly-2019-02-20-194410
6. cd installer_4.0.0-0.nightly-2019-02-20-194410
7. docker cp $CONTAINER_ID:/usr/bin/openshift-install .
8. export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-20-194410
9. export _OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-20-194410
10. ./openshift-install --dir=$(pwd) create cluster --log-level=debug

Actual results:
3 masters and 3 worker nodes get created, but the installer does not complete and bails with error:
"Still waiting for the cluster to initialize: Cluster operator monitoring is reporting a failure: Failed to rollout the stack. Error: running task Updating Grafana failed: waiting for Grafana Route to become ready failed: waiting for RouteReady of grafana: an error on the server .... "

Expected results:
Successful install with KUBECONFIG and login info

Additional info:
Install logs will be provided in the next private comment.
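For triage, the operator named in the "is reporting a failure" message can be pulled out of a saved installer debug log with a small filter. This is only a sketch: `failing_operators` is a hypothetical helper name, not part of openshift-install, and it assumes the log lines are shaped like the message quoted in this report.

```shell
#!/bin/sh
# Sketch: list the cluster operators that an installer debug log reports as failing.
# Assumption: matching lines look like the one in this report, i.e.
#   ... Cluster operator NAME is reporting a failure: ...
# failing_operators is a hypothetical helper, not an openshift-install command.
failing_operators() {
  log_file="$1"
  # Print only the operator name from matching lines, then de-duplicate.
  sed -n 's/.*Cluster operator \([^ ]*\) is reporting a failure.*/\1/p' "$log_file" | sort -u
}

# Usage, with a placeholder path to wherever the installer output was saved:
#   failing_operators install.log
```

Run against a log containing the message above, this should narrow the output to the operator name so the bug can be routed to the owning team.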
This seems to be a component operator issue specific to this payload image. I tried 4.0.0-0.nightly-2019-02-21-034936 and 4.0.0-0.nightly-2019-02-21-215247 and did not hit the issue with either.
Hit the same problem with registry.svc.ci.openshift.org/ocp/release:4.0.0-0.nightly-2019-02-20-194410
Moving to the monitoring team since it is their component that is reporting this error.
Did not hit the issue with the same payload 4.0.0-0.nightly-2019-02-20-194410
This looks like a transient error; we are just a consumer of this API. Since QE verified that this works with the latest payloads, I suspect something was fixed in the OpenShift apiserver or the router. Moving to the master component, though it might be a better fit in "routing".
No such issue with 4.0.0-0.nightly-2019-03-06-074438; the Grafana pods were created successfully.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2019:0758