Description of problem:

One job, https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/47, failed with:

level=fatal msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"

Other 3 kinds of jobs:

https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/46
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/45
... etc.
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-serial-4.2/21
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-serial-4.2/20
... etc.
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-fips-4.2/21
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-fips-4.2/20
... etc.

These failed with:

Installing from release registry.svc.ci.openshift.org/ocp/release:4.2
level=warning msg="Found override for ReleaseImage. Please be warned, this is not advised"
level=info msg="Consuming \"Install Config\" from target directory"
level=info msg="Creating infrastructure resources..."
level=error
level=error msg="Error: Error creating service account: googleapi: Error 429: Maximum number of service accounts on project reached., rateLimitExceeded"
level=error
level=error msg=" on ../tmp/openshift-install-026222549/iam/main.tf line 1, in resource \"google_service_account\" \"worker-node-sa\":"
level=error msg=" 1: resource \"google_service_account\" \"worker-node-sa\" {"
level=error
level=error
level=error
level=error msg="Error: Error creating service account: googleapi: Error 429: Maximum number of service accounts on project reached., rateLimitExceeded"
level=error
level=error msg=" on ../tmp/openshift-install-026222549/master/main.tf line 1, in resource \"google_service_account\" \"master-node-sa\":"
level=error msg=" 1: resource \"google_service_account\" \"master-node-sa\" {"
level=error
level=error
level=fatal msg="failed to fetch Cluster: failed to generate asset \"Cluster\": failed to create cluster: failed to apply using Terraform"
---
Container test exited with code 1, reason Error
---
Another process exited
Also found above error in Azure env installation with latest 4.2.0-0.nightly-2019-08-19-071902 : https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/ log: https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/console https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/artifact/workdir/install-dir/auth/kubeconfig level=info msg="Waiting up to 30m0s for the cluster at https://api....qe.azure.devcluster.openshift.com:6443 to initialize..." level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not 
update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers" level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-08-19-071902: 92% complete" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension 
API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers" level=debug msg="Still waiting for the cluster to initialize: Working towards 
4.2.0-0.nightly-2019-08-19-071902: 92% complete" level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): 
the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers" level=fatal msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not 
update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"
Analyzing the CI artifacts, it looks like cluster-monitoring-operator (CMO) was never started (there are no logs from CMO pods). This is probably because CMO runs on worker nodes, which weren't started either (there are no logs from them, and workers-journal is empty: https://storage.cloud.google.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/47/artifacts/e2e-gcp/nodes/workers-journal). Reassigning to the installer team, as it seems the installation didn't finish successfully.
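For triage, it can help to break the single-line "Multiple errors are preventing progress" message into its individual bullets and count them by resource kind. That makes it easy to see that almost every entry is a servicemonitor update, which is consistent with the analysis above. The following is only a minimal sketch: the function names `split_errors` and `summarize` and the regular expression are hypothetical helpers written for this comment, not part of the installer or any CI tooling, and the pattern assumes the message format seen in the logs of this bug.

```python
import re
from collections import Counter
from typing import List

# Hypothetical pattern for one bullet of the form seen in this bug's logs:
#   Could not update <kind> "<namespace>/<name>" (<i> of <n>): <reason>
ITEM_RE = re.compile(
    r'Could not update (?P<kind>\S+) "(?P<namespace>[^/"]+)/(?P<name>[^"]+)" '
    r'\((?P<index>\d+) of (?P<total>\d+)\): (?P<reason>.*)'
)


def split_errors(msg: str) -> List[str]:
    """Split the raw msg="..." payload into its '* ' bullets."""
    # In the raw log, separators are the two characters backslash + n,
    # and inner quotes are backslash-escaped.
    text = msg.replace('\\n', '\n').replace('\\"', '"')
    return [line[2:].strip() for line in text.splitlines() if line.startswith('* ')]


def summarize(msg: str) -> Counter:
    """Count bullets by resource kind; anything unparsed is counted as 'other'."""
    counts = Counter()
    for item in split_errors(msg):
        match = ITEM_RE.match(item)
        counts[match.group('kind') if match else 'other'] += 1
    return counts


if __name__ == '__main__':
    # Two bullets copied from the log in this bug, shortened to keep the sample small.
    sample = (
        'Multiple errors are preventing progress:\\n'
        '* Cluster operator console is reporting a failure: '
        'CustomLogoDegraded: waiting on route host\\n'
        '* Could not update servicemonitor '
        '\\"openshift-image-registry/image-registry\\" (376 of 410): '
        'the server does not recognize this resource, check extension API servers'
    )
    print(summarize(sample))
```

Entries that do not match the "Could not update" shape, such as the console operator failure, fall into the "other" bucket rather than being dropped, so nothing in the message is silently ignored.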
> https://00e9e64bac72a6c5a275303d2f50c1d69c774c446078c71291-apidata.googleusercontent.com/download/storage/v1/b/origin-ci-test/o/logs%2Fcanary-openshift-ocp-installer-e2e-gcp-4.2%2F47%2Fartifacts%2Fe2e-gcp%2Fpods%2Fopenshift-cloud-credential-operator_cloud-credential-operator-cc9fd5444-6pgbd_manager.log?qk=AD5uMEuj-ON8tndmm7-17UyXQmeJ-J51t4-2GknMDQqYoKwsCx_kiIYNTYIIu7Lvs0ESj5ncfZVAz33-n9Pp4UxCKBCkAlaKvnUObYpTnl48mN21klZewhXsFPPa6q1liLKmvBvnrjyR4L0PC77H_y47MGGvpJjf0E4Nsl_TEb8_uEv-Nk8DctAAAlURmhQnrFFDp8KzBVO-mf8A33kyv9J3c3nGoKoLgTZnCewMKJ_McOmwdw72MYPDk1QvCTOaQPtxwWsbofdctvQvdr8-jzNfLKw3WQiJF-yLOCeEKdwLlDSmqV9PgZtwHcQ_D9MPNdCsB6bKP3S1gu3rvoxCP09jtxiJj2rlt907vXMM3a0jrBsvHGp0N04gC741YmTSUyMO0ew4cmML4nhVOTKJqi1uH1U7zRvvpRlnJBUL5TrELSBz-HJ84v4vLfJaRApZvo9a_v4xhnTWsuLgqxF-s7z0TIhzddT26Gv0rfSi4Uhl099jtWKI7Bg72HmiQ5_LNJwMIoivRUaMXBZcg8vKddvf2Q2IctytU1klZ4YgAxzUY_cNhi-7ijg4XscOxFrcfx0OIxrY5ky8dDTf3iWrl9E_dQbX3DlxJ9Kk5BD-Gd8-VhBSa2zRGy76Qu8Acj9v4huDyf9-mqbXYABrAuwX2yVpC9JUxZxtXUd-Agcp6ijno1hzQs8m9Gelsj5JIaOvVrQBn6FQPw3Q2xVOxliRiZfPfAzSX9aF_KdzpQUkpUNYNunhbCK3HC1cHW6DqItN1uWaniYuhQDT2aFFrH2HIg573fnpKSPB_JfL-PE8iCc43l_DjaOkz9rfmt5My2jdIL_1rPO-zH9GI6y99GrJisppe4I7xLrQf6v6yOd74Z6tpTC97O1q3e1kk3Tt0-emVraoROGYdMtGeUINlTAcvW96PNyVE1yTSeQh0KHcMngpzv7B02W6LFjSzNqIDPTH6zYzSn_STWxu23TPk0TP2pMIlqwPFjyas-CM3Ays-DCNdqlxQ4AhJQM ``` time="2019-08-18T22:42:04Z" level=error msg="error syncing creds in mint-mode" actuator=gcp cr=openshift-cloud-credential-operator/openshift-machine-api-gcp error="error creating service account: rpc error: code = ResourceExhausted desc = Maximum number of service accounts on project reached." time="2019-08-18T22:42:04Z" level=error msg="error syncing credentials: error syncing creds in mint-mode: error creating service account: rpc error: code = ResourceExhausted desc = Maximum number of service accounts on project reached." 
controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-gcp secret=openshift-machine-api/gcp-cloud-credentials ``` Most of these are due to service account being reached.
> Most of these are due to service account being reached. * service account limits being reached in the CI project.
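To separate infrastructure quota failures like this one from real product failures when scanning job logs, a simple signature match may be enough. The sketch below is an assumption-laden illustration: `QUOTA_PATTERNS`, `is_quota_failure`, and `quota_lines` are hypothetical names, and the patterns are only the strings that appear in the logs quoted in this bug (the Terraform `Error 429` output and the cloud-credential-operator `ResourceExhausted` error). Note that `Error 429` alone can also indicate plain rate limiting, so a match is a hint, not proof.

```python
import re
from typing import List

# Signatures taken from the log excerpts in this bug; other quota errors
# may be worded differently and would not be caught by these patterns.
QUOTA_PATTERNS = [
    re.compile(r'Maximum number of service accounts on project reached'),
    re.compile(r'Error 429'),
    re.compile(r'code = ResourceExhausted'),
]


def is_quota_failure(line: str) -> bool:
    """Return True if the log line matches any known quota signature."""
    return any(pattern.search(line) for pattern in QUOTA_PATTERNS)


def quota_lines(log_text: str) -> List[str]:
    """Return only the lines of a log that look like quota failures."""
    return [line for line in log_text.splitlines() if is_quota_failure(line)]


if __name__ == '__main__':
    # Three lines copied from the installer log in the description.
    log = '\n'.join([
        'level=info msg="Creating infrastructure resources..."',
        'level=error msg="Error: Error creating service account: googleapi: '
        'Error 429: Maximum number of service accounts on project reached., '
        'rateLimitExceeded"',
        'level=fatal msg="failed to fetch Cluster: failed to generate asset '
        '\\"Cluster\\": failed to create cluster: failed to apply using Terraform"',
    ])
    for line in quota_lines(log):
        print(line)
```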
(In reply to Xingxing Xia from comment #1) > Also found above error in Azure env installation with latest > 4.2.0-0.nightly-2019-08-19-071902 : PS: I launched AWS and Azure envs with the same payload, 4.2.0-0.nightly-2019-08-20-002921. The AWS env installation succeeds, while the Azure env hits the failure from comment 1.
The bug was about e2e-gcp failing the canary jobs. https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-gcp-4.2 https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/53 succeeded. So this should no longer be a test blocker. Please open a separate issue for Azure.