Bug 1743114
| Summary: | Several kinds of failures: Run template e2e-gcp - e2e-gcp container setup | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Xingxing Xia <xxia> |
| Component: | Installer | Assignee: | Abhinav Dahiya <adahiya> |
| Installer sub component: | openshift-installer | QA Contact: | Johnny Liu <jialiu> |
| Status: | CLOSED WORKSFORME | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | urgent | CC: | alegrand, anpicker, aos-bugs, erooth, jokerman, mloibl, pkrupa, surbania |
| Version: | 4.2.0 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.2.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2019-08-20 03:37:43 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1736168 | | |
Description
Xingxing Xia
2019-08-19 06:41:15 UTC
Also found above error in Azure env installation with latest 4.2.0-0.nightly-2019-08-19-071902 :
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/
log:
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/console
https://openshift-qe-jenkins.rhev-ci-vms.eng.rdu2.redhat.com/job/Launch%20Environment%20Flexy/65669/artifact/workdir/install-dir/auth/kubeconfig
level=info msg="Waiting up to 30m0s for the cluster at https://api....qe.azure.devcluster.openshift.com:6443 to initialize..."
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"
level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-08-19-071902: 92% complete"
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"
level=debug msg="Still waiting for the cluster to initialize: Working towards 4.2.0-0.nightly-2019-08-19-071902: 92% complete"
level=debug msg="Still waiting for the cluster to initialize: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"
level=fatal msg="failed to initialize the cluster: Multiple errors are preventing progress:\n* Cluster operator console is reporting a failure: CustomLogoDegraded: waiting on route host\n* Could not update servicemonitor \"openshift-apiserver-operator/openshift-apiserver-operator\" (405 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-authentication-operator/authentication-operator\" (370 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-controller-manager-operator/openshift-controller-manager-operator\" (409 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-image-registry/image-registry\" (376 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-apiserver-operator/kube-apiserver-operator\" (386 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-controller-manager-operator/kube-controller-manager-operator\" (390 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-kube-scheduler-operator/kube-scheduler-operator\" (394 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/cluster-autoscaler-operator\" (148 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-operator-lifecycle-manager/olm-operator\" (399 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-apiserver-operator/openshift-service-catalog-apiserver-operator\" (379 of 410): the server does not recognize this resource, check extension API servers\n* Could not update servicemonitor \"openshift-service-catalog-controller-manager-operator/openshift-service-catalog-controller-manager-operator\" (382 of 410): the server does not recognize this resource, check extension API servers"
Analyzing CI artifacts it looks like cluster-monitoring-operator was never started (there are no logs from cmo pods). This is probably due to the fact that CMO runs on worker nodes which weren't started either (no logs from them too and workers-journal is empty: https://storage.cloud.google.com/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/47/artifacts/e2e-gcp/nodes/workers-journal)
Reassigning to installer team as it seems like installation didn't finish successfully.
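For triage, the long "Multiple errors are preventing progress" message in the installer output above can be condensed into a list of blocked manifests. The following is only a rough sketch and not part of the installer or CI tooling: the helper name `blocked_manifests`, the regular expression, and the `sample` string (two entries excerpted from the output above) are illustrative assumptions.

```python
import re

# One 'Could not update <kind> "<namespace>/<name>" (<i> of <n>)' entry.
# The quotes may or may not be backslash-escaped, depending on how the log was captured.
BLOCKER_RE = re.compile(
    r'Could not update (?P<kind>\w+) \\?"(?P<ns>[^/"\\]+)/(?P<name>[^"\\]+)\\?" '
    r'\((?P<index>\d+) of (?P<total>\d+)\)'
)


def blocked_manifests(message):
    """Return (kind, "namespace/name") pairs for every blocked manifest in message."""
    return [(m["kind"], f'{m["ns"]}/{m["name"]}') for m in BLOCKER_RE.finditer(message)]


# Hypothetical sample built from two entries of the installer output above.
sample = (
    r'Multiple errors are preventing progress:'
    r'\n* Could not update servicemonitor \"openshift-cluster-version/cluster-version-operator\" (8 of 410): the server does not recognize this resource, check extension API servers'
    r'\n* Could not update servicemonitor \"openshift-machine-api/machine-api-operator\" (396 of 410): the server does not recognize this resource, check extension API servers'
)

for kind, name in blocked_manifests(sample):
    print(kind, name)
```

If every entry has the same kind (here `servicemonitor`), that points at a single missing API rather than many independent failures, which is consistent with the observation that cluster-monitoring-operator never started.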
> https://00e9e64bac72a6c5a275303d2f50c1d69c774c446078c71291-apidata.googleusercontent.com/download/storage/v1/b/origin-ci-test/o/logs%2Fcanary-openshift-ocp-installer-e2e-gcp-4.2%2F47%2Fartifacts%2Fe2e-gcp%2Fpods%2Fopenshift-cloud-credential-operator_cloud-credential-operator-cc9fd5444-6pgbd_manager.log?qk=AD5uMEuj-ON8tndmm7-17UyXQmeJ-J51t4-2GknMDQqYoKwsCx_kiIYNTYIIu7Lvs0ESj5ncfZVAz33-n9Pp4UxCKBCkAlaKvnUObYpTnl48mN21klZewhXsFPPa6q1liLKmvBvnrjyR4L0PC77H_y47MGGvpJjf0E4Nsl_TEb8_uEv-Nk8DctAAAlURmhQnrFFDp8KzBVO-mf8A33kyv9J3c3nGoKoLgTZnCewMKJ_McOmwdw72MYPDk1QvCTOaQPtxwWsbofdctvQvdr8-jzNfLKw3WQiJF-yLOCeEKdwLlDSmqV9PgZtwHcQ_D9MPNdCsB6bKP3S1gu3rvoxCP09jtxiJj2rlt907vXMM3a0jrBsvHGp0N04gC741YmTSUyMO0ew4cmML4nhVOTKJqi1uH1U7zRvvpRlnJBUL5TrELSBz-HJ84v4vLfJaRApZvo9a_v4xhnTWsuLgqxF-s7z0TIhzddT26Gv0rfSi4Uhl099jtWKI7Bg72HmiQ5_LNJwMIoivRUaMXBZcg8vKddvf2Q2IctytU1klZ4YgAxzUY_cNhi-7ijg4XscOxFrcfx0OIxrY5ky8dDTf3iWrl9E_dQbX3DlxJ9Kk5BD-Gd8-VhBSa2zRGy76Qu8Acj9v4huDyf9-mqbXYABrAuwX2yVpC9JUxZxtXUd-Agcp6ijno1hzQs8m9Gelsj5JIaOvVrQBn6FQPw3Q2xVOxliRiZfPfAzSX9aF_KdzpQUkpUNYNunhbCK3HC1cHW6DqItN1uWaniYuhQDT2aFFrH2HIg573fnpKSPB_JfL-PE8iCc43l_DjaOkz9rfmt5My2jdIL_1rPO-zH9GI6y99GrJisppe4I7xLrQf6v6yOd74Z6tpTC97O1q3e1kk3Tt0-emVraoROGYdMtGeUINlTAcvW96PNyVE1yTSeQh0KHcMngpzv7B02W6LFjSzNqIDPTH6zYzSn_STWxu23TPk0TP2pMIlqwPFjyas-CM3Ays-DCNdqlxQ4AhJQM
```
time="2019-08-18T22:42:04Z" level=error msg="error syncing creds in mint-mode" actuator=gcp cr=openshift-cloud-credential-operator/openshift-machine-api-gcp error="error creating service account: rpc error: code = ResourceExhausted desc = Maximum number of service accounts on project reached."
time="2019-08-18T22:42:04Z" level=error msg="error syncing credentials: error syncing creds in mint-mode: error creating service account: rpc error: code = ResourceExhausted desc = Maximum number of service accounts on project reached." controller=credreq cr=openshift-cloud-credential-operator/openshift-machine-api-gcp secret=openshift-machine-api/gcp-cloud-credentials
```
Most of these are due to service account limits being reached in the CI project.
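To spot this condition quickly in a downloaded cloud-credential-operator log, the `key=value` lines quoted above can be scanned for `ResourceExhausted`. This is a minimal sketch under the assumption that every line follows the same `key=value` / `key="quoted value"` layout as the two lines shown; `parse_fields` and `is_quota_error` are hypothetical helper names, not part of any OpenShift tool.

```python
import re

# key=value or key="quoted value" pairs, as in the two log lines quoted above.
FIELD_RE = re.compile(r'(\w+)=("(?:[^"\\]|\\.)*"|\S+)')


def parse_fields(line):
    """Split one log line into a dict of its key=value fields."""
    fields = {}
    for key, raw in FIELD_RE.findall(line):
        if raw.startswith('"'):
            raw = raw[1:-1]  # drop the surrounding quotes
        fields[key] = raw
    return fields


def is_quota_error(line):
    """True if an error-level line mentions ResourceExhausted in msg or error."""
    fields = parse_fields(line)
    if fields.get("level") != "error":
        return False
    return any("ResourceExhausted" in fields.get(k, "") for k in ("msg", "error"))


# First log line from the excerpt above.
line = (
    r'time="2019-08-18T22:42:04Z" level=error msg="error syncing creds in mint-mode" '
    r'actuator=gcp cr=openshift-cloud-credential-operator/openshift-machine-api-gcp '
    r'error="error creating service account: rpc error: code = ResourceExhausted '
    r'desc = Maximum number of service accounts on project reached."'
)

print(parse_fields(line)["actuator"], is_quota_error(line))
```

Checking both `msg` and `error` matters here because the second quoted line carries the quota text only inside `msg`.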
(In reply to Xingxing Xia from comment #1)
> Also found above error in Azure env installation with latest
> 4.2.0-0.nightly-2019-08-19-071902 :
PS: launched AWS and Azure envs with the same payload 4.2.0-0.nightly-2019-08-20-002921; the AWS env installation succeeds while the Azure env encounters the failure from comment 1.
The bug was about e2e-gcp failing the canary jobs.
https://testgrid.k8s.io/redhat-openshift-release-informing#redhat-canary-openshift-ocp-installer-e2e-gcp-4.2
https://prow.k8s.io/view/gcs/origin-ci-test/logs/canary-openshift-ocp-installer-e2e-gcp-4.2/53 succeeded.
So this should no longer be a test blocker. Please open a separate issue for Azure.