Bug 1835100
| Summary: | Openshift installation fails - Waiting on authentication | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Itzik Brown <itbrown> |
| Component: | Installer | Assignee: | Pierre Prinetti <pprinett> |
| Installer sub component: | OpenShift on OpenStack | QA Contact: | David Sanz <dsanzmor> |
| Status: | CLOSED CURRENTRELEASE | Docs Contact: | |
| Severity: | medium | ||
| Priority: | medium | CC: | aos-bugs, bleanhar, ltomasbo, m.andre, mfojtik, pprinett, slaznick, smaitra, wking |
| Version: | 4.5 | Keywords: | TestBlockerForLayeredProduct, UpcomingSprint |
| Target Milestone: | --- | ||
| Target Release: | 4.6.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-07-21 03:57:18 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Itzik Brown
2020-05-13 05:59:19 UTC
Moving to the authentication operator, since that was the failing operator.
The authentication operator is actually up and running at the end, and it reports that correctly in both its logs and its clusteroperator resource, but by then it is probably too late. The operator took far too long to come up, though, which suggests other issues in the cluster.
I can see many etcd leader changes:
```
2020-05-13T05:00:31.985381733Z I0513 05:00:31.985343 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-etcd-operator", Name:"etcd-operator", UID:"b3b84ca7-baae-4828-90de-1b75d18d6b5b", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EtcdLeaderChangeMetrics' Detected 2.5 leader changes in last 5 minutes on "OpenStack" disk metrics are: etcd-ostest-sqszc-master-1=0.014960000000000005,etcd-ostest-sqszc-master-0=0.015760000000000003,etcd-ostest-sqszc-master-2=0.015466666666666663
```
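To compare the per-member figures across several such events, the message can be parsed with a short script. This is only a minimal sketch, assuming the message format shown in the log line above; the function name `parse_leader_change_event` is hypothetical, and the report does not state what the per-member values measure or in which unit, so the code just returns them as floats.

```python
import re

# Assumed message format, taken from the single event quoted in this report.
CHANGES_RE = re.compile(r"Detected ([0-9]+(?:\.[0-9]+)?) leader changes in last (\d+) minutes")
# Per-member entries look like "etcd-<member-name>=<float>".
MEMBER_RE = re.compile(r"(etcd-[\w-]+)=([0-9]+(?:\.[0-9]+)?)")


def parse_leader_change_event(message):
    """Extract the leader-change count, the window, and per-member values.

    Returns None when the message is not an EtcdLeaderChangeMetrics-style
    message in the assumed format.
    """
    match = CHANGES_RE.search(message)
    if match is None:
        return None
    members = {name: float(value) for name, value in MEMBER_RE.findall(message)}
    return {
        "leader_changes": float(match.group(1)),
        "window_minutes": int(match.group(2)),
        "members": members,
    }
```

Feeding it the message text from the event above yields the leader-change count, the 5-minute window, and one entry per control plane member, which makes it easier to spot a member whose value stands out.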
I am going to move this to etcd, although the root cause might be elsewhere.
Waiting on reporter's feedback.
The Control Plane nodes need fast disks to comply with etcd requirements: https://github.com/openshift/installer/tree/master/docs/user/openstack#disk-requirements
Is it still an issue with the current 4.6 branch?
This seems rare in mainline release-promotion jobs, although 4.5, OpenStack, and a few other less-common platforms are getting hit reasonably often:
$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?search=failed+to+initialize+the+cluster.*waiting+on.*authentication&maxAge=336h&context=1&type=bug%2Bjunit&name=release-openshift-ocp&groupBy=job' | grep 'failures match'
release-openshift-ocp-installer-e2e-gcp-rt-4.5 - 8 runs, 63% failed, 20% of failures match
release-openshift-ocp-installer-e2e-openstack-4.5 - 89 runs, 63% failed, 9% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 60 runs, 77% failed, 11% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.6 - 60 runs, 67% failed, 10% of failures match
release-openshift-ocp-installer-e2e-openstack-4.2 - 17 runs, 71% failed, 17% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.3 - 34 runs, 76% failed, 8% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.5 - 88 runs, 51% failed, 7% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 145 runs, 30% failed, 2% of failures match
release-openshift-ocp-installer-e2e-azure-4.3 - 43 runs, 37% failed, 6% of failures match
release-openshift-ocp-installer-e2e-gcp-serial-4.5 - 74 runs, 26% failed, 11% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.5 - 114 runs, 46% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.6 - 8 runs, 100% failed, 25% of failures match
release-openshift-ocp-installer-e2e-azure-serial-4.4 - 47 runs, 43% failed, 5% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.4 - 47 runs, 45% failed, 5% of failures match
release-openshift-ocp-installer-e2e-vsphere-upi-4.4 - 47 runs, 87% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-4.5 - 82 runs, 30% failed, 20% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 93 runs, 85% failed, 1% of failures match
release-openshift-ocp-installer-e2e-openstack-4.3 - 34 runs, 94% failed, 3% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.2 - 17 runs, 35% failed, 17% of failures match
It seems that it's working now. If it happens again, I'll reopen.
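The per-job figures in the search output above are easier to compare once each job's two percentages are combined into a single rate. The following is a rough sketch, assuming every summary has the form `<job> - N runs, X% failed, Y% of failures match`; the names `parse_job_line` and `rank_jobs` are hypothetical, and because the published percentages are already rounded, the combined value (`overall_match_pct`, the estimated share of all runs that hit this failure) is only approximate.

```python
import re

# Assumed summary format, as printed by the grep in this report.
JOB_RE = re.compile(r"^(\S+) - (\d+) runs, (\d+)% failed, (\d+)% of failures match$")


def parse_job_line(line):
    """Parse one job summary; return None if it does not match the format."""
    match = JOB_RE.match(line.strip())
    if match is None:
        return None
    job = match.group(1)
    runs, failed_pct, match_pct = (int(g) for g in match.groups()[1:])
    return {
        "job": job,
        "runs": runs,
        "failed_pct": failed_pct,
        "match_pct": match_pct,
        # Estimated percentage of all runs that failed with this symptom.
        "overall_match_pct": failed_pct * match_pct / 100,
    }


def rank_jobs(lines):
    """Return the parsed jobs sorted by estimated overall match rate, highest first."""
    parsed = [p for p in map(parse_job_line, lines) if p is not None]
    return sorted(parsed, key=lambda p: p["overall_match_pct"], reverse=True)
```

Jobs with very few runs, such as the two `gcp-rt` jobs with 8 runs each, will rank high on this estimate while carrying little statistical weight, so the `runs` field is worth reading alongside the rate.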