Description of problem:
OpenShift installation fails:

time="2020-05-13T01:01:21-04:00" level=fatal msg="failed to initialize the cluster: Working towards 4.5.0-0.nightly-2020-05-12-224129: 87% complete, waiting on authentication"

After the installation exits with an error, the cluster is functional:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.5.0-0.nightly-2020-05-12-224129   True        False         49m     Cluster version is 4.5.0-0.nightly-2020-05-12-224129

Version-Release number of the following components:
4.5.0-0.nightly-2020-05-12-224129
OpenShift on OSP16.1
Moving to the authentication operator, since that was the failing operator.
The authentication operator is actually up and running at the end, and it reports that correctly in both the logs and its clusteroperator resource, but by that time it is probably too late. The operator took far too long to come up, though, which suggests other issues in the cluster. I can see many etcd leader changes:
```
2020-05-13T05:00:31.985381733Z I0513 05:00:31.985343 1 event.go:278] Event(v1.ObjectReference{Kind:"Deployment", Namespace:"openshift-etcd-operator", Name:"etcd-operator", UID:"b3b84ca7-baae-4828-90de-1b75d18d6b5b", APIVersion:"apps/v1", ResourceVersion:"", FieldPath:""}): type: 'Warning' reason: 'EtcdLeaderChangeMetrics' Detected 2.5 leader changes in last 5 minutes on "OpenStack" disk metrics are: etcd-ostest-sqszc-master-1=0.014960000000000005,etcd-ostest-sqszc-master-0=0.015760000000000003,etcd-ostest-sqszc-master-2=0.015466666666666663
```
I am going to move this to etcd, although the root cause might be elsewhere.
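For anyone triaging similar logs, here is a minimal Python sketch that pulls the leader-change count and the per-member disk metrics out of an `EtcdLeaderChangeMetrics` message like the one above. The function name and the regular expression are my own and assume the message keeps exactly this wording; the log does not state the unit of the disk metrics, so they are treated as plain numbers.

```python
import re

# Message text copied from the event quoted above.
EVENT = (
    'Detected 2.5 leader changes in last 5 minutes on "OpenStack" disk metrics are: '
    'etcd-ostest-sqszc-master-1=0.014960000000000005,'
    'etcd-ostest-sqszc-master-0=0.015760000000000003,'
    'etcd-ostest-sqszc-master-2=0.015466666666666663'
)


def parse_leader_change_event(message):
    """Return (leader_changes, window_minutes, {member: metric}).

    Hypothetical helper: assumes the message wording shown in this bug.
    """
    m = re.search(r'Detected ([\d.]+) leader changes in last (\d+) minutes', message)
    if m is None:
        raise ValueError("not an EtcdLeaderChangeMetrics message")

    # Everything after "disk metrics are:" is a comma-separated name=value list.
    _, _, tail = message.partition('disk metrics are:')
    metrics = {}
    for pair in tail.strip().split(','):
        if not pair:
            continue
        name, _, value = pair.partition('=')
        metrics[name.strip()] = float(value)

    return float(m.group(1)), int(m.group(2)), metrics


if __name__ == "__main__":
    changes, minutes, metrics = parse_leader_change_event(EVENT)
    print(f"{changes} leader changes in {minutes} minutes")
    for member, value in sorted(metrics.items()):
        print(f"  {member}: {value}")
```

Running this over all events from the etcd operator log would make it easier to see whether the leader changes cluster around the time the authentication operator was stuck.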
Waiting on reporter's feedback
The Control Plane nodes need fast disks to comply with etcd requirements: https://github.com/openshift/installer/tree/master/docs/user/openstack#disk-requirements
Is this still an issue on the current 4.6 branch?
Seems rare in mainline release-promotion jobs, although 4.5, OpenStack, and a few other less-common platforms are getting hit reasonably often:

$ w3m -dump -cols 200 'https://search.svc.ci.openshift.org/?search=failed+to+initialize+the+cluster.*waiting+on.*authentication&maxAge=336h&context=1&type=bug%2Bjunit&name=release-openshift-ocp&groupBy=job' | grep 'failures match'
release-openshift-ocp-installer-e2e-gcp-rt-4.5 - 8 runs, 63% failed, 20% of failures match
release-openshift-ocp-installer-e2e-openstack-4.5 - 89 runs, 63% failed, 9% of failures match
release-openshift-ocp-installer-e2e-openstack-4.6 - 60 runs, 77% failed, 11% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.6 - 60 runs, 67% failed, 10% of failures match
release-openshift-ocp-installer-e2e-openstack-4.2 - 17 runs, 71% failed, 17% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.3 - 34 runs, 76% failed, 8% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.5 - 88 runs, 51% failed, 7% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 145 runs, 30% failed, 2% of failures match
release-openshift-ocp-installer-e2e-azure-4.3 - 43 runs, 37% failed, 6% of failures match
release-openshift-ocp-installer-e2e-gcp-serial-4.5 - 74 runs, 26% failed, 11% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.5 - 114 runs, 46% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-rt-4.6 - 8 runs, 100% failed, 25% of failures match
release-openshift-ocp-installer-e2e-azure-serial-4.4 - 47 runs, 43% failed, 5% of failures match
release-openshift-ocp-installer-e2e-aws-ovn-4.4 - 47 runs, 45% failed, 5% of failures match
release-openshift-ocp-installer-e2e-vsphere-upi-4.4 - 47 runs, 87% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-4.5 - 82 runs, 30% failed, 20% of failures match
release-openshift-ocp-installer-e2e-ovirt-4.6 - 93 runs, 85% failed, 1% of failures match
release-openshift-ocp-installer-e2e-openstack-4.3 - 34 runs, 94% failed, 3% of failures match
release-openshift-ocp-installer-e2e-openstack-serial-4.2 - 17 runs, 35% failed, 17% of failures match
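The percentages in that output are hard to compare across jobs with very different run counts. As a rough aid, the following Python sketch parses the per-job lines and estimates how many runs hit this failure (runs × failed % × match %). It assumes one job per line, as `grep` prints it; the helper names are hypothetical, and because the search tool reports rounded percentages the estimates are only approximate.

```python
import re

# Matches lines such as:
#   <job> - 89 runs, 63% failed, 9% of failures match
LINE_RE = re.compile(
    r'^(?P<job>\S+) - (?P<runs>\d+) runs, '
    r'(?P<failed>\d+)% failed, (?P<match>\d+)% of failures match$'
)


def parse_job_line(line):
    """Parse one summary line; return None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    runs = int(m.group('runs'))
    failed_pct = int(m.group('failed'))
    match_pct = int(m.group('match'))
    return {
        'job': m.group('job'),
        'runs': runs,
        'failed_pct': failed_pct,
        'match_pct': match_pct,
        # Approximate count of runs that hit this specific failure.
        'est_matching_runs': runs * failed_pct / 100 * match_pct / 100,
    }


def rank_jobs(text):
    """Return parsed rows sorted by estimated matching runs, highest first."""
    rows = [parse_job_line(line) for line in text.splitlines()]
    rows = [row for row in rows if row is not None]
    return sorted(rows, key=lambda row: row['est_matching_runs'], reverse=True)


# Three lines taken from the output above.
SAMPLE = """\
release-openshift-ocp-installer-e2e-openstack-4.5 - 89 runs, 63% failed, 9% of failures match
release-openshift-ocp-installer-e2e-azure-4.6 - 145 runs, 30% failed, 2% of failures match
release-openshift-ocp-installer-e2e-gcp-4.5 - 82 runs, 30% failed, 20% of failures match
"""

if __name__ == "__main__":
    for row in rank_jobs(SAMPLE):
        print(f"{row['est_matching_runs']:6.2f}  {row['job']}")
```

Piping the `grep` output into a script like this gives a quick ranking of which jobs are hit most often in absolute terms.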
It seems that it's working now. If it happens again, I'll reopen.