Release promotion informer [1]:

level=info msg="Cluster operator authentication Progressing is True with ProgressingWellKnownNotReady: Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.19:6443/.well-known/oauth-authorization-server endpoint data"
level=info msg="Cluster operator authentication Available is False with Available: "
level=info msg="Cluster operator insights Disabled is False with : "
level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 4; 2 nodes are at revision 6"
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.nightly-2019-12-04-004448: 100% complete"

Similar errors have been reported in bug 1768252 and bug 1776402. But etcd vs. disk latency is implicated in those bugs, and Sam checked this job and saw no etcd latency issues. Happened 13 times in the past 24h [2]. Seems well distributed among job names [3].

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.3/514
[2]: https://search.svc.ci.openshift.org/chart?search=Cluster%20operator%20kube-apiserver%20Progressing%20is%20True.*nodes%20are%20at%20revision
[3]: https://search.svc.ci.openshift.org/?search=Cluster%20operator%20kube-apiserver%20Progressing%20is%20True.*nodes%20are%20at%20revision
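For quick triage of a similar run, here is a rough sketch of how one might confirm the two symptoms above against a live cluster: whether the kube-apiserver is serving the OAuth metadata endpoint the authentication operator polls, and what the related cluster operators are reporting. It assumes a working kubeconfig for the affected cluster; the endpoint path and operator names come straight from the log lines above.

# Fetch the OAuth metadata the authentication operator polls; a 404 here
# reproduces the ProgressingWellKnownNotReady condition in the log above.
oc get --raw /.well-known/oauth-authorization-server

# Check the conditions the informer is reporting on.
oc get clusteroperator authentication kube-apiserver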
From looking at the attached OpenStack test run, I can see that it took quite a long time for the KAS pods to come up with a revision actually capable of serving the oauth-metadata endpoint, so I'm moving this to the KAS component. Also, there is a nil-deref panic from KAS-o in the logs of the test run above: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.3/514/artifacts/e2e-openstack-serial/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-55b8787655-zbtbd_kube-apiserver-operator.log
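For reference, a minimal sketch of how one might check both observations on a live cluster: the per-node revision skew of the kube-apiserver operand, and the panic in the operator log. The namespace and deployment name are taken from the pod name in the linked log; the jsonpath assumes the nodeStatuses layout of the kubeapiserver operator resource.

# Show which static-pod revision each master is running vs. the target revision.
oc get kubeapiserver cluster \
  -o jsonpath='{range .status.nodeStatuses[*]}{.nodeName}{": current="}{.currentRevision}{", target="}{.targetRevision}{"\n"}{end}'

# Look for the nil-deref panic in the kube-apiserver-operator log.
oc logs -n openshift-kube-apiserver-operator deployment/kube-apiserver-operator | grep -E -A 20 'panic|nil pointer'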
Client Version: v4.3.0
Server Version: 4.3.0-rc.2
Kubernetes Version: v1.16.2

$ master=$(oc get node | grep master | awk '{print $1}' | head -1)
$ oc debug node/$master

After logging in to the master debug pod:

- Check that the field "bindNetwork":"tcp4" has not been changed; found it as below:

# grep -rnw /etc/kubernetes -e '"bindNetwork":"tcp4"' | awk -F: '{print $1}'
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-3/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-4/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-8/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-11/configmaps/config/config.yaml

- Check that the field "bindNetwork":"tcp" has been changed; found it as below:

# grep -rnw /etc/kubernetes -e '"bindNetwork":"tcp"' | awk -F: '{print $1}'
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-4/configmaps/cluster-policy-controller-config/config.yaml
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-6/configmaps/cluster-policy-controller-config/config.yaml
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7/configmaps/cluster-policy-controller-config/config.yaml

So I think the fix is not complete for this bug.
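(For completeness, the same fields can also be checked without a node debug shell by grepping the rendered configmaps that feed those static-pod revisions; a rough sketch, assuming the configmap names config in openshift-kube-apiserver and cluster-policy-controller-config in openshift-kube-controller-manager, which the static-pod paths above mirror:)

# kube-apiserver rendered config; expect "bindNetwork":"tcp4" per the grep above.
oc get cm config -n openshift-kube-apiserver -o jsonpath='{.data.config\.yaml}' | grep -o '"bindNetwork":"[^"]*"'

# cluster-policy-controller config; expect the changed "bindNetwork":"tcp".
oc get cm cluster-policy-controller-config -n openshift-kube-controller-manager -o jsonpath='{.data.config\.yaml}' | grep -o '"bindNetwork":"[^"]*"'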
Please ignore the previous comments; they were pasted into the wrong bug.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581