Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1779796

Summary: kube-apiserver Progressing=True: 1 nodes are at revision 4; 2 nodes are at revision 6
Product: OpenShift Container Platform
Reporter: W. Trevor King <wking>
Component: kube-apiserver
Assignee: Stefan Schimanski <sttts>
Status: CLOSED ERRATA
QA Contact: Ke Wang <kewang>
Severity: unspecified
Docs Contact:
Priority: unspecified
Version: 4.3.0
CC: aos-bugs, fiezzi, mfojtik, nagrawal, sttts, xxia
Target Milestone: ---
Target Release: 4.4.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1781678
Environment:
Last Closed: 2020-05-13 21:54:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1781678

Description W. Trevor King 2019-12-04 18:26:53 UTC
Release promotion informer [1]:

level=info msg="Cluster operator authentication Progressing is True with ProgressingWellKnownNotReady: Progressing: got '404 Not Found' status while trying to GET the OAuth well-known https://10.0.0.19:6443/.well-known/oauth-authorization-server endpoint data"
level=info msg="Cluster operator authentication Available is False with Available: "
level=info msg="Cluster operator insights Disabled is False with : "
level=info msg="Cluster operator kube-apiserver Progressing is True with Progressing: Progressing: 1 nodes are at revision 4; 2 nodes are at revision 6"
level=fatal msg="failed to initialize the cluster: Working towards 4.3.0-0.nightly-2019-12-04-004448: 100% complete"

Similar errors have been reported in bug 1768252 and bug 1776402, but etcd and disk latency are implicated in those bugs, and Sam checked this job and saw no etcd latency issues. This has happened 13 times in the past 24h [2], and the failures seem well distributed among job names [3].

[1]: https://prow.svc.ci.openshift.org/view/gcs/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.3/514
[2]: https://search.svc.ci.openshift.org/chart?search=Cluster%20operator%20kube-apiserver%20Progressing%20is%20True.*nodes%20are%20at%20revision
[3]: https://search.svc.ci.openshift.org/?search=Cluster%20operator%20kube-apiserver%20Progressing%20is%20True.*nodes%20are%20at%20revision
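The Progressing message above summarizes how many control-plane nodes are at each kube-apiserver static-pod revision. A minimal sketch for seeing which node is behind, assuming the operator resource is `kubeapiserver/cluster` and exposes `.status.latestAvailableRevision` and `.status.nodeStatuses[].{nodeName,currentRevision}` (fields of the operator.openshift.io/v1 static-pod operator status, not stated in this bug). The helper names `lagging_nodes` and `check_kas_revisions` are hypothetical.

```shell
#!/bin/sh
# Sketch (assumptions noted): report control-plane nodes whose kube-apiserver
# static pod is not yet at the operator's latest available revision.

# lagging_nodes reads the latest available revision on the first input line,
# then one "NODE CURRENT_REVISION" pair per line. It prints every node whose
# current revision differs from the latest one and exits 1 if any lag, else 0.
lagging_nodes() {
  awk '
    NR == 1 { latest = $1; next }
    NF >= 1 && $2 != latest {
      printf "%s is at revision %s (latest %s)\n", $1, $2, latest
      behind++
    }
    END { exit (behind > 0 ? 1 : 0) }
  '
}

# check_kas_revisions feeds lagging_nodes from a live cluster. Assumes a
# logged-in oc client and that the KubeAPIServer operator resource is named
# "cluster" with .status.latestAvailableRevision and .status.nodeStatuses.
check_kas_revisions() {
  oc get kubeapiserver cluster -o jsonpath='{.status.latestAvailableRevision}{"\n"}{range .status.nodeStatuses[*]}{.nodeName}{" "}{.currentRevision}{"\n"}{end}' |
    lagging_nodes
}
```

In a state like the one reported here, `check_kas_revisions` would be expected to print the node(s) still at an older revision and exit non-zero; it prints nothing and exits 0 once every node has converged. Node names such as `master-0` depend on the cluster.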

Comment 1 Standa Laznicka 2019-12-10 08:46:23 UTC
Looking at the linked OpenStack test run, I can see that it took quite a long time for the KAS (kube-apiserver) pods to come up at a revision that could actually serve the oauth-metadata endpoint. Moving to the KAS component.

There is also a nil-dereference panic in the KAS-o (kube-apiserver-operator) log from the test run above: https://storage.googleapis.com/origin-ci-test/logs/release-openshift-ocp-installer-e2e-openstack-serial-4.3/514/artifacts/e2e-openstack-serial/pods/openshift-kube-apiserver-operator_kube-apiserver-operator-55b8787655-zbtbd_kube-apiserver-operator.log

Comment 7 Ke Wang 2020-01-19 08:41:05 UTC
Client Version: v4.3.0
Server Version: 4.3.0-rc.2
Kubernetes Version: v1.16.2

$ master=$(oc get node | grep master | awk '{print $1}' | head -1)
$ oc debug node/$master

After logging in to the master debug pod:
- Check which files still contain the unchanged field "bindNetwork":"tcp4"; found the following:
# grep -rnw /etc/kubernetes -e '"bindNetwork":"tcp4"' | awk -F: '{print $1}'
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-3/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-4/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-8/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/configmaps/config/config.yaml
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-11/configmaps/config/config.yaml

- Check which files contain the field changed to "bindNetwork":"tcp"; found the following:
# grep -rnw /etc/kubernetes -e '"bindNetwork":"tcp"' | awk -F: '{print $1}'
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-4/configmaps/cluster-policy-controller-config/config.yaml
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-6/configmaps/cluster-policy-controller-config/config.yaml
/etc/kubernetes/static-pod-resources/kube-controller-manager-pod-7/configmaps/cluster-policy-controller-config/config.yaml

So I think the fix for this bug is not complete.

Comment 8 Ke Wang 2020-01-19 08:43:43 UTC
Please ignore the previous comment; I pasted it into the wrong bug.

Comment 11 errata-xmlrpc 2020-05-13 21:54:01 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581