Bug 1880941

Summary: Requests sent before readyz succeeds lead to inconsistent behaviour and are hard to diagnose
Product: OpenShift Container Platform
Reporter: Stefan Schimanski <sttts>
Component: kube-apiserver
Assignee: Stefan Schimanski <sttts>
Status: CLOSED ERRATA
QA Contact: Ke Wang <kewang>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.6
CC: aos-bugs, mfojtik, xxia
Target Milestone: ---
Target Release: 4.6.0
Hardware: Unspecified
OS: Unspecified
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:42:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:

Description Stefan Schimanski 2020-09-21 08:08:31 UTC
We had cases where components accessed kube-apiserver via localhost before it declared readiness via /readyz. This led to unexpected behaviour and to the suspicion that SCC admission was broken. The issue was very hard to diagnose because it was not clear that those requests had been sent early in the apiserver lifecycle. We need a mechanism that makes these mistakes easy to understand and catches them early in CI.
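
For illustration only, here is a minimal client-side sketch of the behaviour this bug asks components to follow: poll /readyz and send real requests only once it answers 200. This is not the fix itself. The helper names probe_readyz and wait_until_ready, the timeouts, and the URL in the usage comment are assumptions made for this example.

```python
import ssl
import time
import urllib.error
import urllib.request
from typing import Callable, Optional


def probe_readyz(url: str, context: Optional[ssl.SSLContext] = None) -> bool:
    """Return True only if a GET on url answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=5, context=context) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, TLS failure, or a non-2xx answer
        # (HTTPError is a subclass of URLError): treat as "not ready".
        return False


def wait_until_ready(
    probe: Callable[[], bool],
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll probe() until it returns True or timeout seconds have elapsed."""
    deadline = clock() + timeout
    while True:
        if probe():
            return True
        if clock() >= deadline:
            return False
        sleep(interval)


# Hypothetical usage (URL and TLS context are assumptions, not from this bug):
#   ctx = ssl.create_default_context(cafile="/path/to/ca.crt")
#   if wait_until_ready(lambda: probe_readyz("https://localhost:6443/readyz", ctx)):
#       ...  # only now start sending API requests
```

The probe is passed in as a callable so the wait loop can be exercised without a running apiserver.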

Comment 2 Ke Wang 2020-09-25 04:20:00 UTC
Verified with OCP 4.6 nightly build 4.6.0-0.nightly-2020-09-25-014731.

For PR https://github.com/openshift/kubernetes/pull/356, we can see that it works in the latest OCP 4.6 payload:

$ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-148-69.us-east-2.compute.internal | grep 'healthz.go:258'
I0925 03:06:54.708757      16 healthz.go:258] informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/bootstrap-controller,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes,poststarthook/apiservice-registration-controller check failed: readyz
I0925 03:06:54.919315      16 healthz.go:258] poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:06:55.430903      16 healthz.go:258] poststarthook/rbac/bootstrap-roles check failed: readyz
I0925 03:14:26.316374      18 healthz.go:258] openshift-apiservices-available,informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes,poststarthook/apiservice-registration-controller check failed: readyz
I0925 03:14:26.404999      18 healthz.go:258] openshift-apiservices-available,informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:14:26.498779      18 healthz.go:258] openshift-apiservices-available,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:14:27.179119      18 healthz.go:258] openshift-apiservices-available,poststarthook/rbac/bootstrap-roles check failed: readyz
I0925 03:14:28.181211      18 healthz.go:258] openshift-apiservices-available check failed: readyz
I0925 03:18:27.214645      18 healthz.go:258] etcd check failed: readyz
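
When scanning many such lines, it can help to extract which checks were still failing at each point. The sketch below does that with a regular expression inferred only from the log lines above. It is not a documented or stable log format, and the name parse_failed_checks is made up for this example.

```python
import re
from typing import List, Optional, Tuple

# Pattern inferred from the sample lines in this comment:
#   "... healthz.go:258] <check>,<check>,... check failed: readyz"
_FAILED = re.compile(
    r"healthz\.go:\d+\] (?P<checks>\S+) check failed: (?P<endpoint>\w+)"
)


def parse_failed_checks(line: str) -> Optional[Tuple[str, List[str]]]:
    """Return (endpoint, [failed check names]) or None if the line does not match."""
    match = _FAILED.search(line)
    if match is None:
        return None
    return match.group("endpoint"), match.group("checks").split(",")


# Sample taken verbatim from the log output above.
SAMPLE = (
    "I0925 03:14:27.179119      18 healthz.go:258] "
    "openshift-apiservices-available,poststarthook/rbac/bootstrap-roles "
    "check failed: readyz"
)

if __name__ == "__main__":
    print(parse_failed_checks(SAMPLE))
```

Applied to the lines above in order, this shows the list of failing checks shrinking as the post-start hooks complete.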

For PR https://github.com/openshift/origin/pull/25506, the related test passed; see the following search result for details:
passed: (16.6s) 2020-09-25T03:55:35 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"

Comment 5 errata-xmlrpc 2020-10-27 16:42:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.