Bug 1880941 - Requests before readyz succeeds lead to inconsistent behaviour and are hard to diagnose
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: 4.6.0
Assignee: Stefan Schimanski
QA Contact: Ke Wang
Depends On:
Reported: 2020-09-21 08:08 UTC by Stefan Schimanski
Modified: 2020-10-27 16:43 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Last Closed: 2020-10-27 16:42:58 UTC
Target Upstream Version:


System | ID | Private | Priority | Status | Summary | Last Updated
Github | openshift kubernetes pull 356 | 0 | None | closed | Bug 1880941: kube-apiserver: log non-probe requests before ready | 2020-09-22 17:36:25 UTC
Github | openshift origin pull 25506 | 0 | None | closed | Bug 1880941: extended/apiserver: check for NonReadyRequests events for early traffic from LBs | 2020-09-24 18:20:19 UTC
Red Hat Product Errata | RHBA-2020:4196 | 0 | None | None | None | 2020-10-27 16:43:22 UTC

Description Stefan Schimanski 2020-09-21 08:08:31 UTC
We had cases where components accessed kube-apiserver via localhost before it declared readiness via /readyz. This led to unexpected behaviour and to the suspicion that SCC admission was broken. The issue was very hard to diagnose because nothing indicated that those requests had been sent early in the apiserver life-cycle. We need a mechanism that makes these mistakes easy to understand and catches them early in CI.
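
To illustrate the kind of mechanism asked for here, the following is a minimal sketch of a tracker that records non-probe requests arriving before readiness. It is not the code from the linked PRs: the class name, the probe path list and the record format are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

# Assumption: paths treated as probes. The real filter may use a different list.
PROBE_PATHS = ("/readyz", "/healthz", "/livez")


@dataclass
class EarlyRequestTracker:
    """Hypothetical helper: remembers non-probe requests seen before readiness."""

    ready: bool = False
    early_requests: List[str] = field(default_factory=list)

    def mark_ready(self) -> None:
        # Call once the server reports ready; later requests are not recorded.
        self.ready = True

    def observe(self, path: str, source: str) -> bool:
        """Record the request if it is early non-probe traffic.

        Returns True when the request was recorded.
        """
        if self.ready:
            return False
        # Probe endpoints and their sub-paths (e.g. /readyz/etcd) are expected early.
        if any(path == p or path.startswith(p + "/") for p in PROBE_PATHS):
            return False
        self.early_requests.append(f"{source} {path}")
        return True
```

A server would call observe() for each incoming request and mark_ready() once /readyz succeeds; anything left in early_requests points at a client that did not wait for readiness.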

Comment 2 Ke Wang 2020-09-25 04:20:00 UTC
Verified with OCP 4.6 nightly build 4.6.0-0.nightly-2020-09-25-014731.

For PR https://github.com/openshift/kubernetes/pull/356, we can see that it works in the latest OCP 4.6 payload:

$ oc logs -n openshift-kube-apiserver kube-apiserver-ip-10-0-148-69.us-east-2.compute.internal | grep 'healthz.go:258'
I0925 03:06:54.708757      16 healthz.go:258] informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/bootstrap-controller,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes,poststarthook/apiservice-registration-controller check failed: readyz
I0925 03:06:54.919315      16 healthz.go:258] poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:06:55.430903      16 healthz.go:258] poststarthook/rbac/bootstrap-roles check failed: readyz
I0925 03:14:26.316374      18 healthz.go:258] openshift-apiservices-available,informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes,poststarthook/apiservice-registration-controller check failed: readyz
I0925 03:14:26.404999      18 healthz.go:258] openshift-apiservices-available,informer-sync,poststarthook/start-apiextensions-controllers,poststarthook/crd-informer-synced,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:14:26.498779      18 healthz.go:258] openshift-apiservices-available,poststarthook/rbac/bootstrap-roles,poststarthook/scheduling/bootstrap-system-priority-classes check failed: readyz
I0925 03:14:27.179119      18 healthz.go:258] openshift-apiservices-available,poststarthook/rbac/bootstrap-roles check failed: readyz
I0925 03:14:28.181211      18 healthz.go:258] openshift-apiservices-available check failed: readyz
I0925 03:18:27.214645      18 healthz.go:258] etcd check failed: readyz
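
When comparing many such lines, it can help to split each one into its failing checks. The sketch below assumes the line layout seen in the output above (klog header, then a comma-separated check list followed by "check failed: readyz"); the function and type names are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional

# Assumption: the layout matches the lines above:
# I<mmdd> <hh:mm:ss.uuuuuu> <threadid> healthz.go:<line>] <checks> check failed: <endpoint>
LINE_RE = re.compile(
    r"^I(?P<date>\d{4}) (?P<time>\d{2}:\d{2}:\d{2}\.\d+)\s+(?P<threadid>\d+) "
    r"healthz\.go:\d+\] (?P<checks>\S+) check failed: (?P<endpoint>\w+)$"
)


class FailedChecks(NamedTuple):
    time: str
    threadid: int
    endpoint: str
    checks: List[str]


def parse_line(line: str) -> Optional[FailedChecks]:
    """Return the failing checks from one log line, or None if it does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    return FailedChecks(
        time=m["time"],
        threadid=int(m["threadid"]),
        endpoint=m["endpoint"],
        checks=m["checks"].split(","),
    )
```

Feeding the grep output through parse_line() line by line shows the list of failing checks shrinking as the post-start hooks complete.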

For PR https://github.com/openshift/origin/pull/25506, the related test passed; see the following search result:
passed: (16.6s) 2020-09-25T03:55:35 "[sig-api-machinery][Feature:APIServer][Late] API LBs follow /readyz of kube-apiserver and don't send request early [Suite:openshift/conformance/parallel]"
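
For a manual check along the same lines, events can be filtered by reason. This is only a sketch: the reason string "NonReadyRequests" is taken from the PR title, the input is assumed to be the parsed JSON of an event list (for example from `oc get events -A -o json`), and the function name is hypothetical. It is not the test code from the PR.

```python
from typing import Any, Dict, List


def non_ready_request_events(event_list: Dict[str, Any]) -> List[str]:
    """Summarise events whose reason is NonReadyRequests.

    event_list is assumed to be a parsed Kubernetes event list with an "items" array.
    """
    found = []
    for ev in event_list.get("items", []):
        if ev.get("reason") != "NonReadyRequests":
            continue
        obj = ev.get("involvedObject", {})
        found.append(
            f'{obj.get("namespace", "")}/{obj.get("name", "")}: {ev.get("message", "")}'
        )
    return found
```

An empty result would be consistent with the test passing; any entry names the object for which early traffic was reported.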

Comment 5 errata-xmlrpc 2020-10-27 16:42:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

