Description of problem:

Currently, /readyz starts reporting failure only after ShutdownDelayDuration elapses. Load balancers use /readyz as their health check, so they are not aware that shutdown has been initiated until ShutdownDelayDuration has elapsed. This does not give them enough time to detect the shutdown and react to it.

We expect /readyz to start returning failure as soon as apiserver shutdown is initiated (SIGTERM received). This gives the load balancer a window (defined by ShutdownDelayDuration) to detect that /readyz is red and stop sending traffic to this server.

How reproducible: Always

Upstream PR: https://github.com/kubernetes/kubernetes/pull/88911
Verified with OCP build 4.5.0-0.nightly-2020-03-15-152626; details below.

- In one terminal:
  - exec into the kube-apiserver pod of master 0:
      $ oc rsh -n openshift-kube-apiserver <kube-apiserver pod name>
  - execute:
      sh-4.2# while true; do curl -k https://localhost:6443/readyz; done
      okokokokokok
      ...

- In the other terminal:
  - oc debug node/<master-0>
  - chroot /host
  - bash
  - ps aux | grep " kube-apiserver "
  - kill -INT <pid-from-previous-output>

- In the first terminal we can see:
    [+]ping ok
    [+]log ok
    [+]etcd ok
    [+]poststarthook/quota.openshift.io-clusterquotamapping ok
    [+]poststarthook/openshift.io-startkubeinformers ok
    [+]poststarthook/openshift.io-StartOAuthInformers ok
    [+]poststarthook/start-kube-apiserver-admission-initializer ok
    [+]poststarthook/generic-apiserver-start-informers ok
    [+]poststarthook/start-apiextensions-informers ok
    [+]poststarthook/start-apiextensions-controllers ok
    [+]poststarthook/crd-discovery-available ok
    [+]poststarthook/crd-informer-synced ok
    [+]poststarthook/bootstrap-controller ok
    [+]poststarthook/rbac/bootstrap-roles ok
    [+]poststarthook/scheduling/bootstrap-system-priority-classes ok
    [+]poststarthook/start-cluster-authentication-info-controller ok
    [+]poststarthook/aggregator-reload-proxy-client-cert ok
    [+]poststarthook/start-kube-aggregator-informers ok
    [+]poststarthook/apiservice-registration-controller ok
    [+]poststarthook/apiservice-status-available-controller ok
    [+]poststarthook/apiservice-wait-for-first-sync ok
    [+]poststarthook/kube-apiserver-autoregistration ok
    [+]autoregister-completion ok
    [+]poststarthook/apiservice-openapi-controller ok
    [-]shutdown failed: reason withheld
    healthz check failed

The /readyz endpoint starts returning failure as soon as apiserver shutdown is initiated.
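
For anyone repeating the manual check above, the bare curl loop can be swapped for a poller that logs only state changes with a timestamp, which makes it easier to see when /readyz goes red relative to when the signal was sent. This is only a rough sketch, not part of the fix: the helper names (failed_checks, readyz_state, watch_readyz), the 1-second poll interval and the 2-second curl timeout are assumptions made for illustration. The URL and the "[-]<check> failed" line format are taken from the output above.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read /readyz output on stdin and print the name of
# every failing check, one per line. Failing lines look like
# "[-]shutdown failed: reason withheld".
failed_checks() {
  sed -n 's/^\[-\]\([^ ]*\) failed.*/\1/p'
}

# Hypothetical helper: turn a /readyz response body into a one-line state.
readyz_state() {
  local body="$1" failed
  if [ "$body" = "ok" ]; then
    echo "ok"
  elif [ -z "$body" ]; then
    # An empty body is treated as "server did not answer" (assumption).
    echo "unreachable"
  else
    failed="$(printf '%s\n' "$body" | failed_checks | paste -sd, -)"
    echo "failing: ${failed:-unknown}"
  fi
}

# Hypothetical helper: poll /readyz and print a UTC timestamp each time
# the state changes. Runs until interrupted with Ctrl-C.
watch_readyz() {
  local url="${1:-https://localhost:6443/readyz}" prev="" body state
  while true; do
    # -k skips TLS verification, as in the original loop; -s hides progress.
    body="$(curl -ks --max-time 2 "$url")" || body=""
    state="$(readyz_state "$body")"
    if [ "$state" != "$prev" ]; then
      printf '%s %s\n' "$(date -u +%H:%M:%S)" "$state"
      prev="$state"
    fi
    sleep 1
  done
}
```

Sourcing this inside the kube-apiserver pod and running watch_readyz in the first terminal should print one line per transition. Comparing the timestamp of the first failing line with the time the signal was sent, and with the time the endpoint becomes unreachable, gives a rough view of the window that ShutdownDelayDuration leaves for the load balancer.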
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:2409