Bug 1811200

Summary: /readyz should start reporting failure on shutdown initiation
Product: OpenShift Container Platform Reporter: Abu Kashem <akashem>
Component: kube-apiserverAssignee: Abu Kashem <akashem>
Status: CLOSED ERRATA QA Contact: Ke Wang <kewang>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.4CC: aos-bugs, mfojtik, xxia
Target Milestone: ---   
Target Release: 4.3.z   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1811169 Environment:
Last Closed: 2020-04-01 19:10:46 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1811198, 1821493, 1821494, 1821495    
Bug Blocks:    

Description Abu Kashem 2020-03-06 20:13:42 UTC
+++ This bug was initially created as a clone of Bug #1811169 +++

Description of problem:

Currently, /readyz starts reporting failure after ShutdownDelayDuration elapses. The load balancer(s) uses /readyz for health check and are not aware of the shutdown initiation until ShutdownDelayDuration elapses. This does not give the load balancer(s) enough time to detect and react to it.

We expect /readyz to start returning failure as soon as apiserver shutdown is initiated(SIGTERM received). This gives the load balancer a window (defined by ShutdownDelayDuration) to detect that /readyz is red and stop sending traffic to this server.


How reproducible:
Always


upstream PR: https://github.com/kubernetes/kubernetes/pull/88911

Comment 8 Ke Wang 2020-03-24 08:39:26 UTC
Verified with OCP build 4.3.0-0.nightly-2020-03-23-130439, detail see below,

- In one terminal, enter into master 
$ master=$(oc get nodes | grep master | head -1 | cut -d " " -f1)
$ oc debug node/$master
sh-4.2# chroot /host
sh-4.4# while true; do curl -k --silent --show-error https://localhost:6443/readyz ; done

- In another terminal,
$ debug_pod=$(oc get pod -A | grep debug | awk '{print $2}'
$ oc rsh pod/$debug_pod
sh-4.2# chroot /host
sh-4.4# kill -INT `ps aux | grep " kube-apiserver " | grep -v grep | awk '{print $2}'`

- In the first terminal, check the output, after above kill, can immediately see:
[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-discovery-available ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/openshift.io-clientCA-reload ok
[+]poststarthook/openshift.io-requestheader-reload ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-kubernetes-informers-synched ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
healthz check failed

The endpoint of readyz will start returning failure as soon as apiserver shutdown is initiated.

Comment 10 errata-xmlrpc 2020-04-01 19:10:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0930