+++ This bug was initially created as a clone of Bug #1811169 +++

Description of problem:

Currently, /readyz starts reporting failure only after ShutdownDelayDuration elapses. The load balancer(s) use /readyz for health checks and are not aware that shutdown has been initiated until ShutdownDelayDuration has elapsed. This does not give the load balancer(s) enough time to detect and react to the shutdown.

We expect /readyz to start returning failure as soon as apiserver shutdown is initiated (SIGTERM received). This gives the load balancer a window, defined by ShutdownDelayDuration, to detect that /readyz is red and stop sending traffic to this server.

How reproducible:
Always

Upstream PR: https://github.com/kubernetes/kubernetes/pull/88911
This bug tracks carrying the upstream patch https://github.com/kubernetes/kubernetes/pull/88911 into openshift-apiserver. See: https://github.com/openshift/openshift-apiserver/pull/81
Verified with OCP 4.4.0-0.nightly-2020-03-18-102708; steps below.

Get the pod IP:

$ oc -n openshift-apiserver get po -o wide
apiserver-74d496b787-b5s8v 1/1 Running 7 154m 10.....40 ip-10-...-193.us-east-2.compute.internal ...

In one terminal, open a debug shell on the master and poll /readyz in a loop:

$ oc debug no/ip-10-..-..-193.us-east-2.compute.internal
Starting pod/ip-10-.-..-193us-east-2computeinternal-debug ...
To use host binaries, run `chroot /host`
Pod IP: 10...193
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# while true; do curl -k --silent --show-error https://10.....40:8443/readyz ; done |& tee /tmp/ke.log
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokok

In another terminal, find the openshift-apiserver process and send it SIGINT:

$ oc rsh ip-10-...-193us-east-2computeinternal-debug
sh-4.2# chroot /host
sh-4.4# ps aux | grep "openshift-apiserver start"
root 325696 2.1 1.1 567144 196100 ? Ssl 05:26 0:14 openshift-apiserver start --config=/var/run/configmaps/config/config.yaml -v=2
sh-4.4# kill -INT 325696

Back in the first terminal, the output immediately after the kill shows:

curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 10...40:8443
curl: (7) Failed to connect to 10.128.0.40 port 8443: Connection refused
curl: (7) Failed to connect to 10.128.0.40 port 8443: Connection refused
...

The /readyz endpoint starts returning failure as soon as openshift-apiserver shutdown is initiated, and the polling loop detects that /readyz is red.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0581