Bug 1811198 - /readyz should start reporting failure on shutdown initiation
Summary: /readyz should start reporting failure on shutdown initiation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.4.0
Assignee: Abu Kashem
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1811169
Blocks: 1811200 1821493 1821494 1821495
Reported: 2020-03-06 20:09 UTC by Abu Kashem
Modified: 2020-05-04 11:46 UTC
CC List: 4 users

Fixed In Version:
Doc Type: Release Note
Doc Text:
Fixed the /readyz endpoint to return failure immediately when an API server is terminating.
Clone Of: 1811169
Cloned As: 1821493
Environment:
Last Closed: 2020-05-04 11:45:36 UTC
Target Upstream Version:



Links
GitHub openshift origin pull 24656 (closed): [release-4.4] Bug 1811198: /readyz should start returning failure on shutdown initiation. Last updated 2020-10-02 09:46:49 UTC
Red Hat Product Errata RHBA-2020:0581. Last updated 2020-05-04 11:46:11 UTC

Description Abu Kashem 2020-03-06 20:09:12 UTC
+++ This bug was initially created as a clone of Bug #1811169 +++

Description of problem:

Currently, /readyz starts reporting failure only after ShutdownDelayDuration elapses. The load balancers use /readyz for their health checks, so they are not aware that shutdown has been initiated until ShutdownDelayDuration has already elapsed. This does not give them enough time to detect the shutdown and stop routing traffic to the server.

We expect /readyz to start returning failure as soon as apiserver shutdown is initiated (SIGTERM received). This gives the load balancers a window, defined by ShutdownDelayDuration, to detect that /readyz is red and stop sending traffic to this server.


How reproducible:
Always


upstream PR: https://github.com/kubernetes/kubernetes/pull/88911

Comment 3 Ke Wang 2020-03-16 09:31:54 UTC
Verified with OCP build 4.4.0-0.nightly-2020-03-15-215151; details below:

- in one terminal:
  - exec into kube-apiserver pod of master 0
    $ oc rsh -n openshift-kube-apiserver kube-apiserver-keosp1641-wq2sw-master-0
  - execute in pod terminal: 
    sh-4.2# while true; do curl -k https://localhost:6443/readyz; done
    okokokokokok ...

- in another terminal:
$ oc debug node/keosp1641-wq2sw-master-0
Starting pod/keosp1641-wq2sw-master-0-debug ...
To use host binaries, run `chroot /host`
Pod IP: 192.168.0.30
If you don't see a command prompt, try pressing enter.
sh-4.2# chroot /host
sh-4.4# bash
[root@keosp1641-wq2sw-master-0 /]# ps aux | grep " kube-apiserver "
root       55883 13.9  8.3 2782096 1365064 ?     Ssl  07:32  15:27 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=192.168.0.30 -v=2
root      378139  0.0  0.0   9180  1080 pts/0    S+   09:23   0:00 grep --color=auto  kube-apiserver 

[root@keosp1641-wq2sw-master-0 /]# kill -INT 55883

- in the first terminal, the output changes to:

[+]ping ok
[+]log ok
[+]etcd ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-StartOAuthInformers ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-discovery-available ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
healthz check failed


The /readyz endpoint now starts returning failure as soon as apiserver shutdown is initiated.

Comment 5 errata-xmlrpc 2020-05-04 11:45:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0581

