Bug 1821500
| Summary: | /readyz should start reporting failure on shutdown initiation | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Abu Kashem <akashem> | |
| Component: | openshift-apiserver | Assignee: | Abu Kashem <akashem> | |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> | |
| Severity: | medium | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.4 | CC: | aos-bugs, kewang, mfojtik | |
| Target Milestone: | --- | |||
| Target Release: | 4.3.z | |||
| Hardware: | Unspecified | |||
| OS: | Unspecified | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1821502 (view as bug list) | Environment: | ||
| Last Closed: | 2020-06-03 03:30:41 UTC | Type: | --- | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | 1811202 | |||
| Bug Blocks: | 1821502, 1821503 | |||
Description
Abu Kashem
2020-04-06 23:56:50 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason, or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it; otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

Not sure why the "Target Release" of the BZ has been reset. Looking at the history, it's set to "4.3.z".

The /readyz fix did not make it into openshift-apiserver 4.3. It looks like openshift/kubernetes-apiserver:openshift-apiserver-4.3-kubernetes-1.17.3 does not have the upstream fix for /readyz. The fix landed upstream in 1.17.4. Stefan created a new branch "1.17.4" on April 15, 2020: https://github.com/openshift/kubernetes-apiserver/tree/openshift-apiserver-4.3-kubernetes-1.17.4
We need to work on a new PR to move openshift-apiserver 4.3 to "1.17.4".

Verified in an OCP 4.3.0-0.nightly-2020-05-25-153254 environment; the checks are below.
$ oc -n openshift-apiserver get po -o wide | grep apiserver | head -1 | awk '{print $6}' # get pod IP
In one terminal, open a debug shell on a master node:
$ master=$(oc get node | grep master | awk '{print $1}' | head -1)
$ oc debug node/$master
After logging in to the master debug pod:
sh-4.2# chroot /host
sh-4.4# while true; do curl -k --silent --show-error https://<pod IP>:8443/readyz ; done
okokokokokokokokokokokokokokokokokokokokokokokokokokokokokok
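The tight loop above prints only response bodies, so it is hard to tell exactly when readiness flips. A minimal sketch of a timestamped variant follows. The function names `classify_readyz` and `poll_readyz` and the `POD_IP` variable are hypothetical and are not part of the original verification steps.

```shell
#!/bin/sh
# Hypothetical helper: map the HTTP status code of a /readyz probe to a label.
classify_readyz() {
  case "$1" in
    200) echo "ready" ;;
    000) echo "unreachable" ;;  # curl writes 000 when no HTTP response was received
    *)   echo "not-ready" ;;
  esac
}

# Hypothetical helper: probe the given URL forever, one timestamped line per probe.
# $1 is the /readyz URL, $2 is the interval in seconds (default 1).
poll_readyz() {
  url="$1"
  interval="${2:-1}"
  while true; do
    code=$(curl -k --silent --output /dev/null --max-time 2 \
                --write-out '%{http_code}' "$url")
    printf '%s %s %s\n' "$(date -u +%H:%M:%S)" "$code" "$(classify_readyz "$code")"
    sleep "$interval"
  done
}

# Usage (POD_IP is a placeholder for the pod IP obtained earlier):
#   poll_readyz "https://${POD_IP}:8443/readyz" 1
```

With this, the moment the label changes from `ready` to `not-ready` or `unreachable` can be compared against the time the signal was sent.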
In another terminal,
$ oc rsh pod/ip-...-31ap-south-1computeinternal-debug
sh-4.2# chroot /host
sh-4.4# ps aux | grep "openshift-apiserver start"
root 30545 2.1 1.1 567144 196100 ? Ssl 04:13 0:35 openshift-apiserver start --config=/var/run/configmaps/config/config.yaml -v=2
sh-4.4# kill -INT 30545
In the first terminal, check the output. Immediately after the kill above, it shows:
curl: (35) OpenSSL SSL_connect: SSL_ERROR_SYSCALL in connection to 10...40:8443
curl: (7) Failed to connect to 10.129.0.38 port 8443: Connection refused
[+]ping ok
[+]log ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/image.openshift.io-apiserver-caches ok
[-]poststarthook/authorization.openshift.io-bootstrapclusterroles failed: reason withheld
[-]poststarthook/authorization.openshift.io-ensureopenshift-infra failed: reason withheld
[+]poststarthook/project.openshift.io-projectcache ok
[+]poststarthook/project.openshift.io-projectauthorizationcache ok
[-]poststarthook/security.openshift.io-bootstrapscc failed: reason withheld
[+]poststarthook/openshift.io-startinformers ok
[+]poststarthook/openshift.io-restmapperupdater ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]shutdown ok
healthz check failed
...
The /readyz endpoint starts returning failure as soon as openshift-apiserver shutdown is initiated, so /readyz is detected as red immediately.
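When reading a failure response like the one above, it can help to pull out only the checks marked `[-]`. The following is a rough sketch; the `failed_checks` function is hypothetical, and it assumes the per-check output format shown in this report (`[-]<name> failed: ...`).

```shell
#!/bin/sh
# Hypothetical helper: read per-check /readyz output on stdin and print
# the name of every check reported as failed (lines starting with "[-]").
failed_checks() {
  sed -n 's/^\[-\]\([^ ]*\) failed.*/\1/p'
}

# Usage (POD_IP is a placeholder; this assumes the endpoint honors the
# "verbose" query parameter of the Kubernetes API server health endpoints):
#   curl -k --silent "https://${POD_IP}:8443/readyz?verbose" | failed_checks
```

Lines that pass (`[+]`) and the trailing summary line are ignored, so an empty result means no individual check was listed as failed.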
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHBA-2020:2256