Bug 1989461
| Summary: | kube-apiserver does not use the SO_REUSEPORT properly | ||
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Michal Fojtik <mfojtik> |
| Component: | kube-apiserver | Assignee: | Michal Fojtik <mfojtik> |
| Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | ||
| Version: | 4.9 | CC: | aos-bugs, mfojtik, xxia |
| Target Milestone: | --- | ||
| Target Release: | 4.9.0 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2021-10-18 17:44:09 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description
Michal Fojtik
2021-08-03 09:32:54 UTC
Verification steps are as follows.
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.9.0-0.nightly-2021-08-16-154237 True False 91m Cluster version is 4.9.0-0.nightly-2021-08-16-154237
In terminal A, force a kube-apiserver redeployment:
$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'
kubeapiserver.operator.openshift.io/cluster patched
In terminal B, poll the readyz endpoint in a loop:
$ cat test.sh
#!/usr/bin/env bash
while true
do
date;curl -ks https://api....:6443/readyz
echo
done
$ bash ./test.sh
...
Tue 17 Aug 2021 06:49:15 PM CST
ok
Tue 17 Aug 2021 06:49:16 PM CST
Tue 17 Aug 2021 06:49:22 PM CST
Tue 17 Aug 2021 06:49:27 PM CST
...
Tue 17 Aug 2021 06:49:54 PM CST
Tue 17 Aug 2021 06:49:59 PM CST
Tue 17 Aug 2021 06:50:05 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[-]informer-sync failed: reason withheld
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[-]poststarthook/start-apiextensions-controllers failed: reason withheld
[+]poststarthook/crd-informer-synced ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/priority-and-fairness-config-producer failed: reason withheld
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[-]poststarthook/apiservice-registration-controller failed: reason withheld
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
readyz check failed
...
Tue 17 Aug 2021 06:50:21 PM CST
ok
From the output of the script above, we can see that the kube-apiserver was unavailable from 06:49:16 to 06:50:21 on 17 Aug 2021, about one minute (65s).
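The length of the outage window can be checked with a little shell arithmetic. The following is a minimal sketch, not part of the original verification: `to_seconds` and `elapsed_seconds` are hypothetical helper names, and the sketch assumes both timestamps fall on the same day and in the same AM/PM half.

```shell
#!/usr/bin/env bash
# Sketch only: to_seconds and elapsed_seconds are hypothetical helper names,
# not part of oc or any OpenShift tooling.

# Convert HH:MM:SS to seconds since midnight.
# The 10# prefix forces base 10 so that fields such as 08 or 09 are not
# rejected as invalid octal numbers.
to_seconds() {
  local h m s
  IFS=: read -r h m s <<< "$1"
  echo $(( 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Seconds between two timestamps (start first, end second).
elapsed_seconds() {
  echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# Window observed in the run above: last "ok" follow-up at 06:49:16,
# next "ok" at 06:50:21.
elapsed_seconds 06:49:16 06:50:21   # prints 65
```

The same helper can be applied to any other pair of timestamps taken from the polling output.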
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ran the same test on OCP 4.8, which does not include this PR fix:
$ oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.8.0-0.nightly-2021-08-17-004424 True False 10m Cluster version is 4.8.0-0.nightly-2021-08-17-004424
Wed 18 Aug 2021 01:25:11 PM CST
ok
Wed 18 Aug 2021 01:25:12 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
readyz check failed
...
Wed 18 Aug 2021 01:28:33 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/priority-and-fairness-config-producer failed: reason withheld
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[-]poststarthook/apiservice-registration-controller failed: reason withheld
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
readyz check failed
Wed 18 Aug 2021 01:28:42 PM CST
ok
From the test results above, we can see that the kube-apiserver was unavailable from 01:25:12 to 01:28:42 on 18 Aug 2021, about three and a half minutes (210s).
- Tested on a non-SNO cluster. Got the IP of one of the kube-apiserver instances, then forced a redeployment:
$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'
Logged in to a bastion server and ran the following script:
# cat test.sh
while true
do
date;curl -ks https://10.0.0.7:6443/readyz
echo
sleep 1
done
# ./test.sh | tee test.log
After the kube-apiserver rollout completed:
$ cat test.log
...
Thu Aug 19 06:22:53 EDT 2021
ok
Thu Aug 19 06:22:54 EDT 2021
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
readyz check failed
...
Thu Aug 19 06:25:13 EDT 2021
ok
The kube-apiserver was unavailable from 06:22:54 to 06:25:13, 139s in total, which is a normal GracefulTerminationDuration. The PR fix saves a significant amount of time on SNO clusters. All is well, so moving the bug to VERIFIED.
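Reading the window out of a long test.log by eye is error-prone, so it could be extracted automatically. The following is a rough sketch under stated assumptions, not something used in this verification: `outage_window` is a hypothetical helper, it assumes the log has the shape produced by test.sh above (a `date` line followed by either the single word `ok` or a failing check list or nothing), and it reports only the first outage. It prints the clock time as it appears in the log, so an AM/PM suffix is dropped.

```shell
#!/usr/bin/env bash
# Sketch only: outage_window is a hypothetical helper, not part of oc.
# Usage: outage_window test.log
# Prints "<first not-ready time> <next ready time>", or nothing if the
# log contains no failed sample.
outage_window() {
  awk '
    # Record the verdict for the sample that just ended.
    function settle() {
      if (!ready && start == "") start = ts
      if (ready && start != "" && stop == "") stop = ts
    }
    # A line starting with a weekday abbreviation is the output of date.
    /^(Mon|Tue|Wed|Thu|Fri|Sat|Sun) / {
      if (have) settle()
      match($0, /[0-9][0-9]:[0-9][0-9]:[0-9][0-9]/)
      ts = substr($0, RSTART, RLENGTH)
      ready = 0
      have = 1
      next
    }
    # readyz answers with the single word ok when the server is ready.
    $0 == "ok" { ready = 1 }
    END {
      if (have) settle()
      if (start != "") print start, (stop == "" ? "unresolved" : stop)
    }
  ' "$1"
}
```

For a log shaped like the non-SNO run above, `outage_window test.log` would print `06:22:54 06:25:13`.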
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759