Bug 1989461
Summary: | kube-apiserver does not use the SO_REUSEPORT properly | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Michal Fojtik <mfojtik> |
Component: | kube-apiserver | Assignee: | Michal Fojtik <mfojtik> |
Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
Severity: | medium | Docs Contact: | |
Priority: | medium | ||
Version: | 4.9 | CC: | aos-bugs, mfojtik, xxia |
Target Milestone: | --- | ||
Target Release: | 4.9.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Last Closed: | 2021-10-18 17:44:09 UTC | Type: | Bug |
Description
Michal Fojtik
2021-08-03 09:32:54 UTC
Verification steps as below.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-08-16-154237   True        False         91m     Cluster version is 4.9.0-0.nightly-2021-08-16-154237

In terminal A, forced a kube-apiserver redeployment:

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'
kubeapiserver.operator.openshift.io/cluster patched

In terminal B, polled the readyz endpoint in a loop:

$ cat test.sh
#!/usr/bin/env bash
while true
do
  date;curl -ks https://api....:6443/readyz
  echo
done

$ bash ./test.sh
...
Tue 17 Aug 2021 06:49:15 PM CST
ok
Tue 17 Aug 2021 06:49:16 PM CST
Tue 17 Aug 2021 06:49:22 PM CST
Tue 17 Aug 2021 06:49:27 PM CST
...
Tue 17 Aug 2021 06:49:54 PM CST
Tue 17 Aug 2021 06:49:59 PM CST
Tue 17 Aug 2021 06:50:05 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[-]informer-sync failed: reason withheld
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[-]poststarthook/start-apiextensions-controllers failed: reason withheld
[+]poststarthook/crd-informer-synced ok
[-]poststarthook/bootstrap-controller failed: reason withheld
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/priority-and-fairness-config-producer failed: reason withheld
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[-]poststarthook/apiservice-registration-controller failed: reason withheld
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
readyz check failed
...
Tue 17 Aug 2021 06:50:21 PM CST
ok

From the test results above, the kube-apiserver was unavailable from 06:49:16 to 06:50:21, about one minute (65s).

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Ran the same test on OCP 4.8, which does not have this PR fix:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-08-17-004424   True        False         10m     Cluster version is 4.8.0-0.nightly-2021-08-17-004424

Wed 18 Aug 2021 01:25:11 PM CST
ok
Wed 18 Aug 2021 01:25:12 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
readyz check failed
...
Wed 18 Aug 2021 01:28:33 PM CST
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[-]poststarthook/rbac/bootstrap-roles failed: reason withheld
[-]poststarthook/scheduling/bootstrap-system-priority-classes failed: reason withheld
[-]poststarthook/priority-and-fairness-config-producer failed: reason withheld
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[-]poststarthook/apiservice-registration-controller failed: reason withheld
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[+]shutdown ok
readyz check failed
Wed 18 Aug 2021 01:28:42 PM CST
ok

From the test results above, the kube-apiserver was unavailable from 01:25:12 to 01:28:42, about three and a half minutes (210s).

- Also tested on a non-SNO cluster, against the IP of one kube-apiserver instance.

$ oc patch kubeapiserver/cluster --type=json -p '[ {"op": "replace", "path": "/spec/forceRedeploymentReason", "value": "roll-'"$( date --rfc-3339=ns )"'"} ]'

Logged in to a bastion server and ran the following script:

# cat test.sh
while true
do
  date;curl -ks https://10.0.0.7:6443/readyz
  echo
  sleep 1
done

# ./test.sh | tee test.log

After the kube-apiserver rollout completed:

$ cat test.log
...
Thu Aug 19 06:22:53 EDT 2021
ok
Thu Aug 19 06:22:54 EDT 2021
[+]ping ok
[+]log ok
[+]etcd ok
[+]api-openshift-apiserver-available ok
[+]api-openshift-oauth-apiserver-available ok
[+]informer-sync ok
[+]poststarthook/openshift.io-openshift-apiserver-reachable ok
[+]poststarthook/openshift.io-oauth-apiserver-reachable ok
[+]poststarthook/start-kube-apiserver-admission-initializer ok
[+]poststarthook/quota.openshift.io-clusterquotamapping ok
[+]poststarthook/openshift.io-deprecated-api-requests-filter ok
[+]poststarthook/openshift.io-startkubeinformers ok
[+]poststarthook/generic-apiserver-start-informers ok
[+]poststarthook/priority-and-fairness-config-consumer ok
[+]poststarthook/priority-and-fairness-filter ok
[+]poststarthook/start-apiextensions-informers ok
[+]poststarthook/start-apiextensions-controllers ok
[+]poststarthook/crd-informer-synced ok
[+]poststarthook/bootstrap-controller ok
[+]poststarthook/rbac/bootstrap-roles ok
[+]poststarthook/scheduling/bootstrap-system-priority-classes ok
[+]poststarthook/priority-and-fairness-config-producer ok
[+]poststarthook/start-cluster-authentication-info-controller ok
[+]poststarthook/aggregator-reload-proxy-client-cert ok
[+]poststarthook/start-kube-aggregator-informers ok
[+]poststarthook/apiservice-registration-controller ok
[+]poststarthook/apiservice-status-available-controller ok
[+]poststarthook/apiservice-wait-for-first-sync ok
[+]poststarthook/kube-apiserver-autoregistration ok
[+]autoregister-completion ok
[+]poststarthook/apiservice-openapi-controller ok
[-]shutdown failed: reason withheld
readyz check failed
...
Thu Aug 19 06:25:13 EDT 2021
ok

The kube-apiserver was unavailable from 06:22:54 to 06:25:13, 139s in total, which is a normal GracefulTerminationDuration. The PR fix saves significant time on an SNO cluster (about 65s of outage on 4.9 versus 210s on 4.8 in the tests above), so moving the bug to VERIFIED.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
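
For anyone re-running the verification, the outage windows quoted in the comments are plain differences between the first failing and the first recovered readyz timestamps. A minimal sketch of that arithmetic, assuming GNU date (its -d option); outage_seconds is a hypothetical helper name, not part of oc or any other tool, and the timestamps are retyped in 24-hour form without a zone because both ends of each window share the same zone:

```shell
#!/usr/bin/env bash
# Hypothetical helper: seconds between two timestamps.
# Assumes GNU date; -u makes both zone-less strings parse as UTC,
# so the difference is independent of the local timezone.
outage_seconds() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

# The three windows reported in the verification comments.
outage_seconds "2021-08-17 18:49:16" "2021-08-17 18:50:21"   # 4.9 run: 65
outage_seconds "2021-08-18 13:25:12" "2021-08-18 13:28:42"   # 4.8 run: 210
outage_seconds "2021-08-19 06:22:54" "2021-08-19 06:25:13"   # non-SNO run: 139
```

The start of each window is the first sample that did not return "ok" and the end is the first sample that returned "ok" again, so the result is bounded by the polling interval of test.sh.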