Bug 1961554
| Field | Value |
|---|---|
| Summary | respect the shutdown-delay-duration from OpenShiftAPIServerConfig |
| Product | OpenShift Container Platform |
| Component | openshift-apiserver |
| Version | 4.8 |
| Target Release | 4.8.0 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | high |
| Hardware | Unspecified |
| OS | Unspecified |
| Type | Bug |
| Reporter | Lukasz Szaszkiewicz <lszaszki> |
| Assignee | Lukasz Szaszkiewicz <lszaszki> |
| QA Contact | Xingxing Xia <xxia> |
| CC | aos-bugs, mfojtik |
| Clones | 1961557 |
| Bug Blocks | 1961557 |
| Last Closed | 2021-07-27 23:08:55 UTC |
Description
Lukasz Szaszkiewicz
2021-05-18 09:08:50 UTC
I think the best way to test it is to change the value manually to some high number and then trigger termination. The server should shut down after the specified time.

After reading the code, I tested in a 4.8.0-0.nightly-2021-05-21-233425 environment:

    $ oc edit openshiftapiserver cluster
    # add below
    ...
    unsupportedConfigOverrides:
      apiServerArguments:
        shutdown-delay-duration:
        - 120s

120s is indeed used, per the PR code:

    $ oc get cm config -n openshift-apiserver -o yaml

Found:

    config.yaml: '{"apiServerArguments":{..."shutdown-delay-duration":["120s"]...

I waited for the new pods to all become Running and ready, then deleted pod apiserver-66889485b9-vx65k at 19:54:36:

    $ oc delete po -n openshift-apiserver apiserver-66889485b9-vx65k

The command hung until 19:55:51 before exiting. The gap (75s) is less than 120s. <---- observation 1

Meanwhile, in another terminal, I repeatedly checked `oc get po -n openshift-apiserver`:

    $ oc get po -n openshift-apiserver
    NAME                         READY   STATUS        RESTARTS   AGE
    apiserver-66889485b9-9jf7l   0/2     Pending       0          7s
    apiserver-66889485b9-n4qm6   2/2     Running       0          2m19s
    apiserver-66889485b9-vx65k   2/2     Terminating   0          8m34s
    apiserver-66889485b9-x7x22   2/2     Running       0          5m38s

    $ oc get po -n openshift-apiserver
    apiserver-66889485b9-9jf7l   0/2     Pending       0          78s
    apiserver-66889485b9-n4qm6   2/2     Running       0          3m30s
    apiserver-66889485b9-vx65k   0/2     Terminating   0          9m45s
    apiserver-66889485b9-x7x22   2/2     Running       0          6m49s

    $ oc get po -n openshift-apiserver
    NAME                         READY   STATUS        RESTARTS   AGE
    apiserver-66889485b9-9jf7l   0/2     Init:0/1      0          79s
    apiserver-66889485b9-n4qm6   2/2     Running       0          3m31s
    apiserver-66889485b9-x7x22   2/2     Running       0          6m50s

After about 78s instead of 120s, the pod was already gone. Shouldn't it wait 120s? <---- observation 2

Meanwhile, on the master node running the pod (pod IP 10.129.0.61), I ran:

    # while true; do echo "$(date '+%Y-%m-%dT%H:%M:%S%:z'): `curl -ksS https://10.129.0.61:8443/healthz`"; sleep 2; done

From 19:55:47 on, it no longer returned "ok".
This gap (71s since the deletion above at 19:54:36) is also less than 120s. <---- observation 3

I tried another pod deletion, with the same result.

The pod was terminated by the kubelet. terminationGracePeriodSeconds is set to 70 seconds (https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/openshift-apiserver/deploy.yaml#L158). This value is a hard limit after which the kubelet forcefully terminates the pod, so you would have to increase it as well. Seeing that the pod was terminated after ~78s is a good sign. The default value for shutdown-delay-duration is 10 seconds (https://github.com/openshift/cluster-openshift-apiserver-operator/blob/master/bindata/v3.11.0/config/defaultconfig.yaml#L18).

Good catch! I should have dug into that.

Lukasz, I double-checked in 4.8 with shutdown-delay-duration set to 40s: the pod ALSO stays Terminating for about 70+ s instead of 40s. So does the kubelet's terminationGracePeriodSeconds take precedence over shutdown-delay-duration, no matter which is smaller? Anyway, I also double-checked 4.7, which does not yet have the backport: with shutdown-delay-duration set to 40s, the pod was gone after staying Terminating for only 10s. So the 4.8 result above is a good verification sign.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.8.2 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2438

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days.