Bug 1775224
Summary: | kube-apiserver-operator doesn't release lock when being gracefully terminated | ||
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Tomáš Nožička <tnozicka> |
Component: | kube-apiserver | Assignee: | Michal Fojtik <mfojtik> |
Status: | CLOSED ERRATA | QA Contact: | Ke Wang <kewang> |
Severity: | low | Docs Contact: | |
Priority: | medium | ||
Version: | 4.3.0 | CC: | aos-bugs, mfojtik, sttts, vareti, xxia |
Target Milestone: | --- | ||
Target Release: | 4.5.0 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | Bug Fix | |
Doc Text: |
Cause: The leader election setup in operators was not using the "ReleaseOnCancel" option, which releases the lock when the operator receives a UNIX signal to shut down.
Consequence: When rolling out a new version of an operator, it might take a minute or two until the lock is released and the new version of the operator can continue.
Fix: The graceful shutdown was refactored for control plane operators to respect the graceful termination period, and the operators are now guaranteed to shut down in a clean way. This allowed us to enable the "ReleaseOnCancel" option.
Result: The operators no longer wait for the lock to be released on startup, and the operator rollout time has improved significantly.
|
Story Points: | --- |
Clone Of: | Environment: | ||
Last Closed: | 2020-07-13 17:12:14 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Tomáš Nožička
2019-11-21 15:26:34 UTC
This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason - or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing severity from "medium" to "low". If you have further information on the current state of the bug, please update it, otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

This has been fixed in factory.

Verified with OCP 4.5.0-0.nightly-2020-05-13-221558,

$ oc delete -n openshift-kube-apiserver-operator pod kube-apiserver-operator-745f6658c8-jn4d5 --force --grace-period=0

$ oc -n openshift-kube-apiserver-operator get pods
NAME                                       READY   STATUS    RESTARTS   AGE
kube-apiserver-operator-745f6658c8-c9ggm   1/1     Running   0          12m

$ oc logs -n openshift-kube-apiserver-operator kube-apiserver-operator-745f6658c8-c9ggm | grep -n -A2 'attempting to acquire leader'
11:I0514 07:11:05.508187 1 leaderelection.go:242] attempting to acquire leader lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock...
12-I0514 07:11:05.517100 1 leaderelection.go:252] successfully acquired lease openshift-kube-apiserver-operator/kube-apiserver-operator-lock
13-I0514 07:11:05.517355 1 event.go:278] Event(v1.ObjectReference{Kind:"ConfigMap", Namespace:"openshift-kube-apiserver-operator", Name:"kube-apiserver-operator-lock", UID:"048bbd57-0c2f-4b87-b86b-233a0b1c7ff5", APIVersion:"v1", ResourceVersion:"100923", FieldPath:""}): type: 'Normal' reason: 'LeaderElection' 04c1434e-949c-4963-99ed-44a4ef6bfd40 became leader

In the logs of the newly rolled-out kube-apiserver-operator pod, we can see that the operator acquires the lock immediately after the previous pod is terminated, as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2409