Bug 1877793

Summary: KS doesn't gracefully terminate when rolling out
Product: OpenShift Container Platform
Reporter: Tomáš Nožička <tnozicka>
Component: kube-scheduler
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: high
Docs Contact:
Priority: high
Version: 4.6
CC: aos-bugs, maszulik, mfojtik, yinzhou
Target Milestone: ---   
Target Release: 4.6.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-10-27 16:39:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1881351    

Description Tomáš Nožička 2020-09-10 13:00:00 UTC
KS needs to terminate gracefully so that the next replica can take over during a rollout. Graceful termination matters because it gives up the lease, so another replica can become the leader without waiting 60s for the lease to expire.
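
To illustrate the point, here is a minimal toy model of a lease, not the actual leader election code. The `Lease` class and both helper functions are hypothetical names, and the model assumes a candidate may take the lease either when the holder field has been cleared or once the lease duration has elapsed since the last renewal.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy lease record; field names are illustrative, not a real API."""
    holder: Optional[str]   # None means the lease was released
    renew_time: float       # seconds, time of the last successful renewal
    duration: float         # seconds the lease stays valid after renew_time


def can_acquire(lease: Lease, now: float) -> bool:
    """A candidate may acquire if the lease was released or has expired."""
    return lease.holder is None or now >= lease.renew_time + lease.duration


def wait_until_acquirable(lease: Lease, now: float) -> float:
    """Seconds a candidate must still wait before it can take the lease."""
    if lease.holder is None:
        return 0.0
    return max(0.0, lease.renew_time + lease.duration - now)


# Old leader was killed right after renewing at t=100 and never released.
abrupt = Lease(holder="old-replica", renew_time=100.0, duration=60.0)
# Old leader terminated gracefully and cleared the holder on exit.
graceful = Lease(holder=None, renew_time=100.0, duration=60.0)

print(wait_until_acquirable(abrupt, now=100.0))    # 60.0
print(wait_until_acquirable(graceful, now=100.0))  # 0.0
```

In this sketch the only difference between the two cases is whether the holder was cleared on exit, which is what turns a full lease-duration wait into an immediate handover.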

Comment 2 zhou ying 2020-09-22 05:07:43 UTC
When updating the scheduler operator cluster or deleting the static pod YAML file from the master node, we see logs like:
I0922 02:45:38.708205       1 server.go:207] Requested to terminate. Exiting.

and another pod acquired the lead in less than 20s.

Comment 3 Tomáš Nožička 2020-09-22 08:24:27 UTC
The lease duration is 15s and the renew deadline is 10s, so it should be shortly after 10s.

Comment 5 zhou ying 2020-09-25 13:46:08 UTC
Confirmed with 4.6.0-0.nightly-2020-09-25-085318; the issue has been fixed.

The KS now renews the lead within 10s:
I0925 13:41:45.177618       1 server.go:207] Requested to terminate. Exiting.
I0925 13:41:45.747402       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler. 



I0925 13:45:11.188109       1 server.go:207] Requested to terminate. Exiting.
I0925 13:45:11.243587       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
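
For reference, the handover gap in the two excerpts above can be computed from the log timestamps. The following is a rough sketch: `parse_time` and `handover_gaps` are hypothetical helper names, only the time-of-day part of each log header is parsed (so a handover spanning midnight is not handled), and it assumes the clocks of the terminating and acquiring pods are in sync.

```python
import re
from datetime import datetime

LOG = """\
I0925 13:41:45.177618       1 server.go:207] Requested to terminate. Exiting.
I0925 13:41:45.747402       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
I0925 13:45:11.188109       1 server.go:207] Requested to terminate. Exiting.
I0925 13:45:11.243587       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
"""

# Header shape: severity letter, mmdd, then hh:mm:ss.uuuuuu
STAMP = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})")


def parse_time(line):
    """Return the time-of-day of a log line as a datetime (date ignored)."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp in line: {line!r}")
    return datetime.strptime(match.group(1), "%H:%M:%S.%f")


def handover_gaps(log):
    """Pair each termination line with the next acquisition line."""
    gaps = []
    terminated_at = None
    for line in log.splitlines():
        if "Requested to terminate" in line:
            terminated_at = parse_time(line)
        elif "successfully acquired lease" in line and terminated_at is not None:
            gaps.append((parse_time(line) - terminated_at).total_seconds())
            terminated_at = None
    return gaps


for gap in handover_gaps(LOG):
    print(f"{gap:.3f}s")
# 0.570s
# 0.055s
```

Both handovers complete in well under a second, far below the 10s bound mentioned above.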

Comment 8 errata-xmlrpc 2020-10-27 16:39:32 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196