1877793 – KS doesn't gracefully terminate when rolling out

Summary: KS doesn't gracefully terminate when rolling out

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-scheduler
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.0
Assignee:	Tomáš Nožička
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1881351

Reported:	2020-09-10 13:00 UTC by Tomáš Nožička
Modified:	2020-10-27 16:39 UTC
CC List:	4 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-10-27 16:39:32 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift kubernetes pull 350	None	closed	Bug 1877791: Release lock on KCM and KS termination	2020-10-23 06:26:19 UTC
Github	openshift kubernetes pull 367	None	closed	Bug 1877793: Force releasing the lock on exit for KS	2020-10-23 06:26:19 UTC
Red Hat Product Errata	RHBA-2020:4196	None	None	None	2020-10-27 16:39:48 UTC

Description Tomáš Nožička 2020-09-10 13:00:00 UTC

KS needs to terminate gracefully so that the next replica can take over during a rollout. Graceful termination matters because it gives up the lease, so another replica can become the leader without waiting 60s for the lease to expire.
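To illustrate why releasing the lease on exit matters, here is a minimal simulation, not the actual kube-scheduler or client-go code. The `Lease` class, the replica names, and the 2s retry period are hypothetical; the 60s duration is taken from the description above. It compares how long a second replica waits when the old leader releases the lease versus when it simply stops.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """Toy stand-in for a leader-election lease (not the Kubernetes API object)."""
    holder: Optional[str] = None
    renew_time: float = 0.0
    duration: float = 60.0

    def try_acquire(self, candidate: str, now: float) -> bool:
        """Take the lease if it is free, already ours, or expired."""
        expired = now >= self.renew_time + self.duration
        if self.holder in (None, candidate) or expired:
            self.holder = candidate
            self.renew_time = now
            return True
        return False

    def release(self, candidate: str) -> None:
        """Give the lease up explicitly; only the current holder may do so."""
        if self.holder == candidate:
            self.holder = None


def takeover_delay(graceful: bool, duration: float = 60.0,
                   retry_period: float = 2.0) -> float:
    """Seconds between the old leader stopping and the new one taking over."""
    lease = Lease(duration=duration)
    assert lease.try_acquire("replica-a", now=0.0)
    stop_time = 0.0  # replica-a stops right after its last renewal
    if graceful:
        lease.release("replica-a")
    now = stop_time
    # replica-b retries every retry_period seconds until it gets the lease.
    while not lease.try_acquire("replica-b", now):
        now += retry_period
    return now - stop_time


print(takeover_delay(graceful=True))   # lease is free on the first attempt
print(takeover_delay(graceful=False))  # must wait for the lease to expire
```

In this toy model the graceful path hands over on the first retry, while the ungraceful path is bounded by the lease duration plus at most one retry period.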

Comment 2 zhou ying 2020-09-22 05:07:43 UTC

When updating the scheduler operator cluster or deleting the static pod YAML file from a master node, we see logs like:
I0922 02:45:38.708205       1 server.go:207] Requested to terminate. Exiting.

and another pod acquired the lease in less than 20s.

Comment 3 Tomáš Nožička 2020-09-22 08:24:27 UTC

Lease duration is 15s and renew deadline is 10s, so the takeover should happen shortly after 10s.

Comment 5 zhou ying 2020-09-25 13:46:08 UTC

Confirmed with 4.6.0-0.nightly-2020-09-25-085318; the issue has been fixed:

Another KS replica now acquires the lease within 10s:
I0925 13:41:45.177618       1 server.go:207] Requested to terminate. Exiting.
I0925 13:41:45.747402       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler

I0925 13:45:11.188109       1 server.go:207] Requested to terminate. Exiting.
I0925 13:45:11.243587       1 leaderelection.go:253] successfully acquired lease openshift-kube-scheduler/kube-scheduler
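
The handover time in each pair of log lines above can be read off the timestamps. A small sketch for doing that, assuming both lines of a pair were logged on the same day and start with the usual severity-letter, MMDD, time-with-microseconds prefix; the helper name `handover_seconds` is made up for this example:

```python
import re
from datetime import datetime

# Matches the time-of-day part of a log prefix such as "I0925 13:41:45.177618".
_TIME_RE = re.compile(r"^[IWEF]\d{4} (\d{2}:\d{2}:\d{2}\.\d{6})")


def handover_seconds(terminate_line: str, acquire_line: str) -> float:
    """Seconds between two log lines, assuming they share the same date."""
    times = []
    for line in (terminate_line, acquire_line):
        match = _TIME_RE.match(line)
        if match is None:
            raise ValueError(f"unrecognized log prefix: {line!r}")
        times.append(datetime.strptime(match.group(1), "%H:%M:%S.%f"))
    return (times[1] - times[0]).total_seconds()


first = handover_seconds(
    "I0925 13:41:45.177618       1 server.go:207] Requested to terminate. Exiting.",
    "I0925 13:41:45.747402       1 leaderelection.go:253] successfully acquired lease",
)
second = handover_seconds(
    "I0925 13:45:11.188109       1 server.go:207] Requested to terminate. Exiting.",
    "I0925 13:45:11.243587       1 leaderelection.go:253] successfully acquired lease",
)
print(f"{first:.6f}s, {second:.6f}s")
```

For the two pairs quoted in this comment, that works out to roughly 0.57s and 0.06s, well inside the 10s bound.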

Comment 8 errata-xmlrpc 2020-10-27 16:39:32 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196
