Bug 1888026 - workaround kubelet graceful termination of static pods bug

Summary: workaround kubelet graceful termination of static pods bug

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-apiserver
Sub Component:
Version:	4.6
Hardware:	Unspecified
OS:	Unspecified
Priority:	high
Severity:	high
Target Milestone:	---
Target Release:	4.6.z
Assignee:	David Eads
QA Contact:	Ke Wang
Docs Contact:
URL:
Whiteboard:
Depends On:	1888015
Blocks:

Reported:	2020-10-13 20:12 UTC by David Eads
Modified:	2020-11-09 15:51 UTC
CC List:	5 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:	1888015
Environment:
Last Closed:	2020-11-09 15:50:58 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Github	openshift cluster-etcd-operator pull 479	None	closed	bug 1888026: bump library-go for static pod uid	2021-01-11 03:03:26 UTC
Github	openshift cluster-kube-apiserver-operator pull 994	None	closed	bug 1888026: [release-4.6] bump library-go for static pod uid	2021-01-11 03:03:24 UTC
Github	openshift cluster-kube-controller-manager-operator pull 474	None	closed	bug 1888026: bump library-go for static pod uid	2021-01-11 03:03:27 UTC
Github	openshift cluster-kube-scheduler-operator pull 296	None	closed	bug 1888026: bump library-go for static pod uid	2021-01-11 03:03:28 UTC
Github	openshift library-go pull 924	None	closed	bug 1888026: [release-4.6] add UID to all static pods to trick kubelet into honoring grace period	2021-01-11 03:03:28 UTC
Github	openshift library-go pull 934	None	closed	bug 1888026: [release-4.6] collapse static pod bytes	2021-01-11 03:03:26 UTC
Red Hat Product Errata	RHBA-2020:4339	None	None	None	2020-11-09 15:51:22 UTC

Description David Eads 2020-10-13 20:12:52 UTC

+++ This bug was initially created as a clone of Bug #1888015 +++

Graceful termination of static pods does not work when the replacement pod has the same pod name and filename and no UID.

We work around this by assigning a fake uid.

This prevents kube-apiserver downtime during configuration rollout.
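
The workaround can be illustrated with a minimal sketch. This is not the library-go change itself (that code is Go and lives in the linked pull requests); it only shows the idea of stamping a freshly generated UID into metadata.uid each time a static pod manifest is written. The function name with_fresh_uid and the sample manifest are hypothetical, and the manifest is assumed to be JSON, consistent with the "uid":"..." form grepped for in the verification comment.

```python
import json
import uuid


def with_fresh_uid(manifest_json: str) -> str:
    """Return the manifest with metadata.uid set to a newly generated UUID."""
    pod = json.loads(manifest_json)
    # Hypothetical helper: overwrite (or add) the uid on every call so that
    # two revisions of the same pod never share a UID.
    pod.setdefault("metadata", {})["uid"] = str(uuid.uuid4())
    # Compact separators produce the '"uid":"..."' form with no spaces.
    return json.dumps(pod, separators=(",", ":"))


# Illustrative manifest only; real static pod manifests are much larger.
manifest = (
    '{"apiVersion":"v1","kind":"Pod",'
    '"metadata":{"name":"kube-apiserver","namespace":"openshift-kube-apiserver"}}'
)
first = with_fresh_uid(manifest)
second = with_fresh_uid(manifest)
```

Each call yields a different UID, so two revisions with the same pod name and filename become distinguishable by UID, which is the condition the description says is missing.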

Comment 2 Ke Wang 2020-10-29 08:46:27 UTC

First, the "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully" test should fail less often after the fix. Confirming this requires a large number of CI runs over a longer period to see whether the failure frequency decreases.

Second, per the PR code, the static pod YAML files on the master are now written with a uid.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-28-101609   True        False         5h43m   Cluster version is 4.6.0-0.nightly-2020-10-28-101609

$ oc debug node/<master>

sh-4.4# cd /etc/kubernetes/static-pod-resources

sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
kube-apiserver-pod-2/kube-apiserver-pod.yaml:"uid":"e38f65e0-ec0b-4921-a9ba-40a741425b84"
...
kube-apiserver-pod-7/kube-apiserver-pod.yaml:"uid":"3e7e090c-fe1f-428b-920f-35e3771aa163"

For etcd:
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml
etcd-pod-2/etcd-pod.yaml:"uid":"9d1d9155-3a6d-4fae-9f22-02bdf85d1710"
etcd-pod-3/etcd-pod.yaml:"uid":"62fd0454-b1a9-42ff-8074-ba1073c6b183"

For kube-controller-manager:
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
kube-controller-manager-pod-5/kube-controller-manager-pod.yaml:"uid":"87acfe64-a8dd-41f5-97fd-c21e6be0ed5d"
...
kube-controller-manager-pod-8/kube-controller-manager-pod.yaml:"uid":"33cb1203-0c4c-495f-a1eb-71c34f68381c"

For kube-scheduler:
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml
kube-scheduler-pod-4/kube-scheduler-pod.yaml:"uid":"c5a52e49-102e-48af-a8c0-a6ec01f0696c"
...
kube-scheduler-pod-8/kube-scheduler-pod.yaml:"uid":"b435cdce-6454-4ad5-845c-58cb7df8c801"

Also checked a 4.6.0-0.nightly-2020-10-26-151252 build environment without the fix, on a master:
sh-4.4# cd /etc/kubernetes/static-pod-resources/
sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml

Nothing is returned, which means the manifests have no uid.

This matches the behavior expected from the PR, so moving to VERIFIED.
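
For repeating this check across components, the grep commands above can be approximated with a short script. This is only a sketch: the function name find_static_pod_uids is hypothetical, the directory layout (<component>-pod-<revision>/<component>-pod.yaml) is inferred from the paths in the grep output, and the demo runs against a throwaway directory with made-up manifests rather than a real node.

```python
import re
import tempfile
from pathlib import Path
from typing import Dict, List

# Same non-greedy pattern as the grep -o -P expression used above.
UID_PATTERN = re.compile(r'"uid":"(.*?)"')


def find_static_pod_uids(root: str, component: str) -> Dict[str, List[str]]:
    """Map each '<component>-pod-<revision>' directory to the UIDs in its manifest."""
    results: Dict[str, List[str]] = {}
    pattern = f"{component}-pod-*/{component}-pod.yaml"
    for manifest in sorted(Path(root).glob(pattern)):
        results[manifest.parent.name] = UID_PATTERN.findall(manifest.read_text())
    return results


# Demo on a temporary tree: one revision with a uid, one without.
with tempfile.TemporaryDirectory() as root:
    with_uid = Path(root) / "etcd-pod-2"
    with_uid.mkdir()
    (with_uid / "etcd-pod.yaml").write_text(
        '{"metadata":{"name":"etcd","uid":"example-uid"}}'
    )
    without_uid = Path(root) / "etcd-pod-1"
    without_uid.mkdir()
    (without_uid / "etcd-pod.yaml").write_text('{"metadata":{"name":"etcd"}}')
    found = find_static_pod_uids(root, "etcd")
```

An empty list for a revision corresponds to grep printing nothing, as in the build without the fix. On a real node the root would be /etc/kubernetes/static-pod-resources.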

Comment 5 errata-xmlrpc 2020-11-09 15:50:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update),
and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339
