Bug 1888026

Summary:	workaround kubelet graceful termination of static pods bug
Product:	OpenShift Container Platform
Reporter:	David Eads <deads>
Component:	kube-apiserver
Assignee:	David Eads <deads>
Status:	CLOSED ERRATA
QA Contact:	Ke Wang <kewang>
Severity:	high
Priority:	high
Version:	4.6
CC:	aos-bugs, kewang, mfojtik, wlewis, xxia
Target Release:	4.6.z
Hardware:	Unspecified
OS:	Unspecified
Clone Of:	1888015
Last Closed:	2020-11-09 15:50:58 UTC
Bug Depends On:	1888015

Description David Eads 2020-10-13 20:12:52 UTC

+++ This bug was initially created as a clone of Bug #1888015 +++

The kubelet does not gracefully terminate static pods that keep the same pod name and the same manifest filename and carry no UID.

We work around this by writing a fake UID into the static pod manifest.

This prevents kube-apiserver downtime during configuration rollout.
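The workaround can be pictured with a small shell sketch. It is not the code from the PR; `new_uid` and `add_fake_uid` are hypothetical helper names, and the sketch assumes the revisioned manifest is a single line of compact JSON whose top-level `metadata` object is non-empty and has no `uid` yet (the `"uid":"..."` grep output later in this report suggests compact JSON, but that is an assumption).

```shell
#!/usr/bin/env bash
# Hypothetical helper: read a static pod manifest on stdin and write it to
# stdout with a freshly generated metadata.uid.
# Assumptions (not confirmed by this report): the manifest is one line of
# compact JSON, its top-level "metadata" object is the first "metadata":{ on
# that line, the object is non-empty, and it has no uid yet.

new_uid() {
  # On Linux every read of this file returns a new random UUID;
  # fall back to uuidgen where /proc is unavailable.
  cat /proc/sys/kernel/random/uuid 2>/dev/null || uuidgen
}

add_fake_uid() {
  local uid
  uid="$(new_uid)" || return 1
  # Insert the uid as the first metadata field. Without /g, sed only
  # rewrites the first match on the line, i.e. the top-level metadata.
  sed "s/\"metadata\":{/\"metadata\":{\"uid\":\"${uid}\",/"
}
```

For example, `add_fake_uid < kube-apiserver-pod.yaml > kube-apiserver-pod.yaml.new`. Because a new random UID is generated each time a manifest is written, two revisions with an identical pod name and filename still carry different UIDs.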

Comment 2 Ke Wang 2020-10-29 08:46:27 UTC

First, the "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully" test should fail less often after the fix. That can only be confirmed across a large number of CI runs over a longer period, by checking whether its failure frequency decreases.

Second, per the PR code, the master's static pod manifests are now written with a uid. Checked on a build with the fix:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-28-101609   True        False         5h43m   Cluster version is 4.6.0-0.nightly-2020-10-28-101609

$ oc debug node/<master>

sh-4.4# cd /etc/kubernetes/static-pod-resources

sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
kube-apiserver-pod-2/kube-apiserver-pod.yaml:"uid":"e38f65e0-ec0b-4921-a9ba-40a741425b84"
...
kube-apiserver-pod-7/kube-apiserver-pod.yaml:"uid":"3e7e090c-fe1f-428b-920f-35e3771aa163"

For etcd,
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml
etcd-pod-2/etcd-pod.yaml:"uid":"9d1d9155-3a6d-4fae-9f22-02bdf85d1710"
etcd-pod-3/etcd-pod.yaml:"uid":"62fd0454-b1a9-42ff-8074-ba1073c6b183"

For kube-controller-manager,
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
kube-controller-manager-pod-5/kube-controller-manager-pod.yaml:"uid":"87acfe64-a8dd-41f5-97fd-c21e6be0ed5d"
...
kube-controller-manager-pod-8/kube-controller-manager-pod.yaml:"uid":"33cb1203-0c4c-495f-a1eb-71c34f68381c"

For kube-scheduler,
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
kube-scheduler-pod-4/kube-scheduler-pod.yaml:"uid":"c5a52e49-102e-48af-a8c0-a6ec01f0696c"
...
kube-scheduler-pod-8/kube-scheduler-pod.yaml:"uid":"b435cdce-6454-4ad5-845c-58cb7df8c801"

For comparison, checked a 4.6.0-0.nightly-2020-10-26-151252 environment without the fix, via ssh to a master:
sh-4.4# cd /etc/kubernetes/static-pod-resources/
sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml

Nothing is returned, meaning these manifests have no uid.

This matches what the PR intends, so moving to VERIFIED.
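The four per-component greps above can be folded into one loop that also flags a uid reused across revisions. This is a sketch, not part of the PR or the QE suite: `check_static_pod_uids` is a hypothetical name, it assumes bash 4 or later (for the associative array) and the `<component>-pod-<revision>/<component>-pod.yaml` layout shown above, and the uniqueness check is an added assumption that each revision should get its own uid, as in the output above. On a node reached with `oc debug node/<master>`, the host paths are normally visible only after `chroot /host`.

```shell
#!/usr/bin/env bash
# Sketch: report whether each revisioned static pod manifest carries a uid,
# and whether any uid is reused across revisions or components.
# Returns non-zero if any manifest is MISSING a uid or has a DUPLICATE one.

check_static_pod_uids() {
  local dir="${1:-/etc/kubernetes/static-pod-resources}"
  local component manifest uid status=0
  local -A seen=()   # uid -> first manifest that used it

  for component in kube-apiserver kube-controller-manager kube-scheduler etcd; do
    for manifest in "$dir/$component"-pod-*/"$component"-pod.yaml; do
      [ -e "$manifest" ] || continue   # glob matched nothing: no revisions
      # Value of the first "uid":"..." pair in the manifest (empty if none).
      uid=$(grep -o -E '"uid":"[^"]*"' "$manifest" | head -n 1 | cut -d'"' -f4)
      if [ -z "$uid" ]; then
        echo "MISSING    $manifest"
        status=1
      elif [ -n "${seen[$uid]:-}" ]; then
        echo "DUPLICATE  $manifest $uid (also in ${seen[$uid]})"
        status=1
      else
        echo "OK         $manifest $uid"
        seen[$uid]=$manifest
      fi
    done
  done
  return "$status"
}
```

Run without arguments to scan `/etc/kubernetes/static-pod-resources`. On the build without the fix above, every manifest would be reported as MISSING and the function would return non-zero.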

Comment 5 errata-xmlrpc 2020-11-09 15:50:58 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339