Bug 1888026

Summary: workaround kubelet graceful termination of static pods bug
Product: OpenShift Container Platform
Component: kube-apiserver
Version: 4.6
Status: CLOSED ERRATA
Severity: high
Priority: high
Target Release: 4.6.z
Hardware: Unspecified
OS: Unspecified
Reporter: David Eads <deads>
Assignee: David Eads <deads>
QA Contact: Ke Wang <kewang>
CC: aos-bugs, kewang, mfojtik, wlewis, xxia
Clone Of: 1888015
Bug Depends On: 1888015
Last Closed: 2020-11-09 15:50:58 UTC

Description David Eads 2020-10-13 20:12:52 UTC
+++ This bug was initially created as a clone of Bug #1888015 +++

Graceful termination of static pods does not work when the pods share the same pod name and file name and have no uid.

We work around this by assigning a fake uid to the pod.

This prevents kube-apiserver downtime during configuration rollout.
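
For illustration only, the idea of the workaround can be sketched in Python. This is not the operator's code from the PR; the function name with_fake_uid is hypothetical, and the sketch assumes the manifest is serialized as JSON, which the "uid":"..." matches later in this report suggest.

```python
import json
import uuid


def with_fake_uid(manifest_text: str) -> str:
    """Return the pod manifest with metadata.uid set to a fresh random UUID.

    Hypothetical helper: it only illustrates giving each written manifest
    its own uid so that successive revisions can be told apart.
    """
    pod = json.loads(manifest_text)
    # Create the metadata section if the manifest does not have one.
    metadata = pod.setdefault("metadata", {})
    metadata["uid"] = str(uuid.uuid4())
    # Compact separators keep the "uid":"..." form with no added spaces.
    return json.dumps(pod, separators=(",", ":"))


if __name__ == "__main__":
    # Sample manifest made up for this sketch.
    original = '{"apiVersion":"v1","kind":"Pod","metadata":{"name":"kube-apiserver"}}'
    print(with_fake_uid(original))
```

Each call produces a different uid, so two revisions of a manifest with the same pod name and file name no longer look identical.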

Comment 2 Ke Wang 2020-10-29 08:46:27 UTC
First, the "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully" test should fail less often after the fix. Confirming this requires a large number of CI runs over a long period to see whether the failure frequency decreases.

Second, per the PR code, the pod YAML files on the master are now written with a uid.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-28-101609   True        False         5h43m   Cluster version is 4.6.0-0.nightly-2020-10-28-101609

$ oc debug node/<master>

sh-4.4# cd /etc/kubernetes/static-pod-resources

sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
kube-apiserver-pod-2/kube-apiserver-pod.yaml:"uid":"e38f65e0-ec0b-4921-a9ba-40a741425b84"
...
kube-apiserver-pod-7/kube-apiserver-pod.yaml:"uid":"3e7e090c-fe1f-428b-920f-35e3771aa163"

For etcd,
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml
etcd-pod-2/etcd-pod.yaml:"uid":"9d1d9155-3a6d-4fae-9f22-02bdf85d1710"
etcd-pod-3/etcd-pod.yaml:"uid":"62fd0454-b1a9-42ff-8074-ba1073c6b183"

For kube-controller-manager,
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
kube-controller-manager-pod-5/kube-controller-manager-pod.yaml:"uid":"87acfe64-a8dd-41f5-97fd-c21e6be0ed5d"
...
kube-controller-manager-pod-8/kube-controller-manager-pod.yaml:"uid":"33cb1203-0c4c-495f-a1eb-71c34f68381c"

For kube-scheduler,
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
kube-scheduler-pod-4/kube-scheduler-pod.yaml:"uid":"c5a52e49-102e-48af-a8c0-a6ec01f0696c"
...
kube-scheduler-pod-8/kube-scheduler-pod.yaml:"uid":"b435cdce-6454-4ad5-845c-58cb7df8c801"

Checked a 4.6.0-0.nightly-2020-10-26-151252 build environment without the fix. After connecting to a master over SSH:
sh-4.4# cd /etc/kubernetes/static-pod-resources/
sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml

Nothing is returned, which means the manifests have no uid.

This is expected by the PR, so moving to VERIFIED
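
The grep checks above could also be scripted. The following is a minimal sketch, not part of the fix: extract_uid and scan_revisions are hypothetical names, the regular expression is the same one passed to grep -P, and the directory layout is assumed to match the paths shown in this comment.

```python
import re
from pathlib import Path

# Same non-greedy pattern as the grep -o -P invocations in this comment.
UID_PATTERN = re.compile(r'"uid":"(.*?)"')


def extract_uid(manifest_text):
    """Return the first uid found in the manifest text, or None if absent."""
    match = UID_PATTERN.search(manifest_text)
    return match.group(1) if match else None


def scan_revisions(root, component):
    """Map each <component>-pod-*/<component>-pod.yaml under root to its uid."""
    results = {}
    pattern = f"{component}-pod-*/{component}-pod.yaml"
    for path in sorted(Path(root).glob(pattern)):
        results[str(path.relative_to(root))] = extract_uid(path.read_text())
    return results


if __name__ == "__main__":
    root = "/etc/kubernetes/static-pod-resources"
    components = ("kube-apiserver", "kube-controller-manager", "kube-scheduler", "etcd")
    for component in components:
        for name, uid in scan_revisions(root, component).items():
            print(f"{name}: {uid or 'no uid'}")
```

On a build with the fix, every revision should report a uid; on a build without it, every revision should report "no uid".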

Comment 5 errata-xmlrpc 2020-11-09 15:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339