Bug 1888026 - workaround kubelet graceful termination of static pods bug
Summary: workaround kubelet graceful termination of static pods bug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.6.z
Assignee: David Eads
QA Contact: Ke Wang
URL:
Whiteboard:
Depends On: 1888015
Blocks:
Reported: 2020-10-13 20:12 UTC by David Eads
Modified: 2020-11-09 15:51 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1888015
Environment:
Last Closed: 2020-11-09 15:50:58 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 479 0 None closed bug 1888026: bump library-go for static pod uid 2021-01-11 03:03:26 UTC
Github openshift cluster-kube-apiserver-operator pull 994 0 None closed bug 1888026: [release-4.6] bump library-go for static pod uid 2021-01-11 03:03:24 UTC
Github openshift cluster-kube-controller-manager-operator pull 474 0 None closed bug 1888026: bump library-go for static pod uid 2021-01-11 03:03:27 UTC
Github openshift cluster-kube-scheduler-operator pull 296 0 None closed bug 1888026: bump library-go for static pod uid 2021-01-11 03:03:28 UTC
Github openshift library-go pull 924 0 None closed bug 1888026: [release-4.6] add UID to all static pods to trick kubelet into honoring grace period 2021-01-11 03:03:28 UTC
Github openshift library-go pull 934 0 None closed bug 1888026: [release-4.6] collapse static pod bytes 2021-01-11 03:03:26 UTC
Red Hat Product Errata RHBA-2020:4339 0 None None None 2020-11-09 15:51:22 UTC

Description David Eads 2020-10-13 20:12:52 UTC
+++ This bug was initially created as a clone of Bug #1888015 +++

Graceful termination does not work for static pods that have the same pod name and filename and no UID.

We work around this by assigning a fake uid.

This prevents kube-apiserver downtime during configuration rollout.
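
As a rough illustration of the workaround described above, the sketch below stamps a freshly generated UID into a static pod manifest before it would be written to disk. It is not the library-go change itself (that code is Go and is not reproduced in this report); the function name `with_fake_uid` and the sample manifest are hypothetical and chosen only for illustration.

```python
import json
import uuid


def with_fake_uid(manifest_json: str) -> str:
    """Return the manifest with metadata.uid set to a fresh random UUID.

    Hypothetical helper: it only illustrates the idea of giving every
    rendered static pod manifest its own UID.
    """
    pod = json.loads(manifest_json)
    metadata = pod.setdefault("metadata", {})
    metadata["uid"] = str(uuid.uuid4())
    # Compact separators produce the '"uid":"..."' form with no spaces.
    return json.dumps(pod, separators=(",", ":"))


# Made-up minimal manifest, for illustration only.
manifest = '{"apiVersion":"v1","kind":"Pod","metadata":{"name":"kube-apiserver"}}'
print(with_fake_uid(manifest))
```

Each call produces a different UID, so two revisions of a pod with the same name and filename would no longer look identical in this sketch.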

Comment 2 Ke Wang 2020-10-29 08:46:27 UTC
First, the "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully" test should fail less often after the fix. This will be checked across a large number of CI runs over a longer period to see whether the failure frequency decreases.

Second, per the PR code, the static pod YAML files on the master are now written with a uid.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-10-28-101609   True        False         5h43m   Cluster version is 4.6.0-0.nightly-2020-10-28-101609

$ oc debug node/<master>

sh-4.4# cd /etc/kubernetes/static-pod-resources

sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
kube-apiserver-pod-2/kube-apiserver-pod.yaml:"uid":"e38f65e0-ec0b-4921-a9ba-40a741425b84"
...
kube-apiserver-pod-7/kube-apiserver-pod.yaml:"uid":"3e7e090c-fe1f-428b-920f-35e3771aa163"

For etcd,
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml
etcd-pod-2/etcd-pod.yaml:"uid":"9d1d9155-3a6d-4fae-9f22-02bdf85d1710"
etcd-pod-3/etcd-pod.yaml:"uid":"62fd0454-b1a9-42ff-8074-ba1073c6b183"

For kube-controller-manager,
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
kube-controller-manager-pod-5/kube-controller-manager-pod.yaml:"uid":"87acfe64-a8dd-41f5-97fd-c21e6be0ed5d"
...
kube-controller-manager-pod-8/kube-controller-manager-pod.yaml:"uid":"33cb1203-0c4c-495f-a1eb-71c34f68381c"

For kube-scheduler,
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
kube-scheduler-pod-4/kube-scheduler-pod.yaml:"uid":"c5a52e49-102e-48af-a8c0-a6ec01f0696c"
...
kube-scheduler-pod-8/kube-scheduler-pod.yaml:"uid":"b435cdce-6454-4ad5-845c-58cb7df8c801"

Checked a 4.6.0-0.nightly-2020-10-26-151252 build environment without the fix, via ssh to a master:
sh-4.4# cd /etc/kubernetes/static-pod-resources/
sh-4.4# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml   
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml

Nothing is returned, which means no uid is set.

This matches what the PR intends, so moving to VERIFIED.
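
The grep checks in this comment could also be expressed as a small script. The sketch below applies the same non-greedy pattern (`"uid":".*?"`) to the text of a manifest and returns any UIDs found; the helper name `find_uids` is hypothetical, and the second sample string is a made-up manifest without a uid.

```python
import re
from typing import List

# Same non-greedy pattern as the grep -o -P invocation, with the value captured.
UID_PATTERN = re.compile(r'"uid":"(.*?)"')


def find_uids(manifest_text: str) -> List[str]:
    """Return every uid value found in the manifest text (empty if none)."""
    return UID_PATTERN.findall(manifest_text)


# With the fix: the manifest carries a uid (value taken from the output above).
fixed = '{"metadata":{"name":"kube-apiserver","uid":"e38f65e0-ec0b-4921-a9ba-40a741425b84"}}'
# Without the fix: no uid field at all (made-up sample).
unfixed = '{"metadata":{"name":"kube-apiserver"}}'

print(find_uids(fixed))
print(find_uids(unfixed))
```

An empty list corresponds to the grep commands returning nothing on the build without the fix.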

Comment 5 errata-xmlrpc 2020-11-09 15:50:58 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6.3 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4339

