Bug 1888015 - workaround kubelet graceful termination of static pods bug
Summary: workaround kubelet graceful termination of static pods bug
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.6
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.7.0
Assignee: David Eads
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks: 1888026 1888052
 
Reported: 2020-10-13 19:55 UTC by David Eads
Modified: 2021-02-24 15:26 UTC
CC List: 5 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
: 1888026 (view as bug list)
Environment:
Last Closed: 2021-02-24 15:25:41 UTC
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Github openshift cluster-etcd-operator pull 464 0 None closed bug 1888015: bump library-go to pick up static pod graceful timeout workaround 2021-02-17 13:06:01 UTC
Github openshift cluster-kube-apiserver-operator pull 981 0 None closed bug 1888015: bump library-go to pick up static pod graceful timeout workaround 2021-02-17 13:06:01 UTC
Github openshift cluster-kube-controller-manager-operator pull 467 0 None closed bug 1888015: bump library-go to pick up static pod graceful timeout workaround 2021-02-17 13:06:01 UTC
Github openshift cluster-kube-scheduler-operator pull 291 0 None closed bug 1888015: bump library-go to pick up static pod graceful timeout workaround 2021-02-17 13:06:02 UTC
Red Hat Product Errata RHSA-2020:5633 0 None None None 2021-02-24 15:26:09 UTC

Description David Eads 2020-10-13 19:55:25 UTC
Graceful termination of static pods that reuse the same pod name and file name and have no UID does not work.

We work around this by assigning a fake UID to the pod when writing the static pod manifest.
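For illustration, a minimal sketch of the workaround, not the actual library-go code: the pod object, directory, file name, and the use of sigs.k8s.io/yaml for serialization are all assumptions here (the real change lands via the library-go bumps in the linked PRs and uses resourceread.WritePodV1OrDie, as quoted in comment 5).

    // Sketch: give each rendered static pod a fresh, fake UID before writing its
    // manifest. Because the pod name and file name are reused across revisions,
    // a per-revision UID lets the kubelet treat each rewrite as a distinct pod
    // and run its graceful-termination path for the old instance.
    package staticpod

    import (
        "io/ioutil"
        "path"

        corev1 "k8s.io/api/core/v1"
        "k8s.io/apimachinery/pkg/util/uuid"
        "sigs.k8s.io/yaml"
    )

    func writeStaticPodManifest(pod *corev1.Pod, resourceDir, podFileName string) error {
        pod.UID = uuid.NewUUID() // the "fake uid" workaround
        podBytes, err := yaml.Marshal(pod)
        if err != nil {
            return err
        }
        return ioutil.WriteFile(path.Join(resourceDir, podFileName), podBytes, 0644)
    }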

Comment 2 Ke Wang 2020-10-20 11:07:53 UTC
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2020-10-17-034503   True        False         8h      Cluster version is 4.7.0-0.nightly-2020-10-17-034503

Connect to one master node,
$ oc debug node/ip-xx-x-195-122.us-east-2.compute.internal

- For kube-apiserver, change one container's requested memory size in kube-apiserver-pod.yaml. Before the change, check the current process ID of kube-apiserver.

sh-4.4# ps -ef |grep ' kube-apiserver ' 
root      415810  415772 99 08:45 ?        00:00:32 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=10.0.195.122 -v=8

sh-4.4# cd /etc/kubernetes/manifests

sh-4.4# vi kube-apiserver-pod.yaml # changed "memory": "50Mi" to "55Mi"

A new kube-apiserver was started with a new process ID.
sh-4.4# ps -ef |grep ' kube-apiserver '
root      429872  429836 99 08:57 ?        00:00:01 kube-apiserver --openshift-config=/etc/kubernetes/static-pod-resources/configmaps/config/config.yaml --advertise-address=10.0.195.122 -v=8

In another terminal, check whether the kube-apiserver pod has been restarted.
$ oc get pods -n openshift-kube-apiserver --show-labels -l apiserver
NAME                                                        READY   STATUS    RESTARTS   AGE   LABELS
...
kube-apiserver-ip-xx-x-195-122.us-east-2.compute.internal   5/5     Running   13         19s   apiserver=true,app=openshift-kube-apiserver,revision=7

--------------------
- For etcd,
Before the change, check the current process IDs of etcd:
sh-4.4# ps -ef |grep ' etcd ' | grep -v grep | awk '{print $2}'
448696
448769

sh-4.4# vi etcd-pod.yaml # changed "memory": "50Mi" to "55Mi"

A new etcd server was started with new process IDs.
sh-4.4# ps -ef |grep ' etcd ' | grep -v grep | awk '{print $2}'
452352
452414

In another terminal, check whether the etcd pod has been restarted.
$ oc get pods -n openshift-etcd -l app=etcd --show-labels
...
etcd-ip-xx-x-195-122.us-east-2.compute.internal   3/3     Running   0          78s     app=etcd,etcd=true,k8s-app=etcd,revision=3

---------------------
- For kube-controller-manager,
Before the change, check the current process ID of kube-controller-manager:
sh-4.4# ps -ef |grep ' kube-controller-manager ' | grep -v grep | awk '{print $2}' 
2240

sh-4.4# vi kube-controller-manager-pod.yaml # changed "memory": "50Mi" to "55Mi"

A new kube-controller-manager was started with a new process ID.
sh-4.4# ps -ef |grep ' kube-controller-manager ' | grep -v grep | awk '{print $2}' 
464346

In another terminal, check whether the kube-controller-manager pod has been restarted.
$ oc get pods -n openshift-kube-controller-manager --show-labels  | grep kube-controller-manager
...
kube-controller-manager-ip-xx-x-195-122.us-east-2.compute.internal   2/4     Running     0          22s     app=kube-controller-manager,kube-controller-manager=true,revision=7

--------------------
- For kube-scheduler,
Before the change, check the current process ID of kube-scheduler:
sh-4.4# ps -ef |grep ' kube-scheduler ' | grep -v grep | awk '{print $2}' 
4083

sh-4.4# vi kube-scheduler-pod.yaml # changed "memory": "50Mi" to "55Mi"

A new kube-scheduler was started with a new process ID.
sh-4.4# ps -ef |grep ' kube-scheduler ' | grep -v grep | awk '{print $2}' 
531438

In another terminal, check whether the kube-scheduler pod has been restarted.
$ oc get pods -n openshift-kube-scheduler --show-labels  | grep kube-scheduler
...
openshift-kube-scheduler-ip-xx-x-195-122.us-east-2.compute.internal   1/2     Running     0          32s   app=openshift-kube-scheduler,revision=6,scheduler=true

Comment 3 Ke Wang 2020-10-20 11:11:03 UTC
Hi deads, please see my verification. One question about kube-apiserver termination and restart: the pod shows many RESTARTS; is this as expected?

> $ oc get pods -n openshift-kube-apiserver --show-labels -l apiserver
NAME                                                        READY   STATUS    RESTARTS   AGE   LABELS
...
kube-apiserver-ip-xx-x-195-122.us-east-2.compute.internal   5/5     Running   13         19s   apiserver=true,app=openshift-kube-apiserver,revision=7

Comment 5 Xingxing Xia 2020-10-23 15:53:57 UTC
Ke Wang, it looks like you checked process IDs. This bug checks the pod's uid (under the pod YAML's metadata), not the process ID.
First, I checked the "[sig-api-machinery][Feature:APIServer][Late] kubelet terminates kube-apiserver gracefully" test. It lives in the "origin" repo under test/extended/apiserver/graceful_termination.go, and it works only by checking the cluster's events:
                for _, ev := range evs.Items {
                        if ev.Reason != "NonGracefulTermination" {
                                continue
                        }

                        t.Errorf("kube-apiserver reports a non-graceful termination: %#v. Probably kubelet or CRI-O is not giving the time to cleanly shut down. This can lead to connection refused and network I/O timeout errors in other components.", ev)
                }

If it finds a NonGracefulTermination event, the case fails. This case does not create test data; it only checks cluster data, so it can only be verified through a large number of CI runs to see whether the failure frequency is much lower.
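For reference, a self-contained sketch of an equivalent event check using client-go; the kubeconfig path and the namespace used for listing events are assumptions, and the real test wires its client through the origin test framework rather than like this:

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/tools/clientcmd"
    )

    func main() {
        // Assumption: a kubeconfig with access to the cluster under test.
        cfg, err := clientcmd.BuildConfigFromFlags("", "/path/to/kubeconfig")
        if err != nil {
            panic(err)
        }
        client := kubernetes.NewForConfigOrDie(cfg)

        // List events and report any non-graceful apiserver terminations.
        evs, err := client.CoreV1().Events("openshift-kube-apiserver").List(context.TODO(), metav1.ListOptions{})
        if err != nil {
            panic(err)
        }
        for _, ev := range evs.Items {
            if ev.Reason != "NonGracefulTermination" {
                continue
            }
            fmt.Printf("kube-apiserver reports a non-graceful termination: %s\n", ev.Message)
        }
    }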

Second, checking the PR code: the pod YAML file it writes to the master now has the uid set:
	pod.UID = uuid.NewUUID()
	finalPodBytes := resourceread.WritePodV1OrDie(pod)

	if err := ioutil.WriteFile(path.Join(resourceDir, podFileName), []byte(finalPodBytes), 0644); err != nil {

This can be confirmed as follows:
On a 4.6.1 env, which does not include the fix, ssh to a master:
[root@ip-10-0-138-235 static-pod-resources]# cd /etc/kubernetes/static-pod-resources/
[root@ip-10-0-138-235 static-pod-resources]# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
Nothing is returned, which means there is no uid.

But on a 4.7.0-0.nightly-2020-10-23-004149 env, the written files do have a uid:
[root@ip-10-0-157-133 static-pod-resources]# grep -o -P '"uid":".*?"' kube-apiserver-pod-*/kube-apiserver-pod.yaml
kube-apiserver-pod-2/kube-apiserver-pod.yaml:"uid":"69adaa39-692a-4240-b65a-1a86fc35e6d9"
...
kube-apiserver-pod-8/kube-apiserver-pod.yaml:"uid":"9e2b87a6-0e37-4073-9984-afd1d4e8f803"

This is what the PR expects, so moving to VERIFIED.

Comment 6 Ke Wang 2020-10-26 03:05:40 UTC
Following Xingxing's verification above for kube-apiserver, I checked the other three PRs.

For etcd,
sh-4.4# cd /etc/kubernetes/static-pod-resources
sh-4.4# grep -o -P '"uid":".*?"' etcd-pod-*/etcd-pod.yaml
etcd-pod-3/etcd-pod.yaml:"uid":"54c077a5-2898-4117-80d6-576ad1220ed8"
etcd-pod-4/etcd-pod.yaml:"uid":"56b249f6-16a1-496f-8c47-63fd5182b9e5"

For kube-controller-manager,
sh-4.4# grep -o -P '"uid":".*?"' kube-controller-manager-pod*/kube-controller-manager-pod.yaml
kube-controller-manager-pod-3/kube-controller-manager-pod.yaml:"uid":"1318e7e5-b22b-43c2-859e-60d4d2463b51"
kube-controller-manager-pod-5/kube-controller-manager-pod.yaml:"uid":"9e253dad-6e22-4169-8bd2-b22e6509fa95"
kube-controller-manager-pod-6/kube-controller-manager-pod.yaml:"uid":"3adfbe1c-8256-4744-b249-599d071d3308"
kube-controller-manager-pod-7/kube-controller-manager-pod.yaml:"uid":"3ce9025d-7a94-413e-86a4-eb2b900c6dae"

For kube-scheduler,
sh-4.4# grep -o -P '"uid":".*?"' kube-scheduler-pod-*/kube-scheduler-pod.yaml                 
kube-scheduler-pod-5/kube-scheduler-pod.yaml:"uid":"4a745a27-3a0f-4a57-aa7c-896a24a92779"
kube-scheduler-pod-6/kube-scheduler-pod.yaml:"uid":"991535c4-6f74-4bd3-afa1-8f964ee7762c"
kube-scheduler-pod-7/kube-scheduler-pod.yaml:"uid":"a6e4a686-006a-4588-b1d5-25587f3ed6a8"

All have the uid set.

Comment 9 errata-xmlrpc 2021-02-24 15:25:41 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

