Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1700279

Summary:	Cri-o and kubelet hide gracefully terminating static pods
Product:	OpenShift Container Platform	Reporter:	Stefan Schimanski <sttts>
Component:	Node	Assignee:	Seth Jennings <sjenning>
Status:	CLOSED ERRATA	QA Contact:	Sunil Choudhary <schoudha>
Severity:	medium	Docs Contact:
Priority:	medium
Version:	4.1.0	CC:	adahiya, aos-bugs, erich, jokerman, mmccomas, mpatel, nagrawal, rphillips, sbatsche, schoudha, sjenning
Target Milestone:	---	Flags:	zyu: needinfo-
Target Release:	4.6.0
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:		Doc Type:	No Doc Update
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2020-10-27 15:54:19 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Stefan Schimanski 2019-04-16 08:47:28 UTC

Description of problem:

Static pods with a graceful shutdown period disappear immediately from both the kubelet (which removes the mirror pods from the apiserver) and cri-o (crictl no longer lists the containers).

The containers' actual operating system processes keep running and doing their job. However, their logs are no longer accessible, and if a new static manifest is copied to /etc/kubernetes/manifests, the kubelet immediately starts the new pods in parallel with the old ones.

We depend heavily on graceful shutdown for the control plane. It currently seems to work as required only by luck, and it could break at any time. Instead, kubelet and cri-o should properly and consistently support graceful shutdown, keeping the pod and its containers visible until they actually terminate, without losing logs.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. ssh into a master.
2. Remove the kube-apiserver static pod manifest from /etc/kubernetes/manifests.
3. Watch the mirror pod in the API: it goes away immediately.
4. Watch `crictl ps`: the container disappears immediately.
5. Watch `ps aux`: the process keeps running for 35+ seconds.

Expected results:

* The mirror pod stays until the process actually terminates.
* crictl sees the containers until the process actually terminates.
* `crictl logs` and `kubectl logs` keep showing the logs until the process actually terminates.

Additional info:

The master team considers this a blocker for 4.2.

Comment 1 Seth Jennings 2019-07-29 20:40:49 UTC

I can confirm that the mirror pod on the API server goes away immediately.

However, `crictl ps` continues to show the pod during the termination grace period. The move to cri-o 1.14 may have changed or fixed this behavior.

The terminationGracePeriodSeconds on the kube-apiserver static pods is 135s. I observe that the pod exits gracefully on its own after about 65-70s.

Stefan, can you confirm this change in behavior since the time you reported this?

If so, there is still the issue of the kubelet deleting the mirror pod from the apiserver immediately, instead of waiting for PLEG to indicate that the pod is actually dead (as it does for normal pods).

Comment 2 Seth Jennings 2019-07-29 21:22:28 UTC

Talked to Stefan and he decided this isn't a 4.2 blocker, but I would like to get it figured out.

Robert, you were in this code recently with https://github.com/kubernetes/kubernetes/pull/79148. Could you take a look?

The issue is that the mirror pod is deleted immediately from the apiserver while the static pod is gracefully terminated on the node.  The desire is for the mirror pod to be deleted only when the static pod has completely terminated.  This way the logs are accessible via the apiserver during termination.

Currently, when I remove the static pod YAML from the manifests directory, the mirror pod is immediately deleted:

Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.122924    1526 kubelet.go:1904] SyncLoop (REMOVE, "file"): "kube-apiserver-master-2_openshift-kube-apiserver(60d66dd8b81c2a4756d2e2f69ec1e3fc)"
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.401404    1526 mirror_client.go:93] Deleting a mirror pod "kube-apiserver-master-2_openshift-kube-apiserver" (uid (*types.UID)(nil))
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.435138    1526 kubelet_pods.go:1695] Orphaned pod "60d66dd8b81c2a4756d2e2f69ec1e3fc" found, but volumes not yet removed.  Reducing cpu to minimum
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.435544    1526 kubelet.go:1910] SyncLoop (DELETE, "api"): "kube-apiserver-master-2_openshift-kube-apiserver(08b81ce8-b243-11e9-a38f-fa163e8871d1)"
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.441528    1526 kubelet.go:1904] SyncLoop (REMOVE, "api"): "kube-apiserver-master-2_openshift-kube-apiserver(08b81ce8-b243-11e9-a38f-fa163e8871d1)"

This happens on the following path:

HandlePodCleanups()
DeleteOrphanedMirrorPods()
DeleteMirrorPod()
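As an illustration only, the decision made on this path can be reduced to a single predicate. The Go sketch below is a hypothetical model, not the kubelet's code: the type `podState` and the functions `shouldDeleteMirrorOld` and `shouldDeleteMirrorDesired` are invented names. It contrasts the reported behavior (a mirror pod whose manifest is gone is treated as orphaned and deleted at once) with the requested one (delete it only after the static pod has fully terminated).

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of what the kubelet knows
// about one static pod when it runs its cleanup pass.
type podState struct {
	name            string
	manifestPresent bool // static pod file still in /etc/kubernetes/manifests
	containersAlive int  // containers still running on the node
}

// shouldDeleteMirrorOld models the reported behavior: once the manifest is
// gone, the mirror pod counts as orphaned and is deleted right away, even
// though its containers may still be shutting down.
func shouldDeleteMirrorOld(p podState) bool {
	return !p.manifestPresent
}

// shouldDeleteMirrorDesired models the requested behavior: the mirror pod is
// deleted only when the manifest is gone AND no containers remain running.
func shouldDeleteMirrorDesired(p podState) bool {
	return !p.manifestPresent && p.containersAlive == 0
}

func main() {
	// A kube-apiserver pod whose manifest was removed but which is still
	// terminating gracefully.
	terminating := podState{name: "kube-apiserver-master-2", manifestPresent: false, containersAlive: 1}
	fmt.Println(shouldDeleteMirrorOld(terminating))     // true
	fmt.Println(shouldDeleteMirrorDesired(terminating)) // false
}
```

The difference is one extra condition: the desired predicate also requires that nothing is still running on the node, which is what keeps the mirror pod (and its logs) reachable through the apiserver during termination.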

Comment 3 Seth Jennings 2019-07-31 15:56:11 UTC

We will have to work with upstream on this, and the change has the potential to be disruptive. Deferring to 4.3.

Comment 14 Harshal Patil 2020-05-22 15:28:23 UTC

*** Bug 1828606 has been marked as a duplicate of this bug. ***

Comment 22 Ted Yu 2020-05-30 15:11:22 UTC

New test fails without the proposed fix:
https://github.com/kubernetes/kubernetes/pull/91453#issuecomment-636267692

Comment 23 Ted Yu 2020-06-11 21:04:01 UTC

Updated the description of #91453 with two parts:

* code structure (why certain methods and structs are located where they are in the PR)
* test strategy (e2e and unit tests)

Comment 27 Seth Jennings 2020-08-10 16:05:56 UTC

Ryan is on leave

Comment 28 Seth Jennings 2020-08-10 19:21:39 UTC

https://github.com/kubernetes/kubernetes/pull/92442 merged and is included in the 1.19-rc2 rebase for 4.6

Comment 33 Seth Jennings 2020-08-18 14:25:30 UTC

> Or the mirror pod will still be visible in API till grace period?

Yes, this. Previously, the mirror pod was removed from the apiserver immediately, even though the pod was still terminating on the node. With the new behavior, the mirror pod is not deleted until the pod actually stops on the node (after a clean termination or once the grace period expires).

Comment 34 Sunil Choudhary 2020-09-09 17:37:59 UTC

Verified on 4.6.0-0.nightly-2020-09-09-003430.

Created a static pod on a node with terminationGracePeriodSeconds set to 120. The mirror pod was created and appeared in the API. After the static pod YAML was removed from the node, the mirror pod remained visible through the API until the grace period expired.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-09-003430   True        False         11h     Cluster version is 4.6.0-0.nightly-2020-09-09-003430

# cat /etc/kubernetes/manifests/sleep-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox:1.24
    imagePullPolicy: IfNotPresent
    command: ['sh', '-c', 'sleep 3600' ]
  terminationGracePeriodSeconds: 120

Comment 36 errata-xmlrpc 2020-10-27 15:54:19 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196