Bug 1700279 - Cri-o and kubelet hide gracefully terminating static pods
Summary: Cri-o and kubelet hide gracefully terminating static pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.6.0
Assignee: Seth Jennings
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 1828606
Depends On:
Blocks:
Reported: 2019-04-16 08:47 UTC by Stefan Schimanski
Modified: 2020-10-27 15:54 UTC
CC List: 11 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-27 15:54:19 UTC
Target Upstream Version:
Embargoed:
zyu: needinfo-



Links
System                   ID              Private  Priority  Status  Summary  Last Updated
Red Hat Product Errata   RHBA-2020:4196  0        None      None    None     2020-10-27 15:54:36 UTC

Description Stefan Schimanski 2019-04-16 08:47:28 UTC
Description of problem:

Static pods with a graceful shutdown period disappear immediately from both the kubelet (which removes the mirror pods from the apiserver) and CRI-O (crictl no longer sees the containers).

The containers' operating system processes keep running and continue to do their job. However, their logs are no longer accessible, and if a new static pod manifest is copied to /etc/kubernetes/manifests, the kubelet immediately starts new pods in parallel.

We depend heavily on graceful shutdown for the control plane. It currently seems to work as required only by luck and could break at any time. Instead, the kubelet and CRI-O should support graceful shutdown properly and consistently, keeping the pod and its containers visible until they actually terminate and without losing logs.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. SSH into a master node.
2. Remove the kube-apiserver static pod manifest from /etc/kubernetes/manifests.
3. Watch the mirror pod in the API; it goes away immediately.
4. Watch `crictl ps`; the container disappears immediately.
5. Watch `ps aux`; the process keeps running for 35+ seconds.

Expected results:

* The mirror pod stays until the process actually terminates.
* crictl sees the containers until the process actually terminates.
* `crictl logs` and `kubectl logs` keep showing the logs until the process actually terminates.

Additional info:

The master team considers this a blocker for 4.2.

Comment 1 Seth Jennings 2019-07-29 20:40:49 UTC
I can see that the mirror pod on the API server goes away immediately.

However, `crictl ps` continues to show the pod during the termination grace period. The move to CRI-O 1.14 may have changed or fixed this behavior.

The terminationGracePeriodSeconds on the kube-apiserver static pods is 135s. I observe that the pod exits gracefully on its own after about 65-70s.

Stefan, can you confirm that the behavior has changed since you reported this?

If so, there is still the issue of the kubelet deleting the mirror pod from the apiserver immediately instead of waiting for PLEG to indicate that the pod is actually dead (as is the case for normal pods).

Comment 2 Seth Jennings 2019-07-29 21:22:28 UTC
Talked to Stefan and he decided this isn't a 4.2 blocker, but I would like to get it figured out.

Robert, you were in this code recently with https://github.com/kubernetes/kubernetes/pull/79148. Could you take a look?

The issue is that the mirror pod is deleted immediately from the apiserver while the static pod is gracefully terminated on the node.  The desire is for the mirror pod to be deleted only when the static pod has completely terminated.  This way the logs are accessible via the apiserver during termination.

Currently, when I remove the static pod YAML from the manifests directory, the mirror pod is deleted immediately:

Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.122924    1526 kubelet.go:1904] SyncLoop (REMOVE, "file"): "kube-apiserver-master-2_openshift-kube-apiserver(60d66dd8b81c2a4756d2e2f69ec1e3fc)"
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.401404    1526 mirror_client.go:93] Deleting a mirror pod "kube-apiserver-master-2_openshift-kube-apiserver" (uid (*types.UID)(nil))
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.435138    1526 kubelet_pods.go:1695] Orphaned pod "60d66dd8b81c2a4756d2e2f69ec1e3fc" found, but volumes not yet removed.  Reducing cpu to minimum
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.435544    1526 kubelet.go:1910] SyncLoop (DELETE, "api"): "kube-apiserver-master-2_openshift-kube-apiserver(08b81ce8-b243-11e9-a38f-fa163e8871d1)"
Jul 29 20:54:44 master-2 hyperkube[1526]: I0729 20:54:44.441528    1526 kubelet.go:1904] SyncLoop (REMOVE, "api"): "kube-apiserver-master-2_openshift-kube-apiserver(08b81ce8-b243-11e9-a38f-fa163e8871d1)"

This happens on the following call path:

HandlePodCleanups()
DeleteOrphanedMirrorPods()
DeleteMirrorPod()
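The call path above can be sketched as follows. This is a heavily simplified model of the pre-fix behavior, assuming a mirror pod counts as orphaned as soon as its static pod manifest is gone; the types and signatures are hypothetical and do not match the kubelet's actual `DeleteOrphanedMirrorPods` or `DeleteMirrorPod`. The point it illustrates is that the decision never looks at whether the static pod's containers are still running.

```go
package main

import "fmt"

// pod is a hypothetical, heavily simplified pod record (not v1.Pod).
type pod struct {
	fullName string // e.g. "kube-apiserver-master-2_openshift-kube-apiserver"
	running  bool   // whether the runtime still reports running containers
}

// deleteOrphanedMirrorPods mimics the pre-fix behavior: any mirror pod whose
// static pod manifest is gone is deleted at once. The running field is never
// consulted, which is the behavior this bug describes.
func deleteOrphanedMirrorPods(mirrors []pod, manifests map[string]bool, deleteMirrorPod func(fullName string)) {
	for _, m := range mirrors {
		if !manifests[m.fullName] {
			deleteMirrorPod(m.fullName)
		}
	}
}

func main() {
	mirrors := []pod{{fullName: "kube-apiserver-master-2_openshift-kube-apiserver", running: true}}
	// The manifest was just removed from /etc/kubernetes/manifests.
	manifests := map[string]bool{}
	deleteOrphanedMirrorPods(mirrors, manifests, func(fullName string) {
		fmt.Printf("Deleting a mirror pod %q\n", fullName)
	})
}
```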

Comment 3 Seth Jennings 2019-07-31 15:56:11 UTC
We will have to work with upstream on this, and the change has the potential to be disruptive. Deferring to 4.3.

Comment 14 Harshal Patil 2020-05-22 15:28:23 UTC
*** Bug 1828606 has been marked as a duplicate of this bug. ***

Comment 22 Ted Yu 2020-05-30 15:11:22 UTC
New test fails without the proposed fix:
https://github.com/kubernetes/kubernetes/pull/91453#issuecomment-636267692

Comment 23 Ted Yu 2020-06-11 21:04:01 UTC
Updated the description of #91453 with two parts:

* structure of the code (why certain methods and structs are placed where they are in the PR)
* test strategy: e2e and unit tests

Comment 27 Seth Jennings 2020-08-10 16:05:56 UTC
Ryan is on leave

Comment 28 Seth Jennings 2020-08-10 19:21:39 UTC
https://github.com/kubernetes/kubernetes/pull/92442 merged and is included in the 1.19-rc2 rebase for 4.6

Comment 33 Seth Jennings 2020-08-18 14:25:30 UTC
> Or the mirror pod will still be visible in API till grace period?

Yes, this. Previously, the mirror pod was removed from the apiserver immediately, even though the pod was still terminating on the node. With the new behavior, the mirror pod is not deleted until the pod actually stops on the node (after a clean termination or when the grace period expires).
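The new behavior amounts to adding a termination check to the orphan test. The following is a minimal sketch under that reading, not the code merged in https://github.com/kubernetes/kubernetes/pull/92442; `staticPodState` and `shouldDeleteMirrorPod` are hypothetical, and `terminated` stands in for whatever signal the kubelet uses (Comment 1 mentions PLEG) to learn that all containers have stopped.

```go
package main

import "fmt"

// staticPodState is a hypothetical summary of what the kubelet knows about a
// static pod.
type staticPodState struct {
	manifestPresent bool // manifest still in /etc/kubernetes/manifests
	terminated      bool // all containers stopped (clean exit or killed at the grace deadline)
}

// shouldDeleteMirrorPod sketches the post-fix rule: a mirror pod is removed
// only when its manifest is gone AND the static pod has actually stopped.
func shouldDeleteMirrorPod(s staticPodState) bool {
	return !s.manifestPresent && s.terminated
}

func main() {
	terminating := staticPodState{manifestPresent: false, terminated: false}
	stopped := staticPodState{manifestPresent: false, terminated: true}
	fmt.Println(shouldDeleteMirrorPod(terminating)) // still terminating: keep the mirror pod
	fmt.Println(shouldDeleteMirrorPod(stopped))     // stopped: delete the mirror pod
}
```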

Comment 34 Sunil Choudhary 2020-09-09 17:37:59 UTC
Verified on 4.6.0-0.nightly-2020-09-09-003430.

Created a static pod on a node with terminationGracePeriodSeconds: 120. The mirror pod was created and appeared in the API. After removing the static pod YAML from the node, the mirror pod remained visible through the API until the grace period expired.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.6.0-0.nightly-2020-09-09-003430   True        False         11h     Cluster version is 4.6.0-0.nightly-2020-09-09-003430

# cat /etc/kubernetes/manifests/sleep-pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: busybox
spec:
  containers:
  - name: busybox
    image: gcr.io/google_containers/busybox:1.24
    imagePullPolicy: IfNotPresent
    command: ['sh', '-c', 'sleep 3600' ]
  terminationGracePeriodSeconds: 120

Comment 36 errata-xmlrpc 2020-10-27 15:54:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.6 GA Images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:4196

