Bug 1277101 - Pods are not gracefully terminated when their project is deleted
Status: CLOSED WORKSFORME
Product: OpenShift Container Platform
Classification: Red Hat
Component: Kubernetes
Version: 3.1.0
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: Seth Jennings
QA Contact: Jianwei Hou
Reported: 2015-11-02 06:11 EST by Jianwei Hou
Modified: 2017-05-16 12:54 EDT (History)
Doc Type: Bug Fix
Last Closed: 2017-05-16 12:54:57 EDT
Type: Bug
Description Jianwei Hou 2015-11-02 06:11:54 EST
Description of problem:
When a project that contains several pods is deleted, the pods are killed immediately, even if they specify a termination grace period.

Version-Release number of selected component (if applicable):
[root@openshift-161 ~]# atomic-enterprise  version
atomic-enterprise v3.0.2.905
kubernetes v1.2.0-alpha.1-1107-g4c8e6f4
etcd 2.1.2

How reproducible:
Always

Steps to Reproduce:
1. Create a project
2. Create pods with different termination grace periods using the templates at: https://github.com/openshift-qe/v3-testfiles/tree/master/pods/graceful-delete
3. On the node, monitor the log: journalctl -f -u atomic-openshift-node
4. Delete the project

Actual results:
After step 4, all pods are killed immediately and the namespace is deleted; the pods' termination grace periods are not applied.

Logs:
```
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.342856    7238 manager.go:1431] Killing container "034f5aeb19b72954cf51da71e5e4a267ba56cdab178b2dab660287aa9b76b299 sleep jhou/grace-default" with 0 second grace period
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.424165    7238 kubelet.go:1619] volume "9f8cc04e-8109-11e5-aa76-fa163e37f005/registry-volume", still has a container running "9f8cc04e-8109-11e5-aa76-fa163e37f005", skipping teardown
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.424214    7238 kubelet.go:1619] volume "97c89790-812f-11e5-aa76-fa163e37f005/wewangaep2", still has a container running "97c89790-812f-11e5-aa76-fa163e37f005", skipping teardown
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.527007    7238 kubelet.go:2048] SyncLoop (REMOVE, "api"): "grace-default_jhou"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.527110    7238 kubelet.go:1860] Killing unwanted pod "grace-default"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.531339    7238 manager.go:1431] Killing container "034f5aeb19b72954cf51da71e5e4a267ba56cdab178b2dab660287aa9b76b299 /" with 30 second grace period
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.612002    7238 kubelet.go:2045] SyncLoop (UPDATE, "api"): "grace0_jhou"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.612242    7238 manager.go:1431] Killing container "ba6acfb2a05b6a07d224d0699de181e4fed43669c903f7c99d3408d8862821d2 sleep jhou/grace0" with 0 second grace period
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.644144    7238 kubelet.go:2048] SyncLoop (REMOVE, "api"): "grace0_jhou"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.644291    7238 kubelet.go:1860] Killing unwanted pod "grace0"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.648457    7238 manager.go:1431] Killing container "ba6acfb2a05b6a07d224d0699de181e4fed43669c903f7c99d3408d8862821d2 /" with 0 second grace period
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.685864    7238 kubelet.go:2045] SyncLoop (UPDATE, "api"): "grace10_jhou"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.686034    7238 manager.go:1431] Killing container "989b9d29b64adcd5ea93387d2f2159ee80b0e23df62eb1f1e02476e119ff744e hello-openshift jhou/grace10" with 0 second grace period
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.721950    7238 kubelet.go:2048] SyncLoop (REMOVE, "api"): "grace10_jhou"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.722078    7238 kubelet.go:1860] Killing unwanted pod "grace10"
Nov 02 18:58:55 openshift-145.lab.eng.nay.redhat.com atomic-openshift-node[7238]: I1102 18:58:55.725060    7238 manager.go:1431] Killing container "989b9d29b64adcd5ea93387d2f2159ee80b0e23df62eb1f1e02476e119ff744e /" with 10 second grace period
```

Expected results:
When a namespace is deleted, the pods in it should still be given their termination grace period.

Additional info:
Comment 2 Derek Carr 2016-03-14 15:34:35 EDT
The namespace controller in kube 1.2 was updated.

https://github.com/kubernetes/kubernetes/pull/21400

This will be in the next rebase of Origin.
Comment 3 Derek Carr 2016-03-14 15:37:49 EDT
Is the aosqe/sleep image doing anything in response to SIGTERM?
Comment 4 Troy Dawson 2016-09-01 11:51:49 EDT
This has been merged into ose and is in OSE v3.3.0.28 or newer.
Comment 6 Jianwei Hou 2016-09-13 22:57:46 EDT
This is all my container does:
----
#!/bin/sh
trap 'sleep 3600' SIGTERM
while true; do :; done
----
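
The script above never exits on SIGTERM, so the container keeps running until the runtime force-kills it once the grace period expires. A minimal Python sketch of a comparable test process follows, assuming a Python interpreter is available in the image (the original container uses the shell script above). The names `received`, `on_sigterm`, and `wait_for_sigterm` are hypothetical. It prints a line when SIGTERM arrives, so the delay before the forced kill can be read from the container log.

```python
import os
import signal
import time

# Hypothetical helper state: remembers when the first SIGTERM arrived.
received = {}


def on_sigterm(signum, frame):
    # Record the first arrival only and keep running, like the shell trap.
    received.setdefault("at", time.monotonic())
    print("SIGTERM received; waiting to be killed", flush=True)


# Install the handler; SIGKILL cannot be caught and will still end the process.
signal.signal(signal.SIGTERM, on_sigterm)


def wait_for_sigterm(timeout=5.0, poll=0.05):
    """Block until SIGTERM has been recorded or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while "at" not in received and time.monotonic() < deadline:
        time.sleep(poll)
    return "at" in received
```

In a container, calling `wait_for_sigterm` in a loop with a long timeout would stand in for the `while true` loop of the shell script.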

Tested on:
openshift v3.3.0.30
kubernetes v1.3.0+52492b4
etcd 2.3.0+git

I have 3 pods with termination grace periods of 10, 20, and 40 seconds. When the project is deleted, all pods are given a 10 second grace period, so the 20 and 40 second grace periods are not respected. Can you confirm whether this is the expected behavior?
Thanks.

```
Sep 13 22:45:08 host-8-172-108 atomic-openshift-node: I0913 22:45:08.065791   23774 docker_manager.go:1334] Killing container "799a9cf3d8d0c3554713bea03b82e44b5d8f74d6ae14635a868e97998c236949 hello-openshift jhou/grace20" with 10 second grace period
Sep 13 22:45:08 host-8-172-108 atomic-openshift-node: I0913 22:45:08.065943   23774 docker_manager.go:1334] Killing container "3a6b531d31e58f6d80a206933de036773588c9e45d8c4249b4f9b5033102f853 hello-openshift jhou/grace10" with 10 second grace period
Sep 13 22:45:08 host-8-172-108 atomic-openshift-node: I0913 22:45:08.067530   23774 docker_manager.go:1334] Killing container "e9a9080789b498191c0000222f29944c73f4d669a18de65513a8fca549574939 hello-openshift jhou/grace40" with 10 second grace period
Sep 13 22:45:08 host-8-172-108 docker-current: time="2016-09-13T22:45:08.068277851-04:00" level=info msg="{Action=stop, ID=e9a9080789b498191c0000222f29944c73f4d669a18de65513a8fca549574939, LoginUID=4294967295, PID=23774}"
```
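
The comparison described above can be automated. The following is a rough Python sketch that extracts the grace period from each "Killing container" message and flags pods whose observed value differs from the expected one. It assumes that a pod named `graceNN` was created with an NN second grace period, which matches the pod names in these logs but is not confirmed by the templates. The function names are hypothetical, and the pattern only covers the three-field message form shown in this excerpt.

```python
import re

# Matches the kubelet message form quoted in the log excerpt above.
KILL_RE = re.compile(
    r'Killing container "(?P<id>[0-9a-f]+) (?P<name>[^"\s]+) (?P<pod>[^"\s]+)" '
    r'with (?P<grace>\d+) second grace period'
)


def observed_grace_periods(log_text):
    """Return {namespace/pod: [grace seconds, ...]} in log order."""
    result = {}
    for m in KILL_RE.finditer(log_text):
        result.setdefault(m.group("pod"), []).append(int(m.group("grace")))
    return result


def expected_from_name(pod):
    """Assumption: a pod named 'graceNN' was created with NN seconds."""
    m = re.search(r"grace(\d+)$", pod)
    return int(m.group(1)) if m else None


def mismatches(log_text):
    """Return {pod: (expected, observed list)} where any value differs."""
    out = {}
    for pod, seen in observed_grace_periods(log_text).items():
        want = expected_from_name(pod)
        if want is not None and any(s != want for s in seen):
            out[pod] = (want, seen)
    return out


# Message portions copied from the three kubelet lines above.
SAMPLE = '''
Killing container "799a9cf3d8d0c3554713bea03b82e44b5d8f74d6ae14635a868e97998c236949 hello-openshift jhou/grace20" with 10 second grace period
Killing container "3a6b531d31e58f6d80a206933de036773588c9e45d8c4249b4f9b5033102f853 hello-openshift jhou/grace10" with 10 second grace period
Killing container "e9a9080789b498191c0000222f29944c73f4d669a18de65513a8fca549574939 hello-openshift jhou/grace40" with 10 second grace period
'''

print(mismatches(SAMPLE))
```

For this excerpt the sketch reports `jhou/grace20` and `jhou/grace40` as mismatches, in line with the observation above.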
Comment 7 Jianwei Hou 2016-09-17 23:13:05 EDT
On v3.3.0.31 as well, all containers are killed with a 10 second grace period when the project is deleted. Assigning this back to confirm whether this is intended.
Comment 10 Seth Jennings 2016-10-05 12:32:46 EDT
I was unable to recreate this.

# oc version
oc v3.3.0.34
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://192.168.12.18:8443
openshift v3.3.0.34
kubernetes v1.3.0+52492b4

# cat pod.yaml 
apiVersion: v1
kind: Pod
metadata:
  name: demo-pod
spec:
  containers:
  - image: busybox
    name: busybox
    imagePullPolicy: Always
    command:
    - /bin/sleep
    - "3600"
  terminationGracePeriodSeconds: 20

# oc new-project demo
# oc create -f pod.yaml
# oc delete project/demo

Oct 05 12:26:31 rhel72.cloud.variantweb.net atomic-openshift-node[4403]: I1005 12:26:31.157992    4403 docker_manager.go:1334] Killing container "6bfdfdcece81f21b9674a5252662493d5fdf2956f25c7965ac19ac560bbafb03 busybox demo/demo-pod" with 20 second grace period

Oct 05 12:26:51 rhel72.cloud.variantweb.net atomic-openshift-node[4403]: E1005 12:26:51.253472    4403 event.go:198] Server rejected event '&api.Event{TypeMeta:unversioned.TypeMeta{Kind:"", APIVersion:""}, ObjectMeta:api.ObjectMeta{Name:"demo-pod.147aaf846ecbaeb2", GenerateName:"", Namespace:"demo", SelfLink:"", UID:"", ResourceVersion:"", Generation:0, CreationTimestamp:unversioned.Time{Time:time.Time{sec:0, nsec:0, loc:(*time.Location)(nil)}}, DeletionTimestamp:(*unversioned.Time)(nil), DeletionGracePeriodSeconds:(*int64)(nil), Labels:map[string]string(nil), Annotations:map[string]string(nil), OwnerReferences:[]api.OwnerReference(nil), Finalizers:[]string(nil)}, InvolvedObject:api.ObjectReference{Kind:"Pod", Namespace:"demo", Name:"demo-pod", UID:"6204b1b8-8b18-11e6-98ca-fa163eebc035", APIVersion:"v1", ResourceVersion:"546", FieldPath:"spec.containers{busybox}"}, Reason:"Killing", Message:"Killing container with docker id 6bfdfdcece81: Need to kill pod.", Source:api.EventSource{Component:"kubelet", Host:"localhost"}, FirstTimestamp:unversioned.Time{Time:time.Time{sec:63611281611, nsec:246644914, loc:(*time.Location)(0x858ad00)}}, LastTimestamp:unversioned.Time{Time:time.Time{sec:63611281611, nsec:246644914, loc:(*time.Location)(0x858ad00)}}, Count:1, Type:"Normal"}': 'events "demo-pod.147aaf846ecbaeb2" is forbidden: Unable to create new content in namespace demo because it is being terminated.' (will not retry!)

Oct 05 12:26:52 rhel72.cloud.variantweb.net atomic-openshift-node[4403]: I1005 12:26:52.316591    4403 docker_manager.go:1334] Killing container "6bfdfdcece81f21b9674a5252662493d5fdf2956f25c7965ac19ac560bbafb03 busybox demo/demo-pod" with 0 second grace period
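
The interval between the two "Killing container" messages above shows whether the grace period elapsed before the forced kill. The following small Python sketch computes it from the klog-style timestamps. The year is an assumption, because the log header carries only month and day, and the helper names are hypothetical.

```python
import re
from datetime import datetime

# klog-style header: severity letter, MMDD, then HH:MM:SS.microseconds.
STAMP_RE = re.compile(r"\b[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6})\b")


def klog_time(line, year=2016):
    """Parse the klog timestamp in `line`; `year` is assumed, not logged."""
    m = STAMP_RE.search(line)
    if m is None:
        raise ValueError("no klog timestamp found")
    stamp = f"{year}{m.group(1)} {m.group(2)}"
    return datetime.strptime(stamp, "%Y%m%d %H:%M:%S.%f")


def elapsed_seconds(first_line, second_line):
    """Seconds between the timestamps of two log lines."""
    return (klog_time(second_line) - klog_time(first_line)).total_seconds()


# klog portions of the two kill messages above.
START_LINE = 'I1005 12:26:31.157992    4403 docker_manager.go:1334] Killing container "6bfdfdcece81f21b9674a5252662493d5fdf2956f25c7965ac19ac560bbafb03 busybox demo/demo-pod" with 20 second grace period'
END_LINE = 'I1005 12:26:52.316591    4403 docker_manager.go:1334] Killing container "6bfdfdcece81f21b9674a5252662493d5fdf2956f25c7965ac19ac560bbafb03 busybox demo/demo-pod" with 0 second grace period'

print(round(elapsed_seconds(START_LINE, END_LINE), 2))
```

The two messages are about 21.16 seconds apart, which is consistent with the 20 second grace period being honored before the forced kill.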
Comment 12 Seth Jennings 2017-05-16 12:54:57 EDT
I was unable to recreate this.  It is now relatively ancient.  Closing.  Please reopen if recreation is possible on current release.
