Bug 1450461

Summary: Unable to delete pods stuck in terminating state
Product: OpenShift Container Platform
Reporter: Justin Pierce <jupierce>
Component: Node
Assignee: Derek Carr <decarr>
Status: CLOSED ERRATA
QA Contact: DeShuai Ma <dma>
Severity: high
Docs Contact:
Priority: unspecified
Version: 3.6.0
CC: anli, aos-bugs, decarr, dma, eparis, hgomes, jhonce, jokerman, jupierce, mkargaki, mmccomas, mwoodson, pweil, qcai, rpuccini, sjenning, smunilla
Target Milestone: ---
Keywords: Reopened
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1449277
Environment:
Last Closed: 2018-04-09 21:13:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Justin Pierce 2017-05-12 15:54:49 UTC
Docker Version:
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Thu Feb 23 11:52:33 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Thu Feb 23 11:52:33 2017
 OS/Arch:         linux/amd64


OpenShift Client Version:
oc v3.5.5.10
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO



Description of problem:

During an upgrade from 3.5.x to 3.6, openshift-ansible hung during a node drain operation. The controller logs indicated that there were no pod slots available on the nodes. 

A large number of seemingly undeletable pods have somehow built up on this cluster:

ops-health-monitoring                              build-03271831z-5z-2-g0cxj            0/1       ImagePullBackOff    1          10d
ops-health-monitoring                              build-03271831z-zi-1-7jp4h            0/1       Terminating         1          10d
ops-health-monitoring                              build-03271831z-zi-1-q7g06            0/1       Pending             0          23h
ops-health-monitoring                              build-04122031z-3z-2-1rgn4            0/1       Terminating         0          1d
ops-health-monitoring                              build-04122031z-3z-2-g7p0w            0/1       Pending             0          23h
ops-health-monitoring                              pull-02272040z-ua-1-kcc39             1/1       Terminating         2          10d
ops-health-monitoring                              pull-03141740z-12-1-0fzmh             1/1       Terminating         2          10d
ops-health-monitoring                              pull-03141740z-t9-1-wknzv             1/1       Terminating         2          10d
ops-health-monitoring                              pull-03141740z-zg-1-mtqw1             0/1       Terminating         0          1d
ops-health-monitoring                              pull-03142240z-8k-1-608ml             0/1       Terminating         0          1d
ops-health-monitoring                              pull-03142240z-ko-1-mh038             1/1       Terminating         2          10d
ops-health-monitoring                              pull-03142240z-ns-1-6z288             0/1       Terminating         0          1d
ops-health-monitoring                              pull-03142240z-ns-1-v2htt             0/1       Terminating         0          1d
ops-health-monitoring                              pull-03151940z-ah-1-bf7bx             1/1       Terminating         2          10d
ops-health-monitoring                              pull-03152240z-mg-1-8whs9             1/1       Terminating         2          10d
ops-health-monitoring                              pull-04032250z-k3-1-kwkf8             1/1       Terminating         2          10d
ops-health-monitoring                              pull-04071340z-0u-1-1dw8p             1/1       Terminating         2          10d
ops-health-monitoring                              pull-04080230z-bp-1-r3dg5             1/1       Terminating         2          10d
ops-health-monitoring                              pull-04271820z-yx-1-lzkjl             0/1       Terminating         0          1d
ops-health-monitoring                              pull-04271830z-h7-1-l70tl             1/1       Terminating         2          10d
ops-health-monitoring                              pull-05012140z-g1-1-spscg             1/1       Terminating         1          10d
ops-health-monitoring                              pull-05082330z-kz-1-57fqg             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05082330z-kz-1-m50w9             1/1       Terminating         0          3d
ops-health-monitoring                              pull-05101530z-j8-1-xvg0c             0/1       Terminating         0          2d
ops-health-monitoring                              pull-05101540z-00-1-t9gpv             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101540z-l6-1-n1dnq             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101540z-l6-1-q0nhh             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101550z-2z-1-bj7tp             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101550z-8v-1-5qpg7             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101550z-eq-1-1xk45             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101600z-g3-1-mhjsv             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101600z-ll-1-9k9r9             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101600z-tq-1-3s1fs             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101610z-6y-1-9nsv2             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101610z-gl-1-34grt             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101610z-x8-1-x9wc3             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101620z-oh-1-gj021             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101620z-rt-1-83j1v             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101620z-zs-1-khcks             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101630z-5f-1-kk41l             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101630z-77-1-bs3nq             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101630z-vh-1-69m3l             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101640z-90-1-x1m96             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101640z-9p-1-zgn1l             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101640z-s9-1-04gnp             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101650z-d8-1-0qqch             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101650z-i6-1-gs84k             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101650z-id-1-gwhk6             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101700z-46-1-s85vk             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101700z-9i-1-s8n45             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101700z-jn-1-w67z1             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101710z-4l-1-fgcqv             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101710z-id-1-l8046             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101710z-kh-1-h27h5             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101720z-fc-1-7f7jt             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101720z-q0-1-w6bsd             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101720z-yw-1-tx318             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101730z-f8-1-d0lxz             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101730z-ng-1-dsxr6             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101730z-yp-1-twhrv             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101740z-3x-1-6795j             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101740z-iq-1-900lh             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101740z-t2-1-17klb             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101750z-gz-1-clc7l             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101750z-lw-1-hcmq3             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101750z-zd-1-p8mmc             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101800z-a2-1-wqq3w             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101800z-m3-1-g9wfp             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101800z-tg-1-r9rw2             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101810z-8s-1-ngb2n             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101810z-ms-1-9zlmn             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101810z-vw-1-35hpk             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101820z-vc-1-t979m             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101820z-vp-1-qzrfg             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101820z-zi-1-zpnnh             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101830z-m4-1-mmt3z             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101830z-u7-1-bm105             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101830z-ur-1-74hg0             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101840z-08-1-brfzg             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101840z-jm-1-fz704             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101840z-tw-1-jc7dl             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101850z-j9-1-hlv6g             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101850z-pg-1-sgrgz             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101850z-wc-1-7zhb1             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101900z-0g-1-9k342             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101900z-6j-1-9v60s             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101900z-qo-1-r8pbq             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101910z-ax-1-wgbq1             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101910z-tk-1-901m9             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101910z-w5-1-86r08             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101920z-2x-1-d8rvp             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101920z-xv-1-65qhm             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101920z-ya-1-shq8k             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101930z-0x-1-bbv1f             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101930z-9k-1-0rr5w             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101930z-rd-1-cbxr6             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101940z-1n-1-mw2ff             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101940z-sn-1-drj4w             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101940z-v2-1-wm30d             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101950z-6j-1-shhwd             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101950z-gh-1-6ss2c             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05101950z-xb-1-6nq2n             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102000z-je-1-pw37f             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102000z-sw-1-qvbd1             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102000z-um-1-87zk0             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102010z-9m-1-lwznf             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102010z-pj-1-3qrdt             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102010z-xt-1-kvdgx             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102020z-24-1-2mx3n             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102020z-97-1-252p8             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102020z-we-1-cl9kn             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102030z-km-1-b49bl             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102030z-mi-1-sl7mn             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102030z-zo-1-ttdnw             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102040z-g9-1-9jgk6             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102040z-pi-1-xqjrr             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102040z-rz-1-hkwcj             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102050z-4s-1-nhv2r             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102050z-fl-1-jxpqf             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05102050z-t8-1-42p97             0/1       Terminating         0          1d
ops-health-monitoring                              pull-05121320z-eh-1-znmjv             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121320z-oc-1-zgq03             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121320z-rj-1-wr7v9             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121330z-k4-1-86rll             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121330z-pd-1-8kwdh             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121330z-yd-1-jbvnt             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121340z-d2-1-sgzxw             0/1       Terminating         0          2h
ops-health-monitoring                              pull-05121340z-la-1-x1mn1             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121340z-qi-1-qvntd             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121350z-96-1-tm5dj             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121350z-k6-1-g3fz6             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121350z-mq-1-r81x0             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121400z-ii-1-gx19s             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121400z-ov-1-0qlnv             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121400z-tk-1-t8sxt             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121410z-p1-1-6qvm2             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121410z-rp-1-z6bvg             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121410z-ve-1-11xwc             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121420z-2m-1-hhxqc             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121420z-dw-1-8pjp0             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121420z-ke-1-nvz0c             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121430z-39-1-swlsk             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121430z-bf-1-35g0w             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121430z-cm-1-xx5fd             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121440z-4j-1-xkgk0             0/1       Terminating         0          59m
ops-health-monitoring                              pull-05121440z-cb-1-lqzsp             0/1       Terminating         0          1h
ops-health-monitoring                              pull-05121440z-jr-1-rt0vb             0/1       Terminating         0          59m
ops-health-monitoring                              pull-05121450z-k4-1-5nhqs             0/1       Terminating         0          50m
ops-health-monitoring                              pull-05121450z-pz-1-l4shg             0/1       Terminating         0          49m
ops-health-monitoring                              pull-05121450z-sj-1-68djg             0/1       Terminating         0          49m
ops-health-monitoring                              pull-05121500z-9d-1-55pn6             0/1       Terminating         0          39m
ops-health-monitoring                              pull-05121500z-aa-1-w6r63             0/1       Terminating         0          40m
ops-health-monitoring                              pull-05121500z-ic-1-dvxfg             0/1       Terminating         0          39m
ops-health-monitoring                              pull-05121510z-nd-1-652tw             0/1       Terminating         0          29m
ops-health-monitoring                              pull-05121510z-wc-1-69zgg             0/1       Terminating         0          29m
ops-health-monitoring                              pull-05121510z-yv-1-sbkgj             0/1       Terminating         0          29m
ops-health-monitoring                              pull-05121520z-0k-1-881sc             0/1       Terminating         0          20m
ops-health-monitoring                              pull-05121520z-9f-1-d1qgj             0/1       Terminating         0          19m
ops-health-monitoring                              pull-05121520z-m1-1-94dh5             0/1       Terminating         0          19m
ops-health-monitoring                              pull-05121530z-5i-1-5321v             0/1       Terminating         0          9m
ops-health-monitoring                              pull-05121530z-pt-1-twt1g             0/1       Terminating         0          9m
ops-health-monitoring                              pull-05121530z-ry-1-cdjd0             0/1       Terminating         0          9m
ops-health-monitoring                              pull-05121540z-bl-1-deploy            1/1       Running             0          5s
ops-health-monitoring                              pull-05121540z-bl-1-pjswn             0/1       Pending             0          2s
ops-health-monitoring                              pull-05121540z-t0-1-deploy            0/1       ContainerCreating   0          4s


For some reason, these pods cannot be deleted.


[master]# oc project ops-health-monitoring 
Now using project "ops-health-monitoring" on server "https://ip-XXX.ec2.internal:443".

[master]# oc delete --force pod pull-05121530z-pt-1-twt1g
pod "pull-05121530z-pt-1-twt1g" deleted

[master]# oc get pods | grep pull-05121530z-pt-1-twt1g
pull-05121530z-pt-1-twt1g    0/1       Terminating        0          12m


We've deleted the DCs associated with these pods and restarted atomic-openshift-controllers and the kubelets, but nothing seems to clear this condition.
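
To size the backlog, the listing above can be tallied per namespace and status. This is only a rough sketch: tally_pod_status is a hypothetical helper, and it assumes the six-column text layout shown above (namespace, name, ready, status, restarts, age) as produced by a listing across all namespaces.

```python
from collections import Counter


def tally_pod_status(listing):
    """Count pods per (namespace, status) in a six-column pod listing."""
    counts = Counter()
    for line in listing.splitlines():
        fields = line.split()
        # Assumed columns: NAMESPACE NAME READY STATUS RESTARTS AGE.
        # Skip blank lines, a header row, and anything with another shape.
        if len(fields) != 6 or fields[0] == "NAMESPACE":
            continue
        namespace, _name, _ready, status, _restarts, _age = fields
        counts[(namespace, status)] += 1
    return counts


# Three rows copied from the listing in this report.
sample = """\
ops-health-monitoring   build-03271831z-zi-1-7jp4h   0/1   Terminating   1   10d
ops-health-monitoring   build-03271831z-zi-1-q7g06   0/1   Pending       0   23h
ops-health-monitoring   pull-02272040z-ua-1-kcc39    1/1   Terminating   2   10d
"""

for (namespace, status), count in sorted(tally_pod_status(sample).items()):
    print(namespace, status, count)
```

Parsing the text table is fragile; the JSON output of oc is likely a better input if the numbers matter.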

Comment 1 Justin Pierce 2017-05-12 15:58:10 UTC
Ignore the clone of/depends on 1449277 -- these issues are apparently unrelated.

Comment 2 Justin Pierce 2017-05-12 16:10:06 UTC
Example undeletable pod:

[root@dev-preview-stg-master-defb2 ~]# oc describe pod pull-05121420z-dw-1-8pjp0
Name:				pull-05121420z-dw-1-8pjp0
Namespace:			ops-health-monitoring
Security Policy:		restricted
Node:				ip-172-31-9-166.ec2.internal/
Labels:				app=pull-05121420z-dw
				deployment=pull-05121420z-dw-1
				deploymentconfig=pull-05121420z-dw
Annotations:			kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"ops-health-monitoring","name":"pull-05121420z-dw-1","uid":"1a5fecfb-37...
				openshift.io/deployment-config.latest-version=1
				openshift.io/deployment-config.name=pull-05121420z-dw
				openshift.io/deployment.name=pull-05121420z-dw-1
				openshift.io/generated-by=OpenShiftNewApp
				openshift.io/scc=restricted
Status:				Terminating (expires Fri, 12 May 2017 14:25:45 +0000)
Termination Grace Period:	30s
IP:				
Controllers:			ReplicationController/pull-05121420z-dw-1
Containers:
  pull-05121420z-dw:
    Image:		openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11
    Ports:		8080/TCP, 8888/TCP
    Environment:	<none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ds8ay (ro)
Conditions:
  Type		Status
  PodScheduled 	True 
Volumes:
  default-token-ds8ay:
    Type:	Secret (a volume populated by a Secret)
    SecretName:	default-token-ds8ay
    Optional:	false
QoS Class:	BestEffort
Node-Selectors:	type=compute
Tolerations:	<none>
Events:
  FirstSeen	LastSeen	Count	From					SubObjectPath				Type		Reason		Message
  ---------	--------	-----	----					-------------				--------	------		-------
  1h		1h		1	default-scheduler								Normal		Scheduled	Successfully assigned pull-05121420z-dw-1-8pjp0 to ip-172-31-9-166.ec2.internal
  1h		1h		1	kubelet, ip-172-31-9-166.ec2.internal	spec.containers{pull-05121420z-dw}	Normal		Pulling		pulling image "openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11"
  1h		1h		1	kubelet, ip-172-31-9-166.ec2.internal	spec.containers{pull-05121420z-dw}	Normal		Pulled		Successfully pulled image "openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11"
  1h		1h		1	kubelet, ip-172-31-9-166.ec2.internal	spec.containers{pull-05121420z-dw}	Normal		Created		Created container with docker id 4f6ed2bdc6f2; Security:[seccomp=unconfined]
  1h		1h		1	kubelet, ip-172-31-9-166.ec2.internal	spec.containers{pull-05121420z-dw}	Normal		Started		Started container with docker id 4f6ed2bdc6f2
  1h		1h		1	kubelet, ip-172-31-9-166.ec2.internal	spec.containers{pull-05121420z-dw}	Normal		Killing		Killing container with docker id 4f6ed2bdc6f2: Need to kill pod.


[root@dev-preview-stg-master-defb2 ~]# oc get pod pull-05121420z-dw-1-8pjp0 -o=yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"ops-health-monitoring","name":"pull-05121420z-dw-1","uid":"1a5fecfb-371e-11e7-8d75-0eaa067b1713","apiVersion":"v1","resourceVersion":"113712485"}}
    openshift.io/deployment-config.latest-version: "1"
    openshift.io/deployment-config.name: pull-05121420z-dw
    openshift.io/deployment.name: pull-05121420z-dw-1
    openshift.io/generated-by: OpenShiftNewApp
    openshift.io/scc: restricted
  creationTimestamp: 2017-05-12T14:20:15Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2017-05-12T14:25:45Z
  generateName: pull-05121420z-dw-1-
  labels:
    app: pull-05121420z-dw
    deployment: pull-05121420z-dw-1
    deploymentconfig: pull-05121420z-dw
  name: pull-05121420z-dw-1-8pjp0
  namespace: ops-health-monitoring
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicationController
    name: pull-05121420z-dw-1
    uid: 1a5fecfb-371e-11e7-8d75-0eaa067b1713
  resourceVersion: "113713858"
  selfLink: /api/v1/namespaces/ops-health-monitoring/pods/pull-05121420z-dw-1-8pjp0
  uid: 1e6994b4-371e-11e7-8d75-0eaa067b1713
spec:
  containers:
  - image: openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11
    imagePullPolicy: Always
    name: pull-05121420z-dw
    ports:
    - containerPort: 8080
      protocol: TCP
    - containerPort: 8888
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
        - SYS_CHROOT
      privileged: false
      runAsUser: 1056510000
      seLinuxOptions:
        level: s0:c238,c52
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-ds8ay
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-ji2kp
  nodeName: ip-172-31-9-166.ec2.internal
  nodeSelector:
    type: compute
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1056510000
    seLinuxOptions:
      level: s0:c238,c52
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-ds8ay
    secret:
      defaultMode: 420
      secretName: default-token-ds8ay
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-05-12T14:20:15Z
    status: "True"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
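
A pod like this one can be recognised from its metadata alone: deletionTimestamp is set and already in the past, yet the object still exists. The sketch below computes how far past that timestamp a pod is. overdue_seconds and parse_ts are hypothetical helpers, the pod is assumed to be a dict loaded from the JSON form of the object above, and the "now" value is simply the time of this comment.

```python
from datetime import datetime, timezone


def parse_ts(ts):
    """Parse a timestamp of the form 2017-05-12T14:25:45Z as UTC."""
    return datetime.strptime(ts, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def overdue_seconds(pod, now):
    """Seconds elapsed since deletionTimestamp, or None if the pod is not being deleted."""
    ts = pod.get("metadata", {}).get("deletionTimestamp")
    if ts is None:
        return None
    return (now - parse_ts(ts)).total_seconds()


# Values taken from the pod shown above.
pod = {
    "metadata": {
        "name": "pull-05121420z-dw-1-8pjp0",
        "deletionTimestamp": "2017-05-12T14:25:45Z",
    }
}
now = parse_ts("2017-05-12T16:10:06Z")
print(overdue_seconds(pod, now))
```

A positive result well beyond the 30s grace period suggests the pod is stuck rather than merely slow to terminate.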

Comment 5 Eric Paris 2017-05-12 21:04:26 UTC
Quirky info that others likely understand. Setting blockOwnerDeletion to false, like so:

  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: false
    controller: true
    kind: ReplicationController
    name: pull-05121520z-0k-1
    uid: 7c068bb1-3726-11e7-8d75-0eaa067b1713

does not cause the pod to be cleaned up. Removing blockOwnerDeletion entirely, like so:

  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: ReplicationController
    name: pull-05121520z-0k-1
    uid: 7c068bb1-3726-11e7-8d75-0eaa067b1713

does cause the pod to be cleaned up.
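
For anyone wanting to script the second variant, here is a minimal sketch that builds JSON Patch (RFC 6902) operations removing only the blockOwnerDeletion field from each owner reference. drop_block_owner_deletion_ops is a hypothetical helper. Feeding the result to oc patch with --type=json is an assumption on my part and was not tried on this cluster.

```python
import json


def drop_block_owner_deletion_ops(pod):
    """Build JSON Patch ops that remove blockOwnerDeletion from every owner reference."""
    ops = []
    refs = pod.get("metadata", {}).get("ownerReferences") or []
    for index, ref in enumerate(refs):
        if "blockOwnerDeletion" in ref:
            ops.append({
                "op": "remove",
                "path": "/metadata/ownerReferences/%d/blockOwnerDeletion" % index,
            })
    return ops


# Owner reference copied from the example above.
pod = {
    "metadata": {
        "ownerReferences": [{
            "apiVersion": "v1",
            "blockOwnerDeletion": False,
            "controller": True,
            "kind": "ReplicationController",
            "name": "pull-05121520z-0k-1",
            "uid": "7c068bb1-3726-11e7-8d75-0eaa067b1713",
        }]
    }
}

ops = drop_block_owner_deletion_ops(pod)
print(json.dumps(ops))
```

The function only builds the patch and never calls oc, so the output can be reviewed before anything is changed.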

Comment 6 Derek Carr 2017-05-12 21:20:06 UTC
This upstream PR I *think* should stop the problem from impacting our kubelet.

See: https://github.com/kubernetes/kubernetes/pull/45747

Comment 7 Eric Paris 2017-05-13 01:53:46 UTC
Just for reference, I ran the following script on dev-preview-stg:

#!/usr/bin/env python

import json
import subprocess

# Fetch every pod in the cluster as JSON.
oc = subprocess.Popen(['oc', 'get', 'pod', '--all-namespaces', '-o', 'json'], stdout=subprocess.PIPE)
stdout = oc.communicate()[0]  # communicate() also waits for oc to exit

def deleteOwnerReferences(pod, ns):
  # Clear metadata.ownerReferences on a single pod.
  print("Deleting reference for: %s %s" % (pod, ns))
  patch = subprocess.Popen(['oc', 'patch', '-n', ns, 'pod', pod, '-p', '{"metadata":{"ownerReferences":null}}'])
  patch.wait()

def clearPods(pods):
  # pods is a list of (name, namespace) tuples.
  for pod,ns in pods:
    deleteOwnerReferences(pod, ns)

# Collect pods that have a deletionTimestamp, i.e. pods that are terminating.
terminating = []
monitoring_terminating = []
pods = json.loads(stdout)
for pod in pods["items"]:
  if "deletionTimestamp" in pod["metadata"]:
    name = pod["metadata"]["name"]
    ns = pod["metadata"]["namespace"]
    if ns == "ops-health-monitoring":
      monitoring_terminating.append((name,ns))
    else:
      terminating.append((name,ns))

# Save 10 pods for later evaluation: sort by name and skip the first 10.
monitoring_terminating.sort(key=lambda tup: tup[0])
monitoring_terminating = monitoring_terminating[10:]
clearPods(monitoring_terminating)
# Pods in other namespaces are left alone for now.
#clearPods(terminating)
print(len(monitoring_terminating))
#print(len(terminating))



This cleans up most of the terminating pods. I intentionally left 10 pods in ops-health-monitoring so I had things to debug, although it seems to be creating more at a rapid rate.

We still had about 20 other pods stuck terminating, for a different reason. See https://bugzilla.redhat.com/show_bug.cgi?id=1450554 for a BZ covering at least some of those stuck terminating pods.

Comment 11 Eric Paris 2017-05-16 12:47:55 UTC
This specific bug ONLY affects 3.5 nodes and 3.6 masters.
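
A quick way to spot that combination is to compare the Kubernetes version reported by the master with each node's kubelet version (status.nodeInfo.kubeletVersion on the node object). In this report 3.5 shows up as kubernetes v1.5.x and 3.6 as v1.6.x. The sketch below is illustrative only: skewed_nodes and minor_version are hypothetical helpers, and the node names are made up.

```python
import re


def minor_version(version):
    """Return (major, minor) from a string such as v1.5.2+43a9be4."""
    match = re.match(r"v(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version: %r" % version)
    return int(match.group(1)), int(match.group(2))


def skewed_nodes(master_version, node_versions):
    """Names of nodes whose kubelet is older than the master by major.minor."""
    master = minor_version(master_version)
    return sorted(
        name for name, version in node_versions.items()
        if minor_version(version) < master
    )


# Version strings as seen in this report; node names are hypothetical.
nodes = {
    "node-a": "v1.5.2+43a9be4",
    "node-b": "v1.6.1+5115d708d7",
}
print(skewed_nodes("v1.6.1+5115d708d7", nodes))
```

Nodes listed by this check are the ones that still need the node upgrade before the condition is expected to stop recurring.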

Comment 13 Anping Li 2017-06-09 11:07:58 UTC
Steps to reproduce:
1. Install ocp-3.5 with dedicated nodes.
2. Create applications: oc new-app cakephp-mysql-example
3. Enable the OCP repos that include openshift-3.6.74.
4. Upgrade the control plane with upgrade_control_plane.yml.
5. Upgrade the nodes.
The upgrade_nodes.yml playbook hangs and pods are left in the Terminating state.


When upgrading to openshift-3.6.101, the upgrade succeeds without this issue, so I am moving the bug to verified.
# oc version
oc v3.6.101
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift-225.lab.eng.nay.redhat.com:8443
openshift v3.6.101
kubernetes v1.6.1+5115d708d7

Comment 15 errata-xmlrpc 2017-08-10 05:24:06 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716