Docker Version:

Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Thu Feb 23 11:52:33 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-11.el7.x86_64
 Go version:      go1.7.4
 Git commit:      96d83a5/1.12.6
 Built:           Thu Feb 23 11:52:33 2017
 OS/Arch:         linux/amd64

OpenShift Client Version:

oc v3.5.5.10
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO

Description of problem:

During an upgrade from 3.5.x to 3.6, openshift-ansible hung during a node drain operation. The controller logs indicated that there were no pod slots available on the nodes. A large number of seemingly undeletable pods have somehow built up on this cluster:

NAMESPACE               NAME                         READY   STATUS             RESTARTS   AGE
ops-health-monitoring   build-03271831z-5z-2-g0cxj   0/1     ImagePullBackOff   1          10d
ops-health-monitoring   build-03271831z-zi-1-7jp4h   0/1     Terminating        1          10d
ops-health-monitoring   build-03271831z-zi-1-q7g06   0/1     Pending            0          23h
ops-health-monitoring   build-04122031z-3z-2-1rgn4   0/1     Terminating        0          1d
ops-health-monitoring   build-04122031z-3z-2-g7p0w   0/1     Pending            0          23h
ops-health-monitoring   pull-02272040z-ua-1-kcc39    1/1     Terminating        2          10d
ops-health-monitoring   pull-03141740z-12-1-0fzmh    1/1     Terminating        2          10d
ops-health-monitoring   pull-03141740z-t9-1-wknzv    1/1     Terminating        2          10d
ops-health-monitoring   pull-03141740z-zg-1-mtqw1    0/1     Terminating        0          1d
ops-health-monitoring   pull-03142240z-8k-1-608ml    0/1     Terminating        0          1d
ops-health-monitoring   pull-03142240z-ko-1-mh038    1/1     Terminating        2          10d
ops-health-monitoring   pull-03142240z-ns-1-6z288    0/1     Terminating        0          1d
ops-health-monitoring   pull-03142240z-ns-1-v2htt    0/1     Terminating        0          1d
ops-health-monitoring   pull-03151940z-ah-1-bf7bx    1/1     Terminating        2          10d
ops-health-monitoring   pull-03152240z-mg-1-8whs9    1/1     Terminating        2          10d
ops-health-monitoring   pull-04032250z-k3-1-kwkf8    1/1     Terminating        2          10d
ops-health-monitoring   pull-04071340z-0u-1-1dw8p    1/1     Terminating        2          10d
ops-health-monitoring   pull-04080230z-bp-1-r3dg5    1/1     Terminating        2          10d
ops-health-monitoring   pull-04271820z-yx-1-lzkjl    0/1     Terminating        0          1d
ops-health-monitoring   pull-04271830z-h7-1-l70tl    1/1     Terminating        2          10d
ops-health-monitoring   pull-05012140z-g1-1-spscg    1/1     Terminating        1          10d
ops-health-monitoring   pull-05082330z-kz-1-57fqg    0/1     Terminating        0          1d
ops-health-monitoring   pull-05082330z-kz-1-m50w9    1/1     Terminating        0          3d
ops-health-monitoring   pull-05101530z-j8-1-xvg0c    0/1     Terminating        0          2d
ops-health-monitoring   pull-05101540z-00-1-t9gpv    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101540z-l6-1-n1dnq    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101540z-l6-1-q0nhh    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101550z-2z-1-bj7tp    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101550z-8v-1-5qpg7    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101550z-eq-1-1xk45    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101600z-g3-1-mhjsv    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101600z-ll-1-9k9r9    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101600z-tq-1-3s1fs    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101610z-6y-1-9nsv2    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101610z-gl-1-34grt    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101610z-x8-1-x9wc3    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101620z-oh-1-gj021    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101620z-rt-1-83j1v    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101620z-zs-1-khcks    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101630z-5f-1-kk41l    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101630z-77-1-bs3nq    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101630z-vh-1-69m3l    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101640z-90-1-x1m96    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101640z-9p-1-zgn1l    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101640z-s9-1-04gnp    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101650z-d8-1-0qqch    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101650z-i6-1-gs84k    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101650z-id-1-gwhk6    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101700z-46-1-s85vk    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101700z-9i-1-s8n45    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101700z-jn-1-w67z1    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101710z-4l-1-fgcqv    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101710z-id-1-l8046    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101710z-kh-1-h27h5    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101720z-fc-1-7f7jt    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101720z-q0-1-w6bsd    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101720z-yw-1-tx318    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101730z-f8-1-d0lxz    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101730z-ng-1-dsxr6    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101730z-yp-1-twhrv    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101740z-3x-1-6795j    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101740z-iq-1-900lh    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101740z-t2-1-17klb    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101750z-gz-1-clc7l    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101750z-lw-1-hcmq3    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101750z-zd-1-p8mmc    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101800z-a2-1-wqq3w    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101800z-m3-1-g9wfp    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101800z-tg-1-r9rw2    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101810z-8s-1-ngb2n    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101810z-ms-1-9zlmn    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101810z-vw-1-35hpk    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101820z-vc-1-t979m    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101820z-vp-1-qzrfg    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101820z-zi-1-zpnnh    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101830z-m4-1-mmt3z    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101830z-u7-1-bm105    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101830z-ur-1-74hg0    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101840z-08-1-brfzg    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101840z-jm-1-fz704    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101840z-tw-1-jc7dl    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101850z-j9-1-hlv6g    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101850z-pg-1-sgrgz    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101850z-wc-1-7zhb1    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101900z-0g-1-9k342    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101900z-6j-1-9v60s    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101900z-qo-1-r8pbq    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101910z-ax-1-wgbq1    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101910z-tk-1-901m9    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101910z-w5-1-86r08    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101920z-2x-1-d8rvp    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101920z-xv-1-65qhm    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101920z-ya-1-shq8k    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101930z-0x-1-bbv1f    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101930z-9k-1-0rr5w    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101930z-rd-1-cbxr6    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101940z-1n-1-mw2ff    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101940z-sn-1-drj4w    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101940z-v2-1-wm30d    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101950z-6j-1-shhwd    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101950z-gh-1-6ss2c    0/1     Terminating        0          1d
ops-health-monitoring   pull-05101950z-xb-1-6nq2n    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102000z-je-1-pw37f    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102000z-sw-1-qvbd1    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102000z-um-1-87zk0    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102010z-9m-1-lwznf    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102010z-pj-1-3qrdt    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102010z-xt-1-kvdgx    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102020z-24-1-2mx3n    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102020z-97-1-252p8    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102020z-we-1-cl9kn    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102030z-km-1-b49bl    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102030z-mi-1-sl7mn    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102030z-zo-1-ttdnw    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102040z-g9-1-9jgk6    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102040z-pi-1-xqjrr    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102040z-rz-1-hkwcj    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102050z-4s-1-nhv2r    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102050z-fl-1-jxpqf    0/1     Terminating        0          1d
ops-health-monitoring   pull-05102050z-t8-1-42p97    0/1     Terminating        0          1d
ops-health-monitoring   pull-05121320z-eh-1-znmjv    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121320z-oc-1-zgq03    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121320z-rj-1-wr7v9    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121330z-k4-1-86rll    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121330z-pd-1-8kwdh    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121330z-yd-1-jbvnt    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121340z-d2-1-sgzxw    0/1     Terminating        0          2h
ops-health-monitoring   pull-05121340z-la-1-x1mn1    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121340z-qi-1-qvntd    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121350z-96-1-tm5dj    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121350z-k6-1-g3fz6    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121350z-mq-1-r81x0    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121400z-ii-1-gx19s    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121400z-ov-1-0qlnv    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121400z-tk-1-t8sxt    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121410z-p1-1-6qvm2    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121410z-rp-1-z6bvg    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121410z-ve-1-11xwc    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121420z-2m-1-hhxqc    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121420z-dw-1-8pjp0    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121420z-ke-1-nvz0c    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121430z-39-1-swlsk    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121430z-bf-1-35g0w    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121430z-cm-1-xx5fd    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121440z-4j-1-xkgk0    0/1     Terminating        0          59m
ops-health-monitoring   pull-05121440z-cb-1-lqzsp    0/1     Terminating        0          1h
ops-health-monitoring   pull-05121440z-jr-1-rt0vb    0/1     Terminating        0          59m
ops-health-monitoring   pull-05121450z-k4-1-5nhqs    0/1     Terminating        0          50m
ops-health-monitoring   pull-05121450z-pz-1-l4shg    0/1     Terminating        0          49m
ops-health-monitoring   pull-05121450z-sj-1-68djg    0/1     Terminating        0          49m
ops-health-monitoring   pull-05121500z-9d-1-55pn6    0/1     Terminating        0          39m
ops-health-monitoring   pull-05121500z-aa-1-w6r63    0/1     Terminating        0          40m
ops-health-monitoring   pull-05121500z-ic-1-dvxfg    0/1     Terminating        0          39m
ops-health-monitoring   pull-05121510z-nd-1-652tw    0/1     Terminating        0          29m
ops-health-monitoring   pull-05121510z-wc-1-69zgg    0/1     Terminating        0          29m
ops-health-monitoring   pull-05121510z-yv-1-sbkgj    0/1     Terminating        0          29m
ops-health-monitoring   pull-05121520z-0k-1-881sc    0/1     Terminating        0          20m
ops-health-monitoring   pull-05121520z-9f-1-d1qgj    0/1     Terminating        0          19m
ops-health-monitoring   pull-05121520z-m1-1-94dh5    0/1     Terminating        0          19m
ops-health-monitoring   pull-05121530z-5i-1-5321v    0/1     Terminating        0          9m
ops-health-monitoring   pull-05121530z-pt-1-twt1g    0/1     Terminating        0          9m
ops-health-monitoring   pull-05121530z-ry-1-cdjd0    0/1     Terminating        0          9m
ops-health-monitoring   pull-05121540z-bl-1-deploy   1/1     Running            0          5s
ops-health-monitoring   pull-05121540z-bl-1-pjswn    0/1     Pending            0          2s
ops-health-monitoring   pull-05121540z-t0-1-deploy   0/1     ContainerCreating  0          4s

For some reason, these pods cannot be deleted:

[master]# oc project ops-health-monitoring
Now using project "ops-health-monitoring" on server "https://ip-XXX.ec2.internal:443".
[master]# oc delete --force pod pull-05121530z-pt-1-twt1g
pod "pull-05121530z-pt-1-twt1g" deleted
[master]# oc get pods | grep pull-05121530z-pt-1-twt1g
pull-05121530z-pt-1-twt1g   0/1   Terminating   0   12m

We've deleted the DCs associated with these pods and restarted atomic-openshift-controllers and the kubelets, but nothing seems to clear this condition.
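For triage, here is a minimal sketch that lists every pod stuck with a deletionTimestamp still set. It is a sketch only, assuming `oc` is logged in with sufficient privileges; it uses the same `oc get pod --all-namespaces -o json` approach as the cleanup script in a later comment:

#!/usr/bin/env python
# Sketch: list pods that have a deletionTimestamp but are still present,
# i.e. pods stuck in Terminating.
import json
import subprocess

out = subprocess.check_output(
    ['oc', 'get', 'pod', '--all-namespaces', '-o', 'json'])
for pod in json.loads(out)["items"]:
    meta = pod["metadata"]
    if "deletionTimestamp" in meta:
        print("%s %s (deletion requested %s)" % (
            meta["namespace"], meta["name"], meta["deletionTimestamp"]))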
Ignore the "clone of"/"depends on" link to bug 1449277 -- those issues are apparently unrelated.
Example undeletable pod:

[root@dev-preview-stg-master-defb2 ~]# oc describe pod pull-05121420z-dw-1-8pjp0
Name:                     pull-05121420z-dw-1-8pjp0
Namespace:                ops-health-monitoring
Security Policy:          restricted
Node:                     ip-172-31-9-166.ec2.internal/
Labels:                   app=pull-05121420z-dw
                          deployment=pull-05121420z-dw-1
                          deploymentconfig=pull-05121420z-dw
Annotations:              kubernetes.io/created-by={"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"ops-health-monitoring","name":"pull-05121420z-dw-1","uid":"1a5fecfb-37...
                          openshift.io/deployment-config.latest-version=1
                          openshift.io/deployment-config.name=pull-05121420z-dw
                          openshift.io/deployment.name=pull-05121420z-dw-1
                          openshift.io/generated-by=OpenShiftNewApp
                          openshift.io/scc=restricted
Status:                   Terminating (expires Fri, 12 May 2017 14:25:45 +0000)
Termination Grace Period: 30s
IP:
Controllers:              ReplicationController/pull-05121420z-dw-1
Containers:
  pull-05121420z-dw:
    Image:        openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11
    Ports:        8080/TCP, 8888/TCP
    Environment:  <none>
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from default-token-ds8ay (ro)
Conditions:
  Type          Status
  PodScheduled  True
Volumes:
  default-token-ds8ay:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  default-token-ds8ay
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  type=compute
Tolerations:     <none>
Events:
  FirstSeen  LastSeen  Count  From                                   SubObjectPath                       Type    Reason     Message
  ---------  --------  -----  ----                                   -------------                       ------  ------     -------
  1h         1h        1      default-scheduler                                                          Normal  Scheduled  Successfully assigned pull-05121420z-dw-1-8pjp0 to ip-172-31-9-166.ec2.internal
  1h         1h        1      kubelet, ip-172-31-9-166.ec2.internal  spec.containers{pull-05121420z-dw}  Normal  Pulling    pulling image "openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11"
  1h         1h        1      kubelet, ip-172-31-9-166.ec2.internal  spec.containers{pull-05121420z-dw}  Normal  Pulled     Successfully pulled image "openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11"
  1h         1h        1      kubelet, ip-172-31-9-166.ec2.internal  spec.containers{pull-05121420z-dw}  Normal  Created    Created container with docker id 4f6ed2bdc6f2; Security:[seccomp=unconfined]
  1h         1h        1      kubelet, ip-172-31-9-166.ec2.internal  spec.containers{pull-05121420z-dw}  Normal  Started    Started container with docker id 4f6ed2bdc6f2
  1h         1h        1      kubelet, ip-172-31-9-166.ec2.internal  spec.containers{pull-05121420z-dw}  Normal  Killing    Killing container with docker id 4f6ed2bdc6f2: Need to kill pod.
[root@dev-preview-stg-master-defb2 ~]# oc get pod pull-05121420z-dw-1-8pjp0 -o=yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    kubernetes.io/created-by: |
      {"kind":"SerializedReference","apiVersion":"v1","reference":{"kind":"ReplicationController","namespace":"ops-health-monitoring","name":"pull-05121420z-dw-1","uid":"1a5fecfb-371e-11e7-8d75-0eaa067b1713","apiVersion":"v1","resourceVersion":"113712485"}}
    openshift.io/deployment-config.latest-version: "1"
    openshift.io/deployment-config.name: pull-05121420z-dw
    openshift.io/deployment.name: pull-05121420z-dw-1
    openshift.io/generated-by: OpenShiftNewApp
    openshift.io/scc: restricted
  creationTimestamp: 2017-05-12T14:20:15Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2017-05-12T14:25:45Z
  generateName: pull-05121420z-dw-1-
  labels:
    app: pull-05121420z-dw
    deployment: pull-05121420z-dw-1
    deploymentconfig: pull-05121420z-dw
  name: pull-05121420z-dw-1-8pjp0
  namespace: ops-health-monitoring
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicationController
    name: pull-05121420z-dw-1
    uid: 1a5fecfb-371e-11e7-8d75-0eaa067b1713
  resourceVersion: "113713858"
  selfLink: /api/v1/namespaces/ops-health-monitoring/pods/pull-05121420z-dw-1-8pjp0
  uid: 1e6994b4-371e-11e7-8d75-0eaa067b1713
spec:
  containers:
  - image: openshift/hello-openshift@sha256:7ce9d7b0c83a3abef41e0db590c5aa39fb05793315c60fd907f2c609997caf11
    imagePullPolicy: Always
    name: pull-05121420z-dw
    ports:
    - containerPort: 8080
      protocol: TCP
    - containerPort: 8888
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
        - SYS_CHROOT
      privileged: false
      runAsUser: 1056510000
      seLinuxOptions:
        level: s0:c238,c52
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-ds8ay
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-ji2kp
  nodeName: ip-172-31-9-166.ec2.internal
  nodeSelector:
    type: compute
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    fsGroup: 1056510000
    seLinuxOptions:
      level: s0:c238,c52
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - name: default-token-ds8ay
    secret:
      defaultMode: 420
      secretName: default-token-ds8ay
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-05-12T14:20:15Z
    status: "True"
    type: PodScheduled
  phase: Pending
  qosClass: BestEffort
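Note the ownerReferences stanza above with blockOwnerDeletion: true; that field turns out to be the interesting part (see the next comment). A small sketch, assuming the same pod and namespace as above, to dump just that stanza:

# Sketch: print the ownerReferences of the stuck pod; blockOwnerDeletion is
# the field of interest here.
import json
import subprocess

out = subprocess.check_output(
    ['oc', 'get', '-n', 'ops-health-monitoring',
     'pod', 'pull-05121420z-dw-1-8pjp0', '-o', 'json'])
for ref in json.loads(out)["metadata"].get("ownerReferences", []):
    print("%s/%s blockOwnerDeletion=%s" % (
        ref["kind"], ref["name"], ref.get("blockOwnerDeletion")))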
Some quirky info that others likely understand better than I do. Setting blockOwnerDeletion to false, like so:

  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: false
    controller: true
    kind: ReplicationController
    name: pull-05121520z-0k-1
    uid: 7c068bb1-3726-11e7-8d75-0eaa067b1713

will NOT cause the pod to be cleaned up. Removing the field entirely, like so:

  ownerReferences:
  - apiVersion: v1
    controller: true
    kind: ReplicationController
    name: pull-05121520z-0k-1
    uid: 7c068bb1-3726-11e7-8d75-0eaa067b1713

WILL cause the pod to be cleaned up.
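Rather than editing the YAML by hand, the same removal can be applied with a JSON merge patch. This single-pod sketch uses the identical patch that the bulk cleanup script below applies; the pod name is just an example taken from the listing above:

# Sketch: drop ownerReferences from one stuck pod via `oc patch`. Patching
# the field to null removes it entirely, which (per the observation above)
# lets the pod finish terminating.
import subprocess

def drop_owner_references(pod, ns):
    subprocess.check_call(
        ['oc', 'patch', '-n', ns, 'pod', pod,
         '-p', '{"metadata":{"ownerReferences":null}}'])

drop_owner_references('pull-05121520z-0k-1-881sc', 'ops-health-monitoring')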
This upstream PR should, I *think*, stop the problem from impacting our kubelet. See: https://github.com/kubernetes/kubernetes/pull/45747
Just for reference, I ran the following script on dev-preview-stg:

#!/usr/bin/env python
import json
import subprocess

oc = subprocess.Popen(['oc', 'get', 'pod', '--all-namespaces', '-o', 'json'],
                      stdout=subprocess.PIPE)
stdout = oc.communicate()[0]
oc.wait()

def deleteOwnerReferences(pod, ns):
    print("Deleting reference for: %s %s" % (pod, ns))
    patch = subprocess.Popen(['oc', 'patch', '-n', ns, 'pod', pod,
                              '-p', '{"metadata":{"ownerReferences":null}}'])
    patch.wait()

def clearPods(pods):
    for pod, ns in pods:
        deleteOwnerReferences(pod, ns)

terminating = []
monitoring_terminating = []
pods = json.loads(stdout)
for pod in pods["items"]:
    if "deletionTimestamp" in pod["metadata"]:
        name = pod["metadata"]["name"]
        ns = pod["metadata"]["namespace"]
        if ns == "ops-health-monitoring":
            monitoring_terminating.append((name, ns))
        else:
            terminating.append((name, ns))

# Save 10 pods for later evaluation
monitoring_terminating.sort(key=lambda tup: tup[0])
monitoring_terminating = monitoring_terminating[10:]
clearPods(monitoring_terminating)
#clearPods(terminating)
print(len(monitoring_terminating))
#print(len(terminating))

This cleans up most of the terminating pods. (I intentionally left 10 pods in the ops-health-monitoring namespace so I would have things to debug; the cluster seems to be creating more at a rapid rate.) We still had about 20 other pods stuck terminating, and those are stuck for a different reason. See https://bugzilla.redhat.com/show_bug.cgi?id=1450554 for a BZ about at least some of those stuck terminating pods.
This specific bug ONLY affects clusters running 3.5 nodes against 3.6 masters.
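A quick way to check whether a cluster is in the affected skew is to compare the kubelet version reported by each node against the master version from `oc version`. A sketch (the fields used are the standard `status.nodeInfo` ones):

# Sketch: print the kubelet version of every node. On an affected cluster
# these will still report v1.5.x (OCP 3.5) while the masters report v1.6.x.
import json
import subprocess

out = subprocess.check_output(['oc', 'get', 'nodes', '-o', 'json'])
for node in json.loads(out)["items"]:
    info = node["status"]["nodeInfo"]
    print("%s %s" % (node["metadata"]["name"], info["kubeletVersion"]))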
Reproduce steps:
1. Install OCP 3.5 with dedicated nodes.
2. Create applications: oc new-app cakephp-mysql-example
3. Enable the OCP repos that include openshift-3.6.74.
4. Run upgrade_control_plane.yml.
5. Upgrade the nodes.

The upgrade_nodes.yml playbook hangs, and pods are left stuck terminating.

When upgrading to openshift-3.6.101, the upgrade succeeded without this issue, so moving the bug to VERIFIED.

# oc version
oc v3.6.101
kubernetes v1.6.1+5115d708d7
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://openshift-225.lab.eng.nay.redhat.com:8443
openshift v3.6.101
kubernetes v1.6.1+5115d708d7
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2017:1716