Bug 1455743
Summary: | 'Unknown' POD did not return resource in OCP 3.5 | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Min Woo Park <mpark> |
Component: | Node | Assignee: | Seth Jennings <sjenning> |
Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 3.5.0 | CC: | aos-bugs, decarr, dma, eparis, hgomes, jokerman, jrfuller, knakayam, mbarrett, mmccomas, sjenning |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.7.z | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | Resources consumed by pods whose node is unreachable, or that have passed their termination grace period, are no longer counted against quota. | | |
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2018-05-18 03:54:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description Min Woo Park 2017-05-26 01:51:37 UTC
At this time, quota only ignores pods that have reached a terminal state. I am apprehensive to exclude pods that have not reached a terminal state. I am going to propose a PR in upstream Kubernetes to account for this behavior: if a pod is in a non-terminal state, has a deletion timestamp, and its pod.status.reason is NodeLost, quota could ignore it.

Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/46542

I am inclined to do something more generic that handles any stuck-terminating pod scenario. For example, quota could ignore any pod that is marked for deletion once the current observed time exceeds its grace period. In this model, the quota system would release the quota after that interval + [quota sync interval].

The quota system is working as designed today by counting all pods not in a terminal state (i.e. their phase is not Succeeded or Failed). I have opened a PR to try to augment the quota system to handle scenarios where a pod is stuck terminating in extenuating situations as described here. I am not marking this as a 3.6 release blocker, but will try to get the feature enhanced in the Kubernetes 1.7+ release cycles.

I am moving this to an RFE: "As a user, if my pod is terminating and has exceeded its associated grace period, I would like my quota to be released for use by other pods in the system." I will continue to push https://github.com/kubernetes/kubernetes/pull/46542 and hope to get the feature enhanced in the k8s 1.8 time frame.

New Origin PR: https://github.com/openshift/origin/pull/16722

Should be fixed in v3.7.0-0.149.0.
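The rule being converged on here is simple: a pod should stop being charged to quota once it reaches a terminal phase, or once it has been marked for deletion and its deletion grace period has fully elapsed. A minimal Go sketch of that check, for illustration only (this is not the exact code from the upstream PR; the helper name and the explicit `now` parameter are just for readability):

```go
package quota

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// podCountsAgainstQuota reports whether a pod should still be charged to the
// pod quota. It mirrors the idea discussed above: terminal pods are ignored,
// and so are pods that are stuck terminating past their grace period.
func podCountsAgainstQuota(pod *corev1.Pod, now time.Time) bool {
	// Pods in a terminal phase have already released their resources.
	if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
		return false
	}
	// A pod marked for deletion (e.g. because its node was lost) stops
	// counting once deletionTimestamp + deletionGracePeriodSeconds is past.
	if pod.DeletionTimestamp != nil && pod.DeletionGracePeriodSeconds != nil {
		deadline := pod.DeletionTimestamp.Add(time.Duration(*pod.DeletionGracePeriodSeconds) * time.Second)
		if now.After(deadline) {
			return false
		}
	}
	return true
}
```

With a check like this, the quota controller releases the pod's charge on its next resync after the deadline passes, which matches the "[grace period] + [quota sync interval]" delay described above.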
Tried it on openshift v3.7.0-0.158.0. After stopping the node and waiting for the pod to become Unknown, the quota is not released.

```
[root@qe-pod37-master-etcd-1 ~]# oc get po hello-pod -n dma1 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: anyuid
  creationTimestamp: 2017-10-23T09:38:39Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2017-10-23T09:46:46Z
  labels:
    name: hello-pod
  name: hello-pod
  namespace: dma1
  resourceVersion: "30079"
  selfLink: /api/v1/namespaces/dma1/pods/hello-pod
  uid: f2f6cb4b-b7d5-11e7-a2c6-fa163e03968e
spec:
  containers:
  - image: docker.io/deshuai/hello-pod:latest
    imagePullPolicy: IfNotPresent
    name: hello-pod
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      privileged: false
      seLinuxOptions:
        level: s0:c12,c9
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-zfj3f
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-x1758
  nodeName: host-8-241-39.host.centralci.eng.rdu2.redhat.com
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c12,c9
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: tmp
  - name: default-token-zfj3f
    secret:
      defaultMode: 420
      secretName: default-token-zfj3f
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:38Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:40Z
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:39Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://ce9865b053421d4de48ee803785dc56c03b00c66bb52703c0107ed30a98c742b
    image: docker.io/deshuai/hello-pod:latest
    imageID: docker-pullable://docker.io/deshuai/hello-pod@sha256:289953c559120c7d2ca92d92810885887ee45c871c373a1e492e845eca575b8c
    lastState: {}
    name: hello-pod
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-10-23T09:38:39Z
  hostIP: 172.16.120.55
  message: Node host-8-241-39.host.centralci.eng.rdu2.redhat.com which was running pod hello-pod is unresponsive
  phase: Running
  podIP: 10.129.0.118
  qosClass: BestEffort
  reason: NodeLost
  startTime: 2017-10-23T09:38:38Z
```

```
[root@qe-pod37-master-etcd-1 ~]# oc get po -n dma1
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          10m
[root@qe-pod37-master-etcd-1 ~]# oc describe quota myquota -n dma1
Name:           myquota
Namespace:      dma1
Resource        Used  Hard
--------        ----  ----
pods            1     10
resourcequotas  1     1
```

Is anything wrong in my test steps? Thanks.
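Under the check sketched earlier, the pod captured above (deletionTimestamp: 2017-10-23T09:46:46Z, deletionGracePeriodSeconds: 30) would stop counting against quota once the clock passes 09:47:16Z, plus one quota resync. A small sketch of that arithmetic, using only values from the pod YAML above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Timestamps taken from the pod YAML captured in the test above.
	deletionTimestamp, _ := time.Parse(time.RFC3339, "2017-10-23T09:46:46Z")
	gracePeriod := 30 * time.Second // deletionGracePeriodSeconds: 30

	// The pod should no longer be charged to quota once this deadline passes
	// (the quota controller still needs one resync pass to notice).
	deadline := deletionTimestamp.Add(gracePeriod)
	fmt.Println("quota should be released after:", deadline.UTC())
	// Prints: quota should be released after: 2017-10-23 09:47:16 +0000 UTC
}
```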
I think Derek would be able to answer more quickly/accurately.

Please see comment 11.

Any comment on this? Thanks.

This bug has been identified as a dated bug (created more than 3 months ago). This bug has been triaged (has a Trello card linked to it), or reviewed by Engineering/PM, and has been put into the product backlog; however, this bug has not been slated for a currently planned release (3.9, 3.10 or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (being below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.

This seems resolved by Kubernetes PR 46542: https://github.com/kubernetes/kubernetes/pull/46542. This was merged into upstream 1.8.

Any update for this bug?

Verified on openshift v3.7.46.

```
[root@host-172-16-120-185 ~]# oc version
oc v3.7.46
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://172.16.120.185:8443
openshift v3.7.46
kubernetes v1.7.6+a08f5eeb62
```

1. Create a pod and check the quota status:

```
[root@host-172-16-120-185 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/pod/hello-pod.yaml -n dma
pod "hello-pod" created
[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          14s
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2
```

2. Stop the node service on the node where the pod is located:

```
[root@host-172-16-120-185 ~]# systemctl stop atomic-openshift-node.service
```

3. Watch the pod and quota status after the node service is stopped:

```
[root@host-172-16-120-185 ~]# while true; do sleep 3; oc get po -n dma ; done
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          2m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          3m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          4m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          5m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          5m
```
```
[root@host-172-16-120-185 ~]# while true; do sleep 3; oc describe quota quota-besteffort -n dma ; done
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      0     2
```
```
[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          9m
[root@host-172-16-120-185 ~]# oc get quota -n dma
NAME               AGE
quota-besteffort   16h
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      0     2
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1576