Description of problem:
3 pods are in service. When I power off 1 node, a new pod is created on another node after 5 minutes. However, the original pod is not removed and remains in 'Unknown' status. This issue occurs only in OCP 3.5.x, not in OCP 3.4.

Found the following: this is working as expected. Starting from Kubernetes 1.5 / OpenShift 3.5, unreachable pods are no longer removed automatically, for pod safety.
https://github.com/kubernetes/kubernetes/issues/44458
Not deleting the unreachable pod was a decision made in 1.5 in the interest of providing safety guarantees. The relevant rationale doc is:
https://github.com/kubernetes/community/blob/master/contributors/design-proposals/pod-safety.md

However, this expected behavior in OCP 3.5 causes another issue: when a quota and limitrange are set, the terminating pod (the 'Unknown' pod) does not release its resources back to the quota.

Version-Release number of selected component (if applicable):
# oc version
oc v3.5.5.5
kubernetes v1.5.2+43a9be4
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://pasmaster.lotte.cloud:8443
openshift v3.5.5.5
kubernetes v1.5.2+43a9be4
-----------------------------------------------------------------------------
# openshift version
openshift v3.5.5.5
kubernetes v1.5.2+43a9be4
etcd 3.1.0
-----------------------------------------------------------------------------
# docker version
Client:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64

Server:
 Version:         1.12.6
 API version:     1.24
 Package version: docker-common-1.12.6-16.el7.x86_64
 Go version:      go1.7.4
 Git commit:      3a094bd/1.12.6
 Built:           Tue Mar 21 13:30:59 2017
 OS/Arch:         linux/amd64
-----------------------------------------------------------------------------
# kubectl version
Client Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2+43a9be4", GitCommit:"43a9be4", GitTreeState:"clean", BuildDate:"2017-04-08T04:31:22Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"5", GitVersion:"v1.5.2+43a9be4", GitCommit:"43a9be4", GitTreeState:"clean", BuildDate:"2017-04-08T04:31:22Z", GoVersion:"go1.7.4", Compiler:"gc", Platform:"linux/amd64"}

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
At this time, quota only ignores pods that have reached a terminal state. I am apprehensive about excluding pods that have not reached a terminal state.
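For reference, a minimal sketch of that current rule, illustrative only and not the actual upstream source (the helper name is hypothetical; types are the v1 pod API): a pod keeps counting against quota unless its phase is terminal.

package quota

import "k8s.io/kubernetes/pkg/api/v1"

// podCountsAgainstQuota mirrors the current behavior described above:
// only pods in a terminal phase (Succeeded or Failed) are ignored by quota.
func podCountsAgainstQuota(pod *v1.Pod) bool {
	return pod.Status.Phase != v1.PodSucceeded && pod.Status.Phase != v1.PodFailed
}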
I am going to propose a PR to upstream Kubernetes to account for this behavior. If a pod is in a non-terminal state, has a deletion timestamp, and its pod.status.reason is NodeLost, quota could ignore it.
Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/46542

I am inclined to do something more generic that handles any stuck-terminating-pod scenario. For example, quota could ignore any pod that is marked for deletion once the current observed time exceeds its deletion grace period. In this model, the quota system would release the quota after that interval + [quota sync interval].
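To illustrate the proposal, here is a minimal sketch of the extended check (a sketch under the assumptions above, not the exact code in the PR; the helper name is hypothetical): the predicate from the earlier sketch, plus a clause that stops charging a pod once its deletion timestamp plus deletion grace period has passed.

package quota

import (
	"time"

	"k8s.io/kubernetes/pkg/api/v1"
)

// podCountsAgainstQuota is the earlier sketch extended with the stuck-terminating rule:
// a pod stops counting against quota once it is marked for deletion and the observed
// time has passed its deletion timestamp plus its deletion grace period.
func podCountsAgainstQuota(pod *v1.Pod, now time.Time) bool {
	// Terminal pods never count against quota (existing behavior).
	if pod.Status.Phase == v1.PodSucceeded || pod.Status.Phase == v1.PodFailed {
		return false
	}
	// Proposed: ignore pods stuck terminating (for example, because their node was lost).
	if pod.DeletionTimestamp != nil && pod.DeletionGracePeriodSeconds != nil {
		deadline := pod.DeletionTimestamp.Time.Add(
			time.Duration(*pod.DeletionGracePeriodSeconds) * time.Second)
		if now.After(deadline) {
			return false
		}
	}
	return true
}

With a check like this, the resource quota controller would drop the Unknown pod from its usage on the first sync after the grace period elapses, which matches the "grace period + quota sync interval" release window described above.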
The quota system is working as designed today by counting all pods not in a terminal state (i.e. their phase is neither Succeeded nor Failed). I have opened a PR to try to augment the quota system to handle scenarios where a pod is stuck terminating in extenuating situations such as the one described here. I am not marking this as a 3.6 release blocker, but will try to get the feature enhanced in the Kubernetes 1.7+ release cycles.
I am moving this to an RFE: "As a user, if my pod is terminating and has exceeded its associated grace period, I would like my quota to be released for use by other pods in the system."
I will continue to push https://github.com/kubernetes/kubernetes/pull/46542 and hope to get the feature enhanced in the k8s 1.8 time frame.
Origin PR: https://github.com/openshift/origin/pull/16425
New Origin PR: https://github.com/openshift/origin/pull/16722
This should be fixed in v3.7.0-0.149.0.
Tried this on openshift v3.7.0-0.158.0. After stopping the node and waiting for the pod to become Unknown, the quota is not released.

[root@qe-pod37-master-etcd-1 ~]# oc get po hello-pod -n dma1 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: anyuid
  creationTimestamp: 2017-10-23T09:38:39Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2017-10-23T09:46:46Z
  labels:
    name: hello-pod
  name: hello-pod
  namespace: dma1
  resourceVersion: "30079"
  selfLink: /api/v1/namespaces/dma1/pods/hello-pod
  uid: f2f6cb4b-b7d5-11e7-a2c6-fa163e03968e
spec:
  containers:
  - image: docker.io/deshuai/hello-pod:latest
    imagePullPolicy: IfNotPresent
    name: hello-pod
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      privileged: false
      seLinuxOptions:
        level: s0:c12,c9
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-zfj3f
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-x1758
  nodeName: host-8-241-39.host.centralci.eng.rdu2.redhat.com
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c12,c9
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: tmp
  - name: default-token-zfj3f
    secret:
      defaultMode: 420
      secretName: default-token-zfj3f
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:38Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:40Z
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:39Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://ce9865b053421d4de48ee803785dc56c03b00c66bb52703c0107ed30a98c742b
    image: docker.io/deshuai/hello-pod:latest
    imageID: docker-pullable://docker.io/deshuai/hello-pod@sha256:289953c559120c7d2ca92d92810885887ee45c871c373a1e492e845eca575b8c
    lastState: {}
    name: hello-pod
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-10-23T09:38:39Z
  hostIP: 172.16.120.55
  message: Node host-8-241-39.host.centralci.eng.rdu2.redhat.com which was running pod hello-pod is unresponsive
  phase: Running
  podIP: 10.129.0.118
  qosClass: BestEffort
  reason: NodeLost
  startTime: 2017-10-23T09:38:38Z

[root@qe-pod37-master-etcd-1 ~]# oc get po -n dma1
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          10m
[root@qe-pod37-master-etcd-1 ~]# oc describe quota myquota -n dma1
Name:            myquota
Namespace:       dma1
Resource         Used  Hard
--------         ----  ----
pods             1     10
resourcequotas   1     1
Is anything wrong with my test steps? Thanks.
I think Derek would be able to answer more quickly/accurately. Please see comment 11.
Any comments on this? Thanks.
This bug has been identified as a dated bug (created more than 3 months ago). This bug has been triaged (it has a Trello card linked to it) or reviewed by Engineering/PM and has been put into the product backlog; however, it has not been slated for a currently planned release (3.9, 3.10, or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.
This seems to be resolved by Kubernetes PR 46542 (https://github.com/kubernetes/kubernetes/pull/46542), which was merged into upstream 1.8.
Any update for this bug?
Verified on openshift v3.7.46.

[root@host-172-16-120-185 ~]# oc version
oc v3.7.46
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://172.16.120.185:8443
openshift v3.7.46
kubernetes v1.7.6+a08f5eeb62

1. Create a pod and check the quota status
[root@host-172-16-120-185 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/pod/hello-pod.yaml -n dma
pod "hello-pod" created
[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          14s
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2

2. Stop the node service where the pod is located
[root@host-172-16-120-185 ~]# systemctl stop atomic-openshift-node.service

3. Watch the pod and quota status after the node service is stopped
[root@host-172-16-120-185 ~]# while true; do sleep 3; oc get po -n dma ; done
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          2m
... (the same Running output repeats until roughly the 5 minute mark) ...
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          5m

[root@host-172-16-120-185 ~]# while true; do sleep 3; oc describe quota quota-besteffort -n dma ; done
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2
... (the same output with Used 1 repeats while the pod is terminating, then Used drops to 0) ...
Resource  Used  Hard
--------  ----  ----
pods      0     2

[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          9m
[root@host-172-16-120-185 ~]# oc get quota -n dma
NAME               AGE
quota-besteffort   16h
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      0     2
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1576