Bug 1455743
Summary: | 'Unknown' POD did not return resource in OCP 3.5 | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Min Woo Park <mpark> |
Component: | Node | Assignee: | Seth Jennings <sjenning> |
Status: | CLOSED ERRATA | QA Contact: | DeShuai Ma <dma> |
Severity: | medium | Docs Contact: | |
Priority: | medium | | |
Version: | 3.5.0 | CC: | aos-bugs, decarr, dma, eparis, hgomes, jokerman, jrfuller, knakayam, mbarrett, mmccomas, sjenning |
Target Milestone: | --- | Keywords: | Reopened |
Target Release: | 3.7.z | | |
Hardware: | x86_64 | | |
OS: | Linux | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | Bug Fix |
Doc Text: | Resources consumed by pods whose node is unreachable, or that have passed their termination grace period, are no longer counted against quota. | | |
Story Points: | --- | | |
Clone Of: | | Environment: | |
Last Closed: | 2018-05-18 03:54:45 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description Min Woo Park 2017-05-26 01:51:37 UTC
At this time, quota only ignores pods that have reached a terminal state. I am apprehensive to exclude pods that have not reached a terminal state. I am going to propose a PR in upstream Kubernetes to account for this behavior: if a pod is in a non-terminal state, has a deletion timestamp, and its pod.status.reason is NodeLost, quota could ignore it.

Opened upstream PR: https://github.com/kubernetes/kubernetes/pull/46542

I am inclined to do something more generic that handles any stuck-terminating pod scenario. For example, quota could ignore any pod that is marked for deletion once the current observed time exceeds its grace period. In this model, the quota system would release the quota after that interval + [quota sync interval].

The quota system is working as designed today by counting all pods not in a terminal state (i.e. their phase is not Succeeded or Failed). I have opened a PR to try to augment the quota system to handle scenarios where a pod is stuck terminating in extenuating situations as described here. I am not marking this as a 3.6 release blocker, but will try to get the feature enhanced in the Kubernetes 1.7+ release cycles.

I am moving this to an RFE: "As a user, if my pod is terminating and has exceeded its associated grace period, I would like my quota to be released for use by other pods in the system." I will continue to push https://github.com/kubernetes/kubernetes/pull/46542 and hope to get the feature enhanced in the k8s 1.8 time frame.

New Origin PR: https://github.com/openshift/origin/pull/16722

Should be fixed in v3.7.0-0.149.0.
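The rule being converged on here is simple: a pod should stop being charged to quota once it reaches a terminal phase, or once it has been marked for deletion and its deletion grace period has fully elapsed. A minimal Go sketch of that check, for illustration only (this is not the exact code from the upstream PR; the helper name and the explicit `now` parameter are just for readability):

```go
package quota

import (
	"time"

	corev1 "k8s.io/api/core/v1"
)

// podCountsAgainstQuota reports whether a pod should still be charged to the
// pod quota. It mirrors the idea discussed above: terminal pods are ignored,
// and so are pods that are stuck terminating past their grace period.
func podCountsAgainstQuota(pod *corev1.Pod, now time.Time) bool {
	// Pods in a terminal phase have already released their resources.
	if pod.Status.Phase == corev1.PodSucceeded || pod.Status.Phase == corev1.PodFailed {
		return false
	}
	// A pod marked for deletion (e.g. because its node was lost) stops
	// counting once deletionTimestamp + deletionGracePeriodSeconds is past.
	if pod.DeletionTimestamp != nil && pod.DeletionGracePeriodSeconds != nil {
		deadline := pod.DeletionTimestamp.Add(time.Duration(*pod.DeletionGracePeriodSeconds) * time.Second)
		if now.After(deadline) {
			return false
		}
	}
	return true
}
```

With a check like this, the quota controller releases the pod's charge on its next resync after the deadline passes, which matches the "[grace period] + [quota sync interval]" delay described above.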
Tried it on openshift v3.7.0-0.158.0. After stopping the node and waiting for the pod to become Unknown, the quota is not released.

```
[root@qe-pod37-master-etcd-1 ~]# oc get po hello-pod -n dma1 -o yaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    openshift.io/scc: anyuid
  creationTimestamp: 2017-10-23T09:38:39Z
  deletionGracePeriodSeconds: 30
  deletionTimestamp: 2017-10-23T09:46:46Z
  labels:
    name: hello-pod
  name: hello-pod
  namespace: dma1
  resourceVersion: "30079"
  selfLink: /api/v1/namespaces/dma1/pods/hello-pod
  uid: f2f6cb4b-b7d5-11e7-a2c6-fa163e03968e
spec:
  containers:
  - image: docker.io/deshuai/hello-pod:latest
    imagePullPolicy: IfNotPresent
    name: hello-pod
    ports:
    - containerPort: 8080
      protocol: TCP
    resources: {}
    securityContext:
      capabilities:
        drop:
        - MKNOD
      privileged: false
      seLinuxOptions:
        level: s0:c12,c9
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeMounts:
    - mountPath: /tmp
      name: tmp
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: default-token-zfj3f
      readOnly: true
  dnsPolicy: ClusterFirst
  imagePullSecrets:
  - name: default-dockercfg-x1758
  nodeName: host-8-241-39.host.centralci.eng.rdu2.redhat.com
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext:
    seLinuxOptions:
      level: s0:c12,c9
  serviceAccount: default
  serviceAccountName: default
  terminationGracePeriodSeconds: 30
  volumes:
  - emptyDir: {}
    name: tmp
  - name: default-token-zfj3f
    secret:
      defaultMode: 420
      secretName: default-token-zfj3f
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:38Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:40Z
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2017-10-23T09:38:39Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://ce9865b053421d4de48ee803785dc56c03b00c66bb52703c0107ed30a98c742b
    image: docker.io/deshuai/hello-pod:latest
    imageID: docker-pullable://docker.io/deshuai/hello-pod@sha256:289953c559120c7d2ca92d92810885887ee45c871c373a1e492e845eca575b8c
    lastState: {}
    name: hello-pod
    ready: true
    restartCount: 0
    state:
      running:
        startedAt: 2017-10-23T09:38:39Z
  hostIP: 172.16.120.55
  message: Node host-8-241-39.host.centralci.eng.rdu2.redhat.com which was running pod hello-pod is unresponsive
  phase: Running
  podIP: 10.129.0.118
  qosClass: BestEffort
  reason: NodeLost
  startTime: 2017-10-23T09:38:38Z
```

```
[root@qe-pod37-master-etcd-1 ~]# oc get po -n dma1
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          10m
[root@qe-pod37-master-etcd-1 ~]# oc describe quota myquota -n dma1
Name:           myquota
Namespace:      dma1
Resource        Used  Hard
--------        ----  ----
pods            1     10
resourcequotas  1     1
```

Is anything wrong in my test steps? Thanks.
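Under the check sketched earlier, the pod captured above (deletionTimestamp: 2017-10-23T09:46:46Z, deletionGracePeriodSeconds: 30) would stop counting against quota once the clock passes 09:47:16Z, plus one quota resync. A small sketch of that arithmetic, using only values from the pod YAML above:

```go
package main

import (
	"fmt"
	"time"
)

func main() {
	// Timestamps taken from the pod YAML captured in the test above.
	deletionTimestamp, _ := time.Parse(time.RFC3339, "2017-10-23T09:46:46Z")
	gracePeriod := 30 * time.Second // deletionGracePeriodSeconds: 30

	// The pod should no longer be charged to quota once this deadline passes
	// (the quota controller still needs one resync pass to notice).
	deadline := deletionTimestamp.Add(gracePeriod)
	fmt.Println("quota should be released after:", deadline.UTC())
	// Prints: quota should be released after: 2017-10-23 09:47:16 +0000 UTC
}
```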
I think Derek would be able to answer more quickly/accurately.

Please see comment 11.

Any comment on this? Thanks.

This bug has been identified as a dated bug (created more than 3 months ago). This bug has been triaged (has a Trello card linked to it), or reviewed by Engineering/PM, and has been put into the product backlog; however, this bug has not been slated for a currently planned release (3.9, 3.10 or 3.11), which cover our releases for the rest of the calendar year. As a result of this bug's age, its state on the current roadmap, and its PM Score (being below 70), this bug is being Closed - Deferred, as it is currently not part of the product's immediate priorities. Please see: https://docs.google.com/document/d/1zdqF4rB3ea8GmVIZ7qWCVYUaQ7-EexUrQEF0MTwdDkw/edit for more details.

This seems resolved by Kubernetes PR 46542: https://github.com/kubernetes/kubernetes/pull/46542. This was merged into upstream 1.8.

Any update for this bug?

Verified on openshift v3.7.46.

```
[root@host-172-16-120-185 ~]# oc version
oc v3.7.46
kubernetes v1.7.6+a08f5eeb62
features: Basic-Auth GSSAPI Kerberos SPNEGO
Server https://172.16.120.185:8443
openshift v3.7.46
kubernetes v1.7.6+a08f5eeb62
```

1. Create a pod and check the quota status:

```
[root@host-172-16-120-185 ~]# oc create -f https://raw.githubusercontent.com/mdshuai/testfile-openshift/master/k8s/pod/hello-pod.yaml -n dma
pod "hello-pod" created
[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          14s
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2
```

2. Stop the node service on the node where the pod is located:

```
[root@host-172-16-120-185 ~]# systemctl stop atomic-openshift-node.service
```

3. Watch the pod and quota status after the node service is stopped:

```
[root@host-172-16-120-185 ~]# while true; do sleep 3; oc get po -n dma ; done
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          2m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          3m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          4m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Running   0          5m
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          5m
```
```
[root@host-172-16-120-185 ~]# while true; do sleep 3; oc describe quota quota-besteffort -n dma ; done
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      1     2
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      0     2
```
```
[root@host-172-16-120-185 ~]# oc get po -n dma
NAME        READY     STATUS    RESTARTS   AGE
hello-pod   1/1       Unknown   0          9m
[root@host-172-16-120-185 ~]# oc get quota -n dma
NAME               AGE
quota-besteffort   16h
[root@host-172-16-120-185 ~]# oc describe quota quota-besteffort -n dma
Name:       quota-besteffort
Namespace:  dma
Scopes:     BestEffort
 * Matches all pods that do not have resource requirements set. These pods have a best effort quality of service.
Resource  Used  Hard
--------  ----  ----
pods      0     2
```

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1576