Bug 1997657
Summary: | Kubelet rejects pods that use resources that should be freed by completed pods | | |
---|---|---|---|
Product: | OpenShift Container Platform | Reporter: | Ryan Phillips <rphillips> |
Component: | Node | Assignee: | Harshal Patil <harpatil> |
Node sub component: | Kubelet | QA Contact: | Sunil Choudhary <schoudha> |
Status: | CLOSED ERRATA | Docs Contact: | |
Severity: | urgent | | |
Priority: | unspecified | CC: | aos-bugs, fromani, jmencak, nagrawal, yliu1 |
Version: | 4.9 | | |
Target Milestone: | --- | | |
Target Release: | 4.9.0 | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2021-10-18 17:49:05 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Description
Ryan Phillips
2021-08-25 15:40:52 UTC
*** Bug 1998193 has been marked as a duplicate of this bug. ***

Verified on 4.9.0-0.nightly-2021-09-10-170926

    $ oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2021-09-10-170926   True        False         7h21m   Cluster version is 4.9.0-0.nightly-2021-09-10-170926

    $ cat pod.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test
    spec:
      parallelism: 30
      completions: 100
      template:
        spec:
          containers:
          - name: busybox
            image: busybox
            command: ["echo", "ok"]
          restartPolicy: Never

    $ oc create -f pod.yaml
    job.batch/test created

    $ oc get jobs
    NAME   COMPLETIONS   DURATION   AGE
    test   30/100        17s        18s

    $ oc get jobs
    NAME   COMPLETIONS   DURATION   AGE
    test   100/100       34s        2m2s

    $ oc get pods
    NAME            READY   STATUS      RESTARTS   AGE
    test--1-27gmm   0/1     Completed   0          113s
    test--1-2ppj8   0/1     Completed   0          99s
    test--1-2w25n   0/1     Completed   0          113s
    test--1-2zv5v   0/1     Completed   0          114s
    test--1-4d444   0/1     Completed   0          113s
    test--1-5zrf2   0/1     Completed   0          106s
    test--1-62w5k   0/1     Completed   0          98s
    test--1-67xzk   0/1     Completed   0          2m6s
    test--1-6vfzg   0/1     Completed   0          2m6s
    test--1-6vjq4   0/1     Completed   0          107s
    test--1-6xxgk   0/1     Completed   0          2m5s
    test--1-7b99h   0/1     Completed   0          112s
    test--1-7d89d   0/1     Completed   0          2m6s
    test--1-7mfch   0/1     Completed   0          113s
    test--1-7vzns   0/1     Completed   0          113s
    test--1-8ml7r   0/1     Completed   0          2m5s
    test--1-8xwkw   0/1     Completed   0          104s
    test--1-9xxkh   0/1     Completed   0          113s
    test--1-b8qpm   0/1     Completed   0          100s
    test--1-bg2qg   0/1     Completed   0          2m5s
    test--1-cn79j   0/1     Completed   0          112s
    test--1-ctkwx   0/1     Completed   0          2m5s
    test--1-drwkh   0/1     Completed   0          113s
    test--1-f8lm7   0/1     Completed   0          114s
    test--1-fg2cq   0/1     Completed   0          113s
    test--1-fhrlr   0/1     Completed   0          2m6s
    test--1-fswv5   0/1     Completed   0          112s
    test--1-fw4bx   0/1     Completed   0          100s
    test--1-gdsjv   0/1     Completed   0          2m5s
    test--1-gjc2x   0/1     Completed   0          2m6s
    test--1-h2kwr   0/1     Completed   0          104s
    test--1-h67gh   0/1     Completed   0          113s
    test--1-hcb2h   0/1     Completed   0          113s
    test--1-hfm97   0/1     Completed   0          106s
    test--1-hjmxt   0/1     Completed   0          98s
    test--1-hrv5f   0/1     Completed   0          102s
    test--1-j478c   0/1     Completed   0          106s
    test--1-jpd6v   0/1     Completed   0          2m5s
    test--1-jrnvf   0/1     Completed   0          113s
    test--1-k6pd6   0/1     Completed   0          113s
    test--1-k8fmf   0/1     Completed   0          2m6s
    test--1-k8tvn   0/1     Completed   0          113s
    test--1-khz9r   0/1     Completed   0          2m6s
    test--1-ksn9j   0/1     Completed   0          105s
    test--1-l9k77   0/1     Completed   0          107s
    test--1-lfvzv   0/1     Completed   0          105s
    test--1-lkn6n   0/1     Completed   0          103s
    test--1-mb2ng   0/1     Completed   0          2m5s
    test--1-mksrl   0/1     Completed   0          2m6s
    test--1-mn9gn   0/1     Completed   0          104s
    test--1-mpq5x   0/1     Completed   0          2m5s
    test--1-mqct8   0/1     Completed   0          112s
    test--1-n2fz5   0/1     Completed   0          2m6s
    test--1-n4h4r   0/1     Completed   0          113s
    test--1-ndsck   0/1     Completed   0          103s
    test--1-nnj5r   0/1     Completed   0          104s
    test--1-nx7vw   0/1     Completed   0          113s
    test--1-p2v2h   0/1     Completed   0          2m6s
    test--1-p7kr7   0/1     Completed   0          113s
    test--1-pclmw   0/1     Completed   0          103s
    test--1-phv4x   0/1     Completed   0          2m6s
    test--1-qc269   0/1     Completed   0          113s
    test--1-qzswc   0/1     Completed   0          2m6s
    test--1-r4sh4   0/1     Completed   0          107s
    test--1-r6hpk   0/1     Completed   0          2m5s
    test--1-rxht2   0/1     Completed   0          102s
    test--1-s486h   0/1     Completed   0          105s
    test--1-sh9gh   0/1     Completed   0          100s
    test--1-sk2hm   0/1     Completed   0          2m5s
    test--1-skk84   0/1     Completed   0          2m5s
    test--1-sknzc   0/1     Completed   0          2m5s
    test--1-slfzt   0/1     Completed   0          113s
    test--1-slpm8   0/1     Completed   0          98s
    test--1-svnpd   0/1     Completed   0          102s
    test--1-t7k62   0/1     Completed   0          106s
    test--1-t7psm   0/1     Completed   0          2m5s
    test--1-t7wkz   0/1     Completed   0          99s
    test--1-tdzf8   0/1     Completed   0          2m6s
    test--1-tk69g   0/1     Completed   0          113s
    test--1-tmzjp   0/1     Completed   0          100s
    test--1-tnlpb   0/1     Completed   0          104s
    test--1-tsxh8   0/1     Completed   0          103s
    test--1-twj4q   0/1     Completed   0          103s
    test--1-twvj2   0/1     Completed   0          103s
    test--1-v7xwx   0/1     Completed   0          2m6s
    test--1-vbw9r   0/1     Completed   0          105s
    test--1-vcp5l   0/1     Completed   0          103s
    test--1-vfsmk   0/1     Completed   0          113s
    test--1-vk6px   0/1     Completed   0          104s
    test--1-vnr58   0/1     Completed   0          2m6s
    test--1-vpjrb   0/1     Completed   0          103s
    test--1-wcggt   0/1     Completed   0          104s
    test--1-wlxdj   0/1     Completed   0          2m6s
    test--1-wmktk   0/1     Completed   0          113s
    test--1-wprk7   0/1     Completed   0          112s
    test--1-xv8vx   0/1     Completed   0          99s
    test--1-zcfdx   0/1     Completed   0          2m6s
    test--1-zckbf   0/1     Completed   0          104s
    test--1-zl7fv   0/1     Completed   0          113s
    test--1-zp62p   0/1     Completed   0          113s

I'm re-opening this BZ as it currently fails again on 4.9.0-0.nightly-2021-10-01-123414

    $ oc version
    Client Version: 4.9.0-rc.3
    Server Version: 4.9.0-0.nightly-2021-10-01-123414
    Kubernetes Version: v1.22.0-rc.0+8719299

Testing on an 8 vCPU box with a slightly modified (limits) pods.yaml:

    $ cat pods.yaml
    apiVersion: batch/v1
    kind: Job
    metadata:
      name: test
    spec:
      parallelism: 30
      completions: 100
      template:
        spec:
          containers:
          - name: busybox
            image: busybox
            command: ["echo", "ok"]
            resources:
              limits:
                cpu: "1"
                memory: "40Mi"
          restartPolicy: Never

    $ oc create -f pods.yaml

    $ oc get jobs
    NAME   COMPLETIONS   DURATION   AGE
    test   6/100         7m58s      7m58s

    $ oc get po
    NAME            READY   STATUS      RESTARTS   AGE
    test--1-66qlg   0/1     OutOfcpu    0          7m27s
    test--1-6htll   0/1     Completed   0          7m32s
    test--1-8g8mh   0/1     Completed   0          7m32s
    test--1-9bbvz   0/1     OutOfcpu    0          7m27s
    test--1-c9ldg   0/1     Completed   0          7m32s
    test--1-cspws   0/1     Completed   0          7m32s
    test--1-d5thk   0/1     Completed   0          7m32s
    test--1-htdjz   0/1     OutOfcpu    0          7m27s
    test--1-nskmx   0/1     OutOfcpu    0          7m27s
    test--1-pq7cs   0/1     OutOfcpu    0          7m27s
    test--1-q55wv   0/1     OutOfcpu    0          7m27s
    test--1-wnm4n   0/1     OutOfcpu    0          7m27s
    test--1-xn2cg   0/1     Completed   0          7m32s

Note that this issue was fixed for me by https://github.com/openshift/kubernetes/pull/920 but broken again by https://github.com/openshift/kubernetes/pull/949

In other words, the last commit that still fixes the issue for me in openshift/kubernetes is 75ee3073266f07baaba5db004cde0636425737cf

    sh-4.4# ./kubelet --version
    Kubernetes v1.22.1-1672+75ee3073266f07-dirty

A custom-compiled kubelet^ passes the above tests without issues and all 100 pods are successfully "Completed".

*** Bug 1998193 has been marked as a duplicate of this bug. ***

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759
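
The failure mode named in the bug summary can be illustrated with a toy model. This is only a sketch of one reading of the summary, not the kubelet's actual admission code: `Pod`, `Node`, `admit`, and `run` are hypothetical names, the 8.0 allocatable CPUs ignore system reservations and other workloads on a real 8 vCPU box, and each pod is assumed to finish before the next one arrives. It relies on one documented Kubernetes behavior: a container with a CPU limit and no explicit request gets a request equal to the limit, so each pod from the modified pods.yaml requests 1 CPU.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    cpu_request: float          # cores; equals the limit when no request is set
    terminated: bool = False    # True once the container has exited


@dataclass
class Node:
    allocatable_cpu: float
    pods: list = field(default_factory=list)

    def admit(self, pod: Pod, count_terminated: bool) -> str:
        """Return "Admitted" or "OutOfcpu" for a pod already bound to this node.

        count_terminated=True models the reported behavior (hypothetical
        simplification): finished pods still count against the node.
        """
        in_use = sum(
            p.cpu_request
            for p in self.pods
            if count_terminated or not p.terminated
        )
        if in_use + pod.cpu_request > self.allocatable_cpu:
            return "OutOfcpu"
        self.pods.append(pod)
        return "Admitted"


def run(count_terminated: bool, total: int = 12, cpus: float = 8.0) -> list:
    """Send `total` one-CPU pods to a node, each finishing before the next."""
    node = Node(allocatable_cpu=cpus)
    results = []
    for i in range(total):
        outcome = node.admit(Pod(f"test-{i}", 1.0), count_terminated)
        results.append(outcome)
        if outcome == "Admitted":
            # "echo ok" exits almost immediately, so mark the pod finished.
            node.pods[-1].terminated = True
    return results


if __name__ == "__main__":
    print(run(count_terminated=True))
    print(run(count_terminated=False))
```

In this model, `run(count_terminated=True)` admits the first 8 pods and rejects the remaining 4 with `OutOfcpu`, while `run(count_terminated=False)` admits all 12. That matches the shape of the transcript above (a handful of `Completed` pods followed by `OutOfcpu` rejections), though the exact counts on a real node depend on its allocatable CPU and on timing.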