Description of problem:

https://github.com/kubernetes/kubernetes/issues/104560

There appears to be a change of behavior, possibly a regression, in Kubernetes 1.22 with respect to the pod lifecycle. We've started to see cases where, when multiple pods are created (often as part of a Kubernetes Job), the following happens:

1. Multiple pods (say, 5) are created by a controller, e.g. the Job controller.
2. Each pod in the job requests the majority of the node's memory, so only one such pod can fit on a node at a time.
3. The scheduler schedules the first pod to the node; the remaining 4 pods stay Pending.
4. The kubelet accepts the first pod and runs it to completion.
5. The next pod gets scheduled to the node.
6. The kubelet rejects that pod during pod admission (e.g. with OutOfmemory), even though the first pod has already completed successfully.

This results in many of the pods backing the job ending up in the Failed phase and having to be recreated by the Job controller. The job can often fail with BackoffLimitExceeded after hitting its backoffLimit.

The root issue seems to be a mismatch between when the kubelet actually considers the pod complete and what it reports to the API server (which is what causes the follow-up pods to be scheduled to the node). I'm still investigating, but this may be related to the pod lifecycle changes in 1.22 (#102344).
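To make the suspected mismatch concrete, here is a minimal Python sketch of the two resource views involved. It is a simplified model, not the kubelet's or scheduler's actual code: the `Pod` and `Node` classes, the `api_phase` and `kubelet_terminated` fields, and the memory figures are all hypothetical. It assumes the scheduler stops counting a pod once its phase is terminal on the API server, while the kubelet keeps counting it until its own teardown finishes, which reproduces steps 4 to 6 above.

```python
from dataclasses import dataclass, field


@dataclass
class Pod:
    name: str
    memory_request_mib: int
    # Phase as reported to the API server (what the scheduler sees).
    api_phase: str = "Pending"
    # Hypothetical flag: has the kubelet finished its internal teardown of the pod?
    kubelet_terminated: bool = False


@dataclass
class Node:
    allocatable_memory_mib: int
    pods: list = field(default_factory=list)

    def scheduler_free_mib(self) -> int:
        # Scheduler view: pods in a terminal phase no longer consume resources.
        used = sum(p.memory_request_mib for p in self.pods
                   if p.api_phase not in ("Succeeded", "Failed"))
        return self.allocatable_memory_mib - used

    def kubelet_admit(self, pod: Pod) -> str:
        # Kubelet view (hypothetical model): every pod not yet torn down still counts.
        used = sum(p.memory_request_mib for p in self.pods if not p.kubelet_terminated)
        if used + pod.memory_request_mib > self.allocatable_memory_mib:
            pod.api_phase = "Failed"  # a pod rejected at admission ends up Failed
            return "OutOfmemory"
        self.pods.append(pod)
        pod.api_phase = "Running"
        return "Admitted"


node = Node(allocatable_memory_mib=1000)
first = Pod("job-pod-1", memory_request_mib=800)
second = Pod("job-pod-2", memory_request_mib=800)

print(node.kubelet_admit(first))         # Admitted
first.api_phase = "Succeeded"            # completion is reported to the API server
print(node.scheduler_free_mib() >= 800)  # True, so the scheduler binds the next pod here
print(node.kubelet_admit(second))        # OutOfmemory: the kubelet still counts job-pod-1
first.kubelet_terminated = True          # internal teardown only finishes later
print(node.kubelet_admit(Pod("job-pod-3", 800)))  # Admitted
```

In this model the window between `api_phase = "Succeeded"` and `kubelet_terminated = True` is exactly where a newly bound pod gets rejected; closing that window (by making both views agree) removes the spurious failures.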
*** Bug 1998193 has been marked as a duplicate of this bug. ***
Verified on 4.9.0-0.nightly-2021-09-10-170926

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-10-170926   True        False         7h21m   Cluster version is 4.9.0-0.nightly-2021-09-10-170926

$ cat pod.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  parallelism: 30
  completions: 100
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["echo", "ok"]
      restartPolicy: Never

$ oc create -f pod.yaml
job.batch/test created

$ oc get jobs
NAME   COMPLETIONS   DURATION   AGE
test   30/100        17s        18s

$ oc get jobs
NAME   COMPLETIONS   DURATION   AGE
test   100/100       34s        2m2s

$ oc get pods
NAME            READY   STATUS      RESTARTS   AGE
test--1-27gmm   0/1     Completed   0          113s
test--1-2ppj8   0/1     Completed   0          99s
test--1-2w25n   0/1     Completed   0          113s
test--1-2zv5v   0/1     Completed   0          114s
test--1-4d444   0/1     Completed   0          113s
test--1-5zrf2   0/1     Completed   0          106s
test--1-62w5k   0/1     Completed   0          98s
test--1-67xzk   0/1     Completed   0          2m6s
test--1-6vfzg   0/1     Completed   0          2m6s
test--1-6vjq4   0/1     Completed   0          107s
test--1-6xxgk   0/1     Completed   0          2m5s
test--1-7b99h   0/1     Completed   0          112s
test--1-7d89d   0/1     Completed   0          2m6s
test--1-7mfch   0/1     Completed   0          113s
test--1-7vzns   0/1     Completed   0          113s
test--1-8ml7r   0/1     Completed   0          2m5s
test--1-8xwkw   0/1     Completed   0          104s
test--1-9xxkh   0/1     Completed   0          113s
test--1-b8qpm   0/1     Completed   0          100s
test--1-bg2qg   0/1     Completed   0          2m5s
test--1-cn79j   0/1     Completed   0          112s
test--1-ctkwx   0/1     Completed   0          2m5s
test--1-drwkh   0/1     Completed   0          113s
test--1-f8lm7   0/1     Completed   0          114s
test--1-fg2cq   0/1     Completed   0          113s
test--1-fhrlr   0/1     Completed   0          2m6s
test--1-fswv5   0/1     Completed   0          112s
test--1-fw4bx   0/1     Completed   0          100s
test--1-gdsjv   0/1     Completed   0          2m5s
test--1-gjc2x   0/1     Completed   0          2m6s
test--1-h2kwr   0/1     Completed   0          104s
test--1-h67gh   0/1     Completed   0          113s
test--1-hcb2h   0/1     Completed   0          113s
test--1-hfm97   0/1     Completed   0          106s
test--1-hjmxt   0/1     Completed   0          98s
test--1-hrv5f   0/1     Completed   0          102s
test--1-j478c   0/1     Completed   0          106s
test--1-jpd6v   0/1     Completed   0          2m5s
test--1-jrnvf   0/1     Completed   0          113s
test--1-k6pd6   0/1     Completed   0          113s
test--1-k8fmf   0/1     Completed   0          2m6s
test--1-k8tvn   0/1     Completed   0          113s
test--1-khz9r   0/1     Completed   0          2m6s
test--1-ksn9j   0/1     Completed   0          105s
test--1-l9k77   0/1     Completed   0          107s
test--1-lfvzv   0/1     Completed   0          105s
test--1-lkn6n   0/1     Completed   0          103s
test--1-mb2ng   0/1     Completed   0          2m5s
test--1-mksrl   0/1     Completed   0          2m6s
test--1-mn9gn   0/1     Completed   0          104s
test--1-mpq5x   0/1     Completed   0          2m5s
test--1-mqct8   0/1     Completed   0          112s
test--1-n2fz5   0/1     Completed   0          2m6s
test--1-n4h4r   0/1     Completed   0          113s
test--1-ndsck   0/1     Completed   0          103s
test--1-nnj5r   0/1     Completed   0          104s
test--1-nx7vw   0/1     Completed   0          113s
test--1-p2v2h   0/1     Completed   0          2m6s
test--1-p7kr7   0/1     Completed   0          113s
test--1-pclmw   0/1     Completed   0          103s
test--1-phv4x   0/1     Completed   0          2m6s
test--1-qc269   0/1     Completed   0          113s
test--1-qzswc   0/1     Completed   0          2m6s
test--1-r4sh4   0/1     Completed   0          107s
test--1-r6hpk   0/1     Completed   0          2m5s
test--1-rxht2   0/1     Completed   0          102s
test--1-s486h   0/1     Completed   0          105s
test--1-sh9gh   0/1     Completed   0          100s
test--1-sk2hm   0/1     Completed   0          2m5s
test--1-skk84   0/1     Completed   0          2m5s
test--1-sknzc   0/1     Completed   0          2m5s
test--1-slfzt   0/1     Completed   0          113s
test--1-slpm8   0/1     Completed   0          98s
test--1-svnpd   0/1     Completed   0          102s
test--1-t7k62   0/1     Completed   0          106s
test--1-t7psm   0/1     Completed   0          2m5s
test--1-t7wkz   0/1     Completed   0          99s
test--1-tdzf8   0/1     Completed   0          2m6s
test--1-tk69g   0/1     Completed   0          113s
test--1-tmzjp   0/1     Completed   0          100s
test--1-tnlpb   0/1     Completed   0          104s
test--1-tsxh8   0/1     Completed   0          103s
test--1-twj4q   0/1     Completed   0          103s
test--1-twvj2   0/1     Completed   0          103s
test--1-v7xwx   0/1     Completed   0          2m6s
test--1-vbw9r   0/1     Completed   0          105s
test--1-vcp5l   0/1     Completed   0          103s
test--1-vfsmk   0/1     Completed   0          113s
test--1-vk6px   0/1     Completed   0          104s
test--1-vnr58   0/1     Completed   0          2m6s
test--1-vpjrb   0/1     Completed   0          103s
test--1-wcggt   0/1     Completed   0          104s
test--1-wlxdj   0/1     Completed   0          2m6s
test--1-wmktk   0/1     Completed   0          113s
test--1-wprk7   0/1     Completed   0          112s
test--1-xv8vx   0/1     Completed   0          99s
test--1-zcfdx   0/1     Completed   0          2m6s
test--1-zckbf   0/1     Completed   0          104s
test--1-zl7fv   0/1     Completed   0          113s
test--1-zp62p   0/1     Completed   0          113s
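With 100 rows, a single non-Completed pod is easy to miss by eye. The Python sketch below tallies the STATUS column of `oc get pods` table output; `tally_pod_statuses` is a hypothetical helper, not part of `oc`, and it assumes the default table layout shown in the transcript (a header row starting with NAME, and STATUS as the third whitespace-separated field on each row).

```python
from collections import Counter


def tally_pod_statuses(table: str) -> Counter:
    """Count values in the STATUS column of `oc get pods` table output.

    Assumes the default layout: a header row starting with NAME, then one
    row per pod with STATUS as the third whitespace-separated field.
    """
    counts = Counter()
    for line in table.strip().splitlines():
        fields = line.split()
        if not fields or fields[0] == "NAME":
            continue  # skip blank lines and the header row
        counts[fields[2]] += 1
    return counts


sample = """\
NAME            READY   STATUS      RESTARTS   AGE
test--1-27gmm   0/1     Completed   0          113s
test--1-2ppj8   0/1     Completed   0          99s
"""
print(tally_pod_statuses(sample))  # Counter({'Completed': 2})
```

For the verified run, all 100 rows should land under Completed; any other key, such as OutOfcpu, points at a pod rejected during kubelet admission.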
I'm re-opening this BZ, as it currently fails again on 4.9.0-0.nightly-2021-10-01-123414:

$ oc version
Client Version: 4.9.0-rc.3
Server Version: 4.9.0-0.nightly-2021-10-01-123414
Kubernetes Version: v1.22.0-rc.0+8719299

Testing on an 8 vCPU box with a slightly modified pods.yaml (resource limits added):

$ cat pods.yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: test
spec:
  parallelism: 30
  completions: 100
  template:
    spec:
      containers:
      - name: busybox
        image: busybox
        command: ["echo", "ok"]
        resources:
          limits:
            cpu: "1"
            memory: "40Mi"
      restartPolicy: Never

$ oc create -f pods.yaml

$ oc get jobs
NAME   COMPLETIONS   DURATION   AGE
test   6/100         7m58s      7m58s

$ oc get po
NAME            READY   STATUS      RESTARTS   AGE
test--1-66qlg   0/1     OutOfcpu    0          7m27s
test--1-6htll   0/1     Completed   0          7m32s
test--1-8g8mh   0/1     Completed   0          7m32s
test--1-9bbvz   0/1     OutOfcpu    0          7m27s
test--1-c9ldg   0/1     Completed   0          7m32s
test--1-cspws   0/1     Completed   0          7m32s
test--1-d5thk   0/1     Completed   0          7m32s
test--1-htdjz   0/1     OutOfcpu    0          7m27s
test--1-nskmx   0/1     OutOfcpu    0          7m27s
test--1-pq7cs   0/1     OutOfcpu    0          7m27s
test--1-q55wv   0/1     OutOfcpu    0          7m27s
test--1-wnm4n   0/1     OutOfcpu    0          7m27s
test--1-xn2cg   0/1     Completed   0          7m32s

Note that this issue was fixed for me by https://github.com/openshift/kubernetes/pull/920 but broken again by https://github.com/openshift/kubernetes/pull/949. In other words, the last commit in openshift/kubernetes that still fixes the issue for me is 75ee3073266f07baaba5db004cde0636425737cf:

sh-4.4# ./kubelet --version
Kubernetes v1.22.1-1672+75ee3073266f07-dirty

The custom-compiled kubelet above passes the test above without issues, and all 100 pods are successfully "Completed".
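The CPU limit is what makes the problem show up quickly on this box. Because the container sets only `limits`, Kubernetes copies each limit into the corresponding request (absent a LimitRange or other admission-time default), so every pod requests a full CPU. The rough Python arithmetic below is a sketch, not scheduler code: `effective_requests` and `cpu_to_millicores` are hypothetical helpers, and the allocatable values are assumptions, since the node's real allocatable CPU is 8 vCPU minus system and kube reservations that this report does not state.

```python
def effective_requests(container: dict) -> dict:
    """Return the container's effective resource requests.

    Follows the documented defaulting rule: if a limit is set for a resource
    and no request is, the request defaults to the limit. LimitRange defaults
    are ignored in this sketch.
    """
    resources = container.get("resources", {})
    requests = dict(resources.get("requests", {}))
    for name, value in resources.get("limits", {}).items():
        requests.setdefault(name, value)
    return requests


def cpu_to_millicores(quantity: str) -> int:
    # Handles only the two forms needed here: "1" (cores) and "500m" (millicores).
    return int(quantity[:-1]) if quantity.endswith("m") else int(float(quantity) * 1000)


container = {"name": "busybox", "resources": {"limits": {"cpu": "1", "memory": "40Mi"}}}
per_pod_m = cpu_to_millicores(effective_requests(container)["cpu"])  # 1000

# ASSUMPTION: allocatable CPU in millicores for a single 8 vCPU node; other pods
# already running on the node would lower the number that fit even further.
for allocatable_m in (8000, 7500):
    fit = allocatable_m // per_pod_m
    print(f"allocatable={allocatable_m}m -> at most {fit} job pods at once, "
          f"{30 - fit} of the 30 parallel pods must wait")
```

Under these assumptions, most of the 30 parallel pods should simply stay Pending until a running pod finishes and its CPU is freed. The OutOfcpu rows in the output were instead bound to the node and then rejected by the kubelet, which is what the accounting mismatch described in this bug would produce.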
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759