Bug 2011513 - Kubelet rejects pods that use resources that should be freed by completed pods
Summary: Kubelet rejects pods that use resources that should be freed by completed pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ryan Phillips
QA Contact: Weinan Liu
URL:
Whiteboard:
Duplicates: 2009092 (view as bug list)
Depends On:
Blocks: 2011815 2011956
 
Reported: 2021-10-06 17:41 UTC by Jiří Mencák
Modified: 2022-03-10 16:17 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2011815 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:17:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Reproducer (1017 bytes, application/x-shellscript)
2021-10-06 17:41 UTC, Jiří Mencák


Links
GitHub kubernetes/kubernetes issue 105523 (open): Kubelet is reporting OutOfCpu on previously running workloads after restart (last updated 2021-10-06 18:52:09 UTC)
GitHub openshift/kubernetes pull 1007 (open): Bug 2011513: kubelet: do not arbitrarily create a podSyncStatus for finished pods (last updated 2021-10-07 13:09:56 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:17:44 UTC)

Description Jiří Mencák 2021-10-06 17:41:04 UTC
Created attachment 1829973 [details]
Reproducer

Description of problem:
Kubelet rejects pods that use resources that should be freed by completed pods.

Version-Release number of selected component (if applicable):
4.9.0-0.ci-2021-10-06-085105

How reproducible:
Always

Steps to Reproduce:
1. Create an SNO cluster or a cluster with a single worker node. Use an 8-vCPU worker node (or adjust the attached complete.sh reproducer script).
2. Run the ./complete.sh reproducer script (a hedged sketch of what it likely does follows these steps).
3. Reboot the worker node.
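
A minimal sketch of what the attached complete.sh reproducer likely does (the image, CPU requests, pod counts and polling loop here are assumptions for illustration, not the contents of the attachment): create pods that hold a CPU request and run to completion, wait for each to reach Succeeded, then create long-running pods whose requests only fit on the node if the completed pods' CPU is freed.

#!/bin/bash
# create pods that request CPU and run to completion
for i in $(seq 1 8); do
  cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: complete$i
spec:
  restartPolicy: Never
  containers:
  - name: work
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "5"]
    resources:
      requests:
        cpu: "2"
EOF
  # poll the pod phase until it reaches Succeeded
  until [ "$(oc get po complete$i -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 2; done
done
# create long-running pods; their CPU requests only fit because the completed pods' requests should be released
for i in $(seq 1 6); do
  cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: running$i
spec:
  containers:
  - name: work
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "1"
EOF
done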

Actual results:
The pods that were running prior to the reboot end up in the OutOfcpu state.

$ oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          100m
complete2   0/1     Completed   0          100m
complete3   0/1     Completed   0          100m
complete4   0/1     Completed   0          100m
complete5   0/1     Completed   0          100m
complete6   0/1     Completed   0          100m
complete7   0/1     OutOfcpu    0          100m
complete8   0/1     OutOfcpu    0          100m
running1    0/1     OutOfcpu    0          100m
running2    0/1     OutOfcpu    0          100m
running3    0/1     OutOfcpu    0          100m
running4    0/1     OutOfcpu    0          100m
running5    0/1     OutOfcpu    0          100m


Expected results:
The pods that were running prior to the reboot are Running again.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1997657#c34

Comment 3 Elana Hashman 2021-10-06 18:51:49 UTC
I have been able to reproduce this on an upstream single-node local-up-cluster.sh cluster; filed https://github.com/kubernetes/kubernetes/issues/105523
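
For reference, a minimal sketch of bringing up that upstream single-node environment (assumes a kubernetes source checkout with the usual local-up-cluster.sh prerequisites; the kubeconfig path is the one the script normally prints):

# from the root of a kubernetes checkout, start a single-node local cluster
$ hack/local-up-cluster.sh
# in a second shell, point kubectl at it and run the reproducer there
$ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
$ kubectl get nodes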

Comment 8 Elana Hashman 2021-10-07 23:52:36 UTC
Upstream fix PR: https://github.com/kubernetes/kubernetes/pull/105527

E2E test that verifies behaviour is broken on HEAD: https://github.com/kubernetes/kubernetes/pull/105552

Cherry-pick to verify behaviour is working on 1.21: https://github.com/kubernetes/kubernetes/pull/105553

Test name is "[sig-node] Restart [Serial] [Slow] [Disruptive] Kubelet should correctly account for terminated pods after restart" and is part of the node serial suite.
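
A minimal sketch of running just that test from a kubernetes checkout (the make target and FOCUS variable follow the usual e2e-node workflow and are an assumption here, not taken from the PRs):

# run only the restart accounting test from the node serial suite
$ make test-e2e-node FOCUS="Kubelet should correctly account for terminated pods after restart"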


We are waiting to get final CI results back before LGTM/approval. Only the first PR should merge; it will then be backported to 1.22.

I created a "/test pull-kubernetes-node-kubelet-serial-122" job for testing this against the 1.22 branch.

Comment 11 Weinan Liu 2021-10-08 08:11:57 UTC
The issue is not fixed in 4.10.0-0.nightly-2021-10-08-050801, which is still in the Ready state (not Accepted) at 2021-10-08T05:08:01Z.
It may not have the PR included yet. Waiting for the next build to check.
oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          3m56s
complete2   0/1     Completed   0          3m51s
complete3   0/1     Completed   0          3m46s
complete4   0/1     Completed   0          3m40s
complete5   0/1     Completed   0          3m37s
complete6   0/1     Completed   0          3m32s
complete7   0/1     Completed   0          3m26s
complete8   0/1     Completed   0          3m21s
running1    1/1     Running     1          3m16s
running2    0/1     OutOfcpu    0          3m15s
running3    0/1     OutOfcpu    0          3m14s
running4    0/1     OutOfcpu    0          3m13s
running5    0/1     OutOfcpu    0          3m12s
running6    0/1     OutOfcpu    0          3m11s

Comment 12 Jiří Mencák 2021-10-08 08:36:47 UTC
Hey Weinan, the fix is not in 4.10.0-0.nightly-2021-10-08-050801 yet, thanks for checking!  One way to check is to look at the git log of openshift/kubernetes and compare it with the kubelet version.  You need a commit equal to or newer than 931224322c58da67eb8b3e9d4d3ff0e7dbf81cf2.

You can get the kubelet version by checking the output of
$ oc get no
NAME                                                    STATUS   ROLES    AGE     VERSION
jmencak-fxfd2-master-0.c.openshift-gce-devel.internal   Ready    master   5m27s   v1.22.1+4d7e196
jmencak-fxfd2-master-1.c.openshift-gce-devel.internal   Ready    master   5m39s   v1.22.1+4d7e196
jmencak-fxfd2-master-2.c.openshift-gce-devel.internal   Ready    master   5m40s   v1.22.1+4d7e196

4d7e196 indicates a kubelet build that doesn't have the fix.
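
A minimal sketch of that check (assumes a local clone of openshift/kubernetes; 4d7e196 is the short hash reported in the VERSION column above):

$ git clone https://github.com/openshift/kubernetes && cd kubernetes
# prints "fix included" only if the fix commit is an ancestor of the running kubelet's commit
$ git merge-base --is-ancestor 931224322c58da67eb8b3e9d4d3ff0e7dbf81cf2 4d7e196 && echo "fix included" || echo "fix missing"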

Comment 13 Weinan Liu 2021-10-08 10:24:39 UTC
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-128-201.us-east-2.compute.internal   Ready      worker   48m   v1.22.1+4d7e196
ip-10-0-142-115.us-east-2.compute.internal   Ready      master   57m   v1.22.1+4d7e196
ip-10-0-165-183.us-east-2.compute.internal   Ready      master   58m   v1.22.1+4d7e196
ip-10-0-206-28.us-east-2.compute.internal    Ready      master   57m   v1.22.1+4d7e196
 oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-08-090421   True        False         35m     Cluster version is 4.10.0-0.nightly-2021-10-08-090421

@Jiří, thanks!
4.10.0-0.nightly-2021-10-08-090421 does not have the fix yet.

Comment 15 Ryan Phillips 2021-10-12 19:27:45 UTC
*** Bug 2005647 has been marked as a duplicate of this bug. ***

Comment 16 Ryan Phillips 2021-10-13 14:32:03 UTC
*** Bug 2009092 has been marked as a duplicate of this bug. ***

Comment 17 Weinan Liu 2021-10-19 02:30:48 UTC
Verified as fixed:
$ ./complete.sh
pod/complete1 created
"Pending"
"Pending"
"Succeeded"
pod/complete2 created
"Pending"
"Pending"
"Succeeded"
pod/complete3 created
"Pending"
"Pending"
"Succeeded"
pod/complete4 created
"Pending"
"Pending"
"Pending"
"Succeeded"
pod/complete5 created
"Pending"
"Pending"
"Succeeded"
pod/complete6 created
"Pending"
"Pending"
"Succeeded"
pod/complete7 created
"Pending"
"Pending"
"Succeeded"
pod/complete8 created
"Pending"
"Pending"
"Succeeded"
pod/running1 created
pod/running2 created
pod/running3 created
pod/running4 created
pod/running5 created
pod/running6 created
$ oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          6m29s
complete2   0/1     Completed   0          6m23s
complete3   0/1     Completed   0          6m17s
complete4   0/1     Completed   0          6m11s
complete5   0/1     Completed   0          6m4s
complete6   0/1     Completed   0          5m58s
complete7   0/1     Completed   0          5m52s
complete8   0/1     Completed   0          5m46s
running1    1/1     Running     0          5m40s
running2    1/1     Running     0          5m39s
running3    0/1     Pending     0          5m38s
running4    0/1     Pending     0          5m37s
running5    0/1     Pending     0          5m36s
running6    0/1     Pending     0          5m35s
$ oc get no
NAME                                 STATUS   ROLES           AGE   VERSION
ci-ln-xxnd56k-f76d1-dx979-master-0   Ready    master,worker   26m   v1.22.1+9312243
[weinliu@rhel8 verification-tests]$ oc debug node/ci-ln-xxnd56k-f76d1-dx979-master-0
Starting pod/ci-ln-xxnd56k-f76d1-dx979-master-0-debug ...
To use host binaries, run `chroot /host`

chroot /host
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.

sh-4.4# chroot /host
sh-4.4# reboot

Removing debug pod ...
error: unable to delete the debug pod "ci-ln-xxnd56k-f76d1-dx979-master-0-debug": Delete https://api.ci-ln-xxnd56k-f76d1.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/default/pods/ci-ln-xxnd56k-f76d1-dx979-master-0-debug: unexpected EOF

[weinliu@rhel8 verification-tests]$ oc get no
NAME                                 STATUS   ROLES           AGE   VERSION
ci-ln-xxnd56k-f76d1-dx979-master-0   Ready    master,worker   29m   v1.22.1+9312243
$ oc get po
NAME                                       READY   STATUS              RESTARTS   AGE
ci-ln-xxnd56k-f76d1-dx979-master-0-debug   0/1     Completed           1          4m17s
complete1                                  0/1     Completed           0          11m
complete2                                  0/1     Completed           0          11m
complete3                                  0/1     Completed           0          10m
complete4                                  0/1     Completed           0          10m
complete5                                  0/1     Completed           0          10m
complete6                                  0/1     Completed           0          10m
complete7                                  0/1     Completed           0          10m
complete8                                  0/1     Completed           0          10m
running1                                   0/1     ContainerCreating   1          10m
running2                                   0/1     ContainerCreating   1          10m
running3                                   0/1     Pending             0          10m
running4                                   0/1     Pending             0          10m
running5                                   0/1     Pending             0          10m
running6                                   0/1     Pending             0          10m

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-16-173656   True        False         14m     Cluster version is 4.10.0-0.nightly-2021-10-16-173656

Comment 20 errata-xmlrpc 2022-03-10 16:17:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

