Bug 2011513 - Kubelet rejects pods that use resources that should be freed by completed pods
Summary: Kubelet rejects pods that use resources that should be freed by completed pods
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: 4.10.0
Assignee: Ryan Phillips
QA Contact: Weinan Liu
URL:
Whiteboard:
Duplicates: 2009092 (view as bug list)
Depends On:
Blocks: 2011815 2011956
 
Reported: 2021-10-06 17:41 UTC by Jiří Mencák
Modified: 2022-03-10 16:17 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Clones: 2011815 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:17:19 UTC
Target Upstream Version:
Embargoed:


Attachments
Reproducer (1017 bytes, application/x-shellscript)
2021-10-06 17:41 UTC, Jiří Mencák


Links
GitHub kubernetes/kubernetes issue 105523 (open): Kubelet is reporting OutOfCpu on previously running workloads after restart (last updated 2021-10-06 18:52:09 UTC)
GitHub openshift/kubernetes pull 1007 (open): Bug 2011513: kubelet: do not arbitrarily create a podSyncStatus for finished pods (last updated 2021-10-07 13:09:56 UTC)
Red Hat Product Errata RHSA-2022:0056 (last updated 2022-03-10 16:17:44 UTC)

Description Jiří Mencák 2021-10-06 17:41:04 UTC
Created attachment 1829973 [details]
Reproducer

Description of problem:
Kubelet rejects pods that use resources that should be freed by completed pods.

Version-Release number of selected component (if applicable):
4.9.0-0.ci-2021-10-06-085105

How reproducible:
Always

Steps to Reproduce:
1. Create an SNO cluster or a cluster with a single worker node. Use an 8-vCPU worker node (or adjust the attached complete.sh reproducer script).
2. Run the ./complete.sh reproducer script (a hedged sketch of what it likely does follows these steps).
3. Reboot the worker node.
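
A minimal sketch of what the attached complete.sh reproducer likely does (the image, CPU requests, pod counts and polling loop here are assumptions for illustration, not the contents of the attachment): create pods that hold a CPU request and run to completion, wait for each to reach Succeeded, then create long-running pods whose requests only fit on the node if the completed pods' CPU is freed.

#!/bin/bash
# create pods that request CPU and run to completion
for i in $(seq 1 8); do
  cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: complete$i
spec:
  restartPolicy: Never
  containers:
  - name: work
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "5"]
    resources:
      requests:
        cpu: "2"
EOF
  # poll the pod phase until it reaches Succeeded
  until [ "$(oc get po complete$i -o jsonpath='{.status.phase}')" = "Succeeded" ]; do sleep 2; done
done
# create long-running pods; their CPU requests only fit because the completed pods' requests should be released
for i in $(seq 1 6); do
  cat <<EOF | oc apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: running$i
spec:
  containers:
  - name: work
    image: registry.access.redhat.com/ubi8/ubi-minimal
    command: ["sleep", "infinity"]
    resources:
      requests:
        cpu: "1"
EOF
done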

Actual results:
The pods that were running prior to the reboot end up in the OutOfcpu state.

$ oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          100m
complete2   0/1     Completed   0          100m
complete3   0/1     Completed   0          100m
complete4   0/1     Completed   0          100m
complete5   0/1     Completed   0          100m
complete6   0/1     Completed   0          100m
complete7   0/1     OutOfcpu    0          100m
complete8   0/1     OutOfcpu    0          100m
running1    0/1     OutOfcpu    0          100m
running2    0/1     OutOfcpu    0          100m
running3    0/1     OutOfcpu    0          100m
running4    0/1     OutOfcpu    0          100m
running5    0/1     OutOfcpu    0          100m


Expected results:
The pods that were running prior to the reboot are Running again.

Additional info:
https://bugzilla.redhat.com/show_bug.cgi?id=1997657#c34

Comment 3 Elana Hashman 2021-10-06 18:51:49 UTC
I have been able to reproduce this on an upstream single-node local-up-cluster.sh cluster; filed https://github.com/kubernetes/kubernetes/issues/105523
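
For reference, a minimal sketch of bringing up that upstream single-node environment (assumes a kubernetes source checkout with the usual local-up-cluster.sh prerequisites; the kubeconfig path is the one the script normally prints):

# from the root of a kubernetes checkout, start a single-node local cluster
$ hack/local-up-cluster.sh
# in a second shell, point kubectl at it and run the reproducer there
$ export KUBECONFIG=/var/run/kubernetes/admin.kubeconfig
$ kubectl get nodes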

Comment 8 Elana Hashman 2021-10-07 23:52:36 UTC
Upstream fix PR: https://github.com/kubernetes/kubernetes/pull/105527

E2E test that verifies behaviour is broken on HEAD: https://github.com/kubernetes/kubernetes/pull/105552

Cherry-pick to verify behaviour is working on 1.21: https://github.com/kubernetes/kubernetes/pull/105553

Test name is "[sig-node] Restart [Serial] [Slow] [Disruptive] Kubelet should correctly account for terminated pods after restart" and is part of the node serial suite.
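
A minimal sketch of running just that test from a kubernetes checkout (the make target and FOCUS variable follow the usual e2e-node workflow and are an assumption here, not taken from the PRs):

# run only the restart accounting test from the node serial suite
$ make test-e2e-node FOCUS="Kubelet should correctly account for terminated pods after restart"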


We are waiting to get final CI results back before LGTM/approval. Only the first PR should merge; it will then be backported to 1.22.

I created a "/test pull-kubernetes-node-kubelet-serial-122" job for testing this against the 1.22 branch.

Comment 11 Weinan Liu 2021-10-08 08:11:57 UTC
The issue is not fixed in 4.10.0-0.nightly-2021-10-08-050801, which is still in the Ready state (not Accepted) at 2021-10-08T05:08:01Z.
It may not have the PR included yet. Waiting for the next build to check.
oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          3m56s
complete2   0/1     Completed   0          3m51s
complete3   0/1     Completed   0          3m46s
complete4   0/1     Completed   0          3m40s
complete5   0/1     Completed   0          3m37s
complete6   0/1     Completed   0          3m32s
complete7   0/1     Completed   0          3m26s
complete8   0/1     Completed   0          3m21s
running1    1/1     Running     1          3m16s
running2    0/1     OutOfcpu    0          3m15s
running3    0/1     OutOfcpu    0          3m14s
running4    0/1     OutOfcpu    0          3m13s
running5    0/1     OutOfcpu    0          3m12s
running6    0/1     OutOfcpu    0          3m11s

Comment 12 Jiří Mencák 2021-10-08 08:36:47 UTC
Hey Weinan, the fix is not in 4.10.0-0.nightly-2021-10-08-050801 yet, thanks for checking!  One way to check is to look at the git log of openshift/kubernetes and compare it with the kubelet version.  You need a commit equal to or newer than 931224322c58da67eb8b3e9d4d3ff0e7dbf81cf2.

You can get the kubelet version by checking the output of
$ oc get no
NAME                                                    STATUS   ROLES    AGE     VERSION
jmencak-fxfd2-master-0.c.openshift-gce-devel.internal   Ready    master   5m27s   v1.22.1+4d7e196
jmencak-fxfd2-master-1.c.openshift-gce-devel.internal   Ready    master   5m39s   v1.22.1+4d7e196
jmencak-fxfd2-master-2.c.openshift-gce-devel.internal   Ready    master   5m40s   v1.22.1+4d7e196

4d7e196 indicates a kubelet build that doesn't have the fix.
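
A minimal sketch of that check (assumes a local clone of openshift/kubernetes; 4d7e196 is the short hash reported in the VERSION column above):

$ git clone https://github.com/openshift/kubernetes && cd kubernetes
# prints "fix included" only if the fix commit is an ancestor of the running kubelet's commit
$ git merge-base --is-ancestor 931224322c58da67eb8b3e9d4d3ff0e7dbf81cf2 4d7e196 && echo "fix included" || echo "fix missing"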

Comment 13 Weinan Liu 2021-10-08 10:24:39 UTC
NAME                                         STATUS     ROLES    AGE   VERSION
ip-10-0-128-201.us-east-2.compute.internal   Ready      worker   48m   v1.22.1+4d7e196
ip-10-0-142-115.us-east-2.compute.internal   Ready      master   57m   v1.22.1+4d7e196
ip-10-0-165-183.us-east-2.compute.internal   Ready      master   58m   v1.22.1+4d7e196
ip-10-0-206-28.us-east-2.compute.internal    Ready      master   57m   v1.22.1+4d7e196
 oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-08-090421   True        False         35m     Cluster version is 4.10.0-0.nightly-2021-10-08-090421

@Jiří, thanks!
4.10.0-0.nightly-2021-10-08-090421 does not have the fix yet.

Comment 15 Ryan Phillips 2021-10-12 19:27:45 UTC
*** Bug 2005647 has been marked as a duplicate of this bug. ***

Comment 16 Ryan Phillips 2021-10-13 14:32:03 UTC
*** Bug 2009092 has been marked as a duplicate of this bug. ***

Comment 17 Weinan Liu 2021-10-19 02:30:48 UTC
Verified as fixed:
$ ./complete.sh
pod/complete1 created
"Pending"
"Pending"
"Succeeded"
pod/complete2 created
"Pending"
"Pending"
"Succeeded"
pod/complete3 created
"Pending"
"Pending"
"Succeeded"
pod/complete4 created
"Pending"
"Pending"
"Pending"
"Succeeded"
pod/complete5 created
"Pending"
"Pending"
"Succeeded"
pod/complete6 created
"Pending"
"Pending"
"Succeeded"
pod/complete7 created
"Pending"
"Pending"
"Succeeded"
pod/complete8 created
"Pending"
"Pending"
"Succeeded"
pod/running1 created
pod/running2 created
pod/running3 created
pod/running4 created
pod/running5 created
pod/running6 created
$ oc get po
NAME        READY   STATUS      RESTARTS   AGE
complete1   0/1     Completed   0          6m29s
complete2   0/1     Completed   0          6m23s
complete3   0/1     Completed   0          6m17s
complete4   0/1     Completed   0          6m11s
complete5   0/1     Completed   0          6m4s
complete6   0/1     Completed   0          5m58s
complete7   0/1     Completed   0          5m52s
complete8   0/1     Completed   0          5m46s
running1    1/1     Running     0          5m40s
running2    1/1     Running     0          5m39s
running3    0/1     Pending     0          5m38s
running4    0/1     Pending     0          5m37s
running5    0/1     Pending     0          5m36s
running6    0/1     Pending     0          5m35s
$ oc get no
NAME                                 STATUS   ROLES           AGE   VERSION
ci-ln-xxnd56k-f76d1-dx979-master-0   Ready    master,worker   26m   v1.22.1+9312243
[weinliu@rhel8 verification-tests]$ oc debug node/ci-ln-xxnd56k-f76d1-dx979-master-0
Starting pod/ci-ln-xxnd56k-f76d1-dx979-master-0-debug ...
To use host binaries, run `chroot /host`

chroot /host
Pod IP: 10.0.0.3
If you don't see a command prompt, try pressing enter.

sh-4.4# chroot /host
sh-4.4# reboot

Removing debug pod ...
error: unable to delete the debug pod "ci-ln-xxnd56k-f76d1-dx979-master-0-debug": Delete https://api.ci-ln-xxnd56k-f76d1.origin-ci-int-gce.dev.openshift.com:6443/api/v1/namespaces/default/pods/ci-ln-xxnd56k-f76d1-dx979-master-0-debug: unexpected EOF

[weinliu@rhel8 verification-tests]$ oc get no
NAME                                 STATUS   ROLES           AGE   VERSION
ci-ln-xxnd56k-f76d1-dx979-master-0   Ready    master,worker   29m   v1.22.1+9312243
$ oc get po
NAME                                       READY   STATUS              RESTARTS   AGE
ci-ln-xxnd56k-f76d1-dx979-master-0-debug   0/1     Completed           1          4m17s
complete1                                  0/1     Completed           0          11m
complete2                                  0/1     Completed           0          11m
complete3                                  0/1     Completed           0          10m
complete4                                  0/1     Completed           0          10m
complete5                                  0/1     Completed           0          10m
complete6                                  0/1     Completed           0          10m
complete7                                  0/1     Completed           0          10m
complete8                                  0/1     Completed           0          10m
running1                                   0/1     ContainerCreating   1          10m
running2                                   0/1     ContainerCreating   1          10m
running3                                   0/1     Pending             0          10m
running4                                   0/1     Pending             0          10m
running5                                   0/1     Pending             0          10m
running6                                   0/1     Pending             0          10m

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2021-10-16-173656   True        False         14m     Cluster version is 4.10.0-0.nightly-2021-10-16-173656

Comment 20 errata-xmlrpc 2022-03-10 16:17:19 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056

