Description of problem:
Over time the kubelet slowly consumes memory until, at some point, pods are no longer able to start on the node; container runtime errors appear at the same time. Even rebooting the node does not resolve the issue once it occurs - the node has to be completely rebuilt.

How reproducible:
Consistently

Actual results:
Pods are eventually unable to start on the node; rebuilding the node is the only workaround.

Expected results:
kubelet/crio continue working as expected.
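A rough way to watch for this (a sketch only; <node> is a placeholder for an affected node's name) is to sample the kubelet's resident memory on the node a few hours apart and compare the RSS values:

% oc debug node/<node> -- chroot /host ps -o pid,rss,etime,args -C kubelet

A steadily climbing RSS across samples, together with the container runtime errors above, matches the behavior described here.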
Verified on 4.11.0-0.nightly-2022-06-14-172335 by running pods for over a day; no unexpectedly high memory usage by the kubelet was observed on the node.

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-14-172335   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-06-14-172335
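For reference, the sustained pod load used for this kind of check can be approximated with something like the following (the deployment name, image, and replica count are illustrative, not the exact workload used here):

% oc create deployment memtest --image=registry.access.redhat.com/ubi8/ubi -- sleep infinity
% oc scale deployment memtest --replicas=50

Left running for a day or more, this keeps the kubelet busy managing pods so its memory usage can be sampled periodically as shown earlier.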
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069
If you click "Show advanced fields" on this bug, you can see that it blocks bug 2106414, which shipped in 4.10.23 [1]. And bug 2106414 blocks bug 2106655, which shipped in 4.9.45 [2]. From there, tracking hopped to Jira [3], with a fix shipping in 4.8.51 [4].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2106414#c5
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=2106655#c7
[3]: https://issues.redhat.com//browse/OCPBUGS-1461
[4]: https://access.redhat.com/errata/RHSA-2022:6801
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.