Bug 2065749 - Kubelet slowly leaking memory and pods eventually unable to start
Summary: Kubelet slowly leaking memory and pods eventually unable to start
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.11.0
Assignee: Ryan Phillips
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Depends On:
Blocks: 2106414
 
Reported: 2022-03-18 15:53 UTC by Luke Stanton
Modified: 2023-09-18 04:33 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: Memory leak in the container management code within the kubelet.
Consequence: Kubelet memory usage grows slowly over time until pods are no longer able to start on the node.
Fix: Code change.
Result: Memory is no longer leaked during container cleanup in the kubelet container management code.
Clone Of:
Environment:
Last Closed: 2022-08-10 10:54:40 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links:
GitHub openshift/kubernetes pull 1229 (Merged): Bug 2065749: UPSTREAM: 109103: cpu/memory manager containerMap memory leak (last updated 2022-07-12 12:23:57 UTC)
Red Hat Product Errata RHSA-2022:5069 (last updated 2022-08-10 10:55:14 UTC)

Internal Links: 2052378

Description Luke Stanton 2022-03-18 15:53:35 UTC
Description of problem:

Over time the kubelet slowly consumes memory until, at some point, pods are no longer able to start on the node; container runtime errors appear at the same time. Even rebooting the node does not resolve the issue once it occurs - the node has to be completely rebuilt.



How reproducible: Consistently



Actual results: Pods are eventually unable to start on the node; rebuilding the node is the only workaround



Expected results: kubelet/crio would continue working as expected
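
The upstream pull request linked above (openshift/kubernetes pull 1229, upstream 109103) describes the leak as a cpu/memory manager containerMap that keeps bookkeeping entries for containers after they have been cleaned up. Below is a minimal, hypothetical Go sketch of that pattern; the names and structure are illustrative only, not the actual kubelet source. A map keyed by container ID grows without bound unless entries are deleted during container cleanup.

package main

import "fmt"

// containerRef identifies which pod/container an entry belongs to.
type containerRef struct {
	podUID        string
	containerName string
}

// containerMap is per-node bookkeeping keyed by container ID
// (hypothetical sketch of the leak pattern, not the kubelet's code).
type containerMap map[string]containerRef

// Add records a container when it starts.
func (cm containerMap) Add(podUID, containerName, containerID string) {
	cm[containerID] = containerRef{podUID: podUID, containerName: containerName}
}

// RemoveByContainerID drops the entry for a container that has been cleaned up.
func (cm containerMap) RemoveByContainerID(containerID string) {
	delete(cm, containerID)
}

func main() {
	cm := containerMap{}

	// Simulate many short-lived containers churning on one node.
	for i := 0; i < 100000; i++ {
		id := fmt.Sprintf("cri-o://%d", i)
		cm.Add("pod-uid", "app", id)

		// Leaky behaviour: the container exits and the runtime removes it,
		// but the bookkeeping entry is never deleted, so the map (and the
		// process's memory) grows with every container ever started.
		//
		// Fixed behaviour: delete the entry during container cleanup:
		// cm.RemoveByContainerID(id)
	}

	fmt.Printf("entries still tracked: %d\n", len(cm))
}

Under this sketch, memory usage scales with the total number of containers ever started on the node rather than with the number currently running, which matches the slow, churn-driven growth reported in this bug.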

Comment 17 Sunil Choudhary 2022-06-15 15:21:28 UTC
Checked on 4.11.0-0.nightly-2022-06-14-172335 by running pods for over a day; kubelet memory usage on the node does not grow unexpectedly.

% oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.11.0-0.nightly-2022-06-14-172335   True        False         8h      Cluster version is 4.11.0-0.nightly-2022-06-14-172335

Comment 22 errata-xmlrpc 2022-08-10 10:54:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:5069

Comment 24 W. Trevor King 2022-12-06 05:22:12 UTC
If you click "Show advanced fields" on this bug, you can see that it blocks bug 2106414, which shipped in 4.10.23 [1].  And bug 2106414 blocks bug 2106655, which shipped in 4.9.45 [2].  And from there tracking hopped to Jira [3], with a fix shipping in 4.8.51 [4].

[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2106414#c5
[2]: https://bugzilla.redhat.com/show_bug.cgi?id=2106655#c7
[3]: https://issues.redhat.com//browse/OCPBUGS-1461
[4]: https://access.redhat.com/errata/RHSA-2022:6801

Comment 26 Red Hat Bugzilla 2023-09-18 04:33:50 UTC
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days.

