Bug 2117461

Summary: [4.10 backport] percpu Memory leak CRIO due to no garbage collection in /run/crio/exits for exited containers
Product: OpenShift Container Platform
Reporter: Pamela Escorza <pescorza>
Component: Node
Assignee: Peter Hunt <pehunt>
Node sub component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: pehunt
Version: 4.10
Target Milestone: ---
Target Release: 4.10.z
Hardware: All
OS: All
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-09-21 14:10:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2117462
Bug Blocks:

Description Pamela Escorza 2022-08-11 04:34:00 UTC
Description of problem:
This bug is opened to backport the fix:
https://github.com/cri-o/cri-o/pull/5508

Version-Release number of selected component (if applicable):
OCP 4.10


Actual results:
Percpu memory usage is high:
$ cat proc/meminfo | awk '{print $2 "    " $1}'| sort -rn | awk '{print $1 " = " int($1/1024) "MB  -  "int($1/1024/1024)"GB   " $2}' | grep -E 'MemTotal|MemFree|Buffers|Cached|Percpu'
32897520 = 32126MB  -  31GB   MemTotal:
19215360 = 18765MB  -  18GB   Percpu:
4547680 = 4441MB  -  4GB   Cached:
449000 = 438MB  -  0GB   MemFree:
3108 = 3MB  -  0GB   Buffers:
0 = 0MB  -  0GB   SwapCached:
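
For readers who prefer not to decode the awk pipeline, the same check can be sketched in Python. This is only an illustration: the function names `parse_meminfo` and `percpu_share` are hypothetical, and `SAMPLE` is rebuilt from the values reported above in the usual /proc/meminfo layout (values in kB), not copied from the affected node.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> value in kB."""
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def percpu_share(fields):
    """Return Percpu as a fraction of MemTotal."""
    return fields["Percpu"] / fields["MemTotal"]


# Assumption: sample text reconstructed from the figures in this report.
SAMPLE = """\
MemTotal:       32897520 kB
MemFree:          449000 kB
Buffers:            3108 kB
Cached:          4547680 kB
SwapCached:            0 kB
Percpu:         19215360 kB
"""

if __name__ == "__main__":
    fields = parse_meminfo(SAMPLE)
    print(f"Percpu is {percpu_share(fields):.1%} of MemTotal")
```

On a live node the sample string would be replaced by the contents of /proc/meminfo; a Percpu share above half of MemTotal, as in the figures above, is the symptom this bug describes.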


Expected results:
Percpu memory usage should not be high.

Additional info:
Child of bug https://bugzilla.redhat.com/show_bug.cgi?id=2004037#c121

Comment 8 Sunil Choudhary 2022-09-19 07:12:15 UTC
% oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.33   True        False         51m     Cluster version is 4.10.33

% oc get nodes
NAME                                         STATUS   ROLES    AGE   VERSION
ip-10-0-138-80.us-east-2.compute.internal    Ready    master   71m   v1.23.5+012e945
ip-10-0-149-187.us-east-2.compute.internal   Ready    worker   64m   v1.23.5+012e945
ip-10-0-174-51.us-east-2.compute.internal    Ready    worker   63m   v1.23.5+012e945
ip-10-0-186-140.us-east-2.compute.internal   Ready    master   70m   v1.23.5+012e945
ip-10-0-192-206.us-east-2.compute.internal   Ready    worker   65m   v1.23.5+012e945
ip-10-0-212-121.us-east-2.compute.internal   Ready    master   71m   v1.23.5+012e945
sunilc@schoudha-mac debug % oc debug node/ip-10-0-149-187.us-east-2.compute.internal


sh-4.4# grep Per /proc/meminfo
Percpu:             3336 kB

sh-4.4# cat /proc/cgroups | column -t
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        4          100          1
cpu           2          434          1
cpuacct       2          434          1
blkio         7          434          1
memory        11         458          1
devices       9          432          1
freezer       12         100          1
net_cls       3          100          1
perf_event    5          100          1
net_prio      3          100          1
hugetlb       6          100          1
pids          8          434          1
rdma          10         100          1

sh-4.4# while :; do podman run --name=test1 --replace centos /bin/echo 'running'; done
Resolved "centos" as an alias (/etc/containers/registries.conf.d/000-shortnames.conf)
Trying to pull quay.io/centos/centos:latest...
Getting image source signatures
Copying blob 7a0437f04f83 done  
Copying config 300e315adb done  
Writing manifest to image destination
Storing signatures
running
e8e6280d9cb638594e0cd72c471bf66ba3f08003ff1e2fbd002682af633a2bfc
….


sh-4.4# grep Per /proc/meminfo
Percpu:             3440 kB

sh-4.4# cat /proc/cgroups | column -t
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        4          102          1
cpu           2          443          1
cpuacct       2          443          1
blkio         7          443          1
memory        11         517          1
devices       9          441          1
freezer       12         102          1
net_cls       3          102          1
perf_event    5          102          1
net_prio      3          102          1
hugetlb       6          102          1
pids          8          443          1
rdma          10         102          1
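
The two /proc/cgroups listings in this comment can be compared mechanically. The following is a minimal sketch, assuming the four-column layout shown above; the helper names `parse_cgroups` and `growth` are hypothetical, and only three of the controllers are included in the sample data for brevity.

```python
def parse_cgroups(text):
    """Parse /proc/cgroups-style text into a dict of subsystem -> num_cgroups."""
    counts = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the header row
        name, _hierarchy, num_cgroups, _enabled = line.split()
        counts[name] = int(num_cgroups)
    return counts


def growth(before, after):
    """Return the change in num_cgroups for each subsystem present in both snapshots."""
    return {name: after[name] - before[name] for name in before if name in after}


# Values copied from the listings taken before and after the podman loop.
BEFORE = """\
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        4          100          1
cpu           2          434          1
memory        11         458          1
"""

AFTER = """\
#subsys_name  hierarchy  num_cgroups  enabled
cpuset        4          102          1
cpu           2          443          1
memory        11         517          1
"""

if __name__ == "__main__":
    delta = growth(parse_cgroups(BEFORE), parse_cgroups(AFTER))
    for name, change in delta.items():
        print(f"{name}: {change:+d}")
```

Read together with the Percpu values (3336 kB before, 3440 kB after), a small and bounded change in these counts is what the verification above is looking for.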

Comment 10 errata-xmlrpc 2022-09-21 14:10:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 4.10.33 bug fix update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:6532