Bug 1986452 - Increase in RSS memory in CRI-O
Summary: Increase in RSS memory in CRI-O
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.9
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 4.9.0
Assignee: Peter Hunt
QA Contact: MinLi
URL:
Whiteboard: aos-scalability-49
Duplicates: 2014432 (view as bug list)
Depends On:
Blocks: 2002805
 
Reported: 2021-07-27 15:03 UTC by Raul Sevilla
Modified: 2024-12-20 20:33 UTC
CC List: 10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-18 17:42:18 UTC
Target Upstream Version:
Embargoed:


Attachments


Links
System / ID                              Last Updated
GitHub cri-o/cri-o pull 5245             2021-08-25 14:15:13 UTC
Red Hat Product Errata RHSA-2021:3759    2021-10-18 17:42:28 UTC

Description Raul Sevilla 2021-07-27 15:03:45 UTC
Hi,

During our performance benchmarks, we found a notable increase in Resident Set Size (RSS) memory usage by the kubelet and CRI-O components.



The 4.8 report shows a maximum RSS of 667 MiB for CRI-O and 572 MiB for the kubelet.

Link: http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1617632003313&to=1617634683027&var-Datasource=ripsaw-kube-burner-internal-production&var-sdn=openshift-sdn&var-job=node-density-heavy&var-uuid=809a6654-96dd-42ea-b899-50c71836ae7f&var-master=All&var-worker=ip-10-0-129-34.us-west-2.compute.internal&var-infra=ip-10-0-132-49.us-west-2.compute.internal&var-namespace=All&theme=light




The 4.9.0-0.nightly-2021-07-21-021707 report shows 1255 MiB for CRI-O and 961 MiB for the kubelet (much higher than the default SYSTEM_RESERVED_MEMORY of 1 GiB configured in the kubelet).

Link: http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1627381755305&to=1627383650780&var-Datasource=ripsaw-kube-burner-public-production&var-sdn=All&var-job=node-density-heavy&var-uuid=a38939a7-cb8f-4466-9d16-9a9dc3d94d87&var-master=All&var-worker=ip-10-0-129-34.us-west-2.compute.internal&var-infra=ip-10-0-138-187.us-west-2.compute.internal&var-namespace=All



The scale test executed was node-density-heavy on a 25-worker-node cluster; more information about this test is available at https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#node-density-and-node-density-heavy-variables



After some investigation I found the following:

The CRI-O RSS reported on one of the worker nodes was 982484 KiB:

sh-4.4# ps -p 1392 -o pid,rss,vsz,cmd 
    PID   RSS    VSZ CMD
   1392 982484 8128132 /usr/bin/crio


# Analyze heap pprof

sh-4.4# curl --unix-socket /var/run/crio/crio.sock http://localhost/debug/pprof/heap > heap

sh-4.4# go tool pprof heap 
File: crio
Build ID: a31a3de0d4273355834c5a76b6d82a1f63f20d2e
Type: inuse_space
Time: Jul 27, 2021 at 2:55pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 279.99MB, 91.79% of 305.02MB total
Dropped 126 nodes (cum <= 1.53MB)
Showing top 10 nodes out of 91
      flat  flat%   sum%        cum   cum%
  158.51MB 51.97% 51.97%   158.51MB 51.97%  fmt.Sprintf
   60.71MB 19.90% 71.87%    60.71MB 19.90%  github.com/cri-o/cri-o/vendor/github.com/opencontainers/runtime-tools/generate.(*Generator).addEnv
      12MB  3.93% 75.80%       12MB  3.93%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.watchpoint.Add
   11.50MB  3.77% 79.57%    11.50MB  3.77%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.newnode (inline)
    8.14MB  2.67% 82.24%    19.64MB  6.44%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.node.addchild
    7.50MB  2.46% 84.70%     7.50MB  2.46%  strings.(*Builder).grow
    6.50MB  2.13% 86.83%       13MB  4.26%  github.com/cri-o/cri-o/vendor/github.com/tchap/go-patricia/patricia.NewTrie
    6.50MB  2.13% 88.97%     6.50MB  2.13%  github.com/cri-o/cri-o/vendor/github.com/tchap/go-patricia/patricia.newSparseChildList (inline)
    4.50MB  1.48% 90.44%     4.50MB  1.48%  path.Join
    4.13MB  1.35% 91.79%     4.13MB  1.35%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.(*nonrecursiveTree).walkWatchpoint


According to the heap memory profile above, the actual in-use memory of CRI-O was 305.02 MiB; however, the OS reports an RSS of more than 959 MiB. It seems this difference is not being released back to the OS. (The same behaviour also occurs in the kubelet.)
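
(For reference, a minimal sketch assuming the same profiling socket as above: the debug=1 form of the heap endpoint appends the Go runtime.MemStats counters in plain text, which show how much heap the runtime is holding without having returned it to the OS.)

# HeapIdle minus HeapReleased is roughly the heap kept for reuse but not handed
# back to the OS, which would help explain RSS being well above the ~305 MiB of
# in-use space.
sh-4.4# curl -s --unix-socket /var/run/crio/crio.sock 'http://localhost/debug/pprof/heap?debug=1' | grep -E '^# (HeapSys|HeapInuse|HeapIdle|HeapReleased) '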

The environment variables of this CRI-O process are the following:

sh-4.4# cat /proc/1392/environ | tr '\0' '\n'
GODEBUG=x509ignoreCN=0,madvdontneed=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
NOTIFY_SOCKET=/run/systemd/notify
INVOCATION_ID=86d42f7cd2d14bcd890927d2ce43f977
JOURNAL_STREAM=9:32032
GOTRACEBACK=crash
ENABLE_PROFILE_UNIX_SOCKET=true
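
(Since GODEBUG sets madvdontneed=1, the runtime should return freed pages with MADV_DONTNEED, so RSS ought to drop promptly once memory is actually released. A possible cross-check from the node, assuming /proc/<pid>/smaps_rollup is available on this kernel:)

# Rss here should roughly match ps; a large LazyFree value would instead point to
# MADV_FREE'd pages still being counted against the process, which madvdontneed=1
# is meant to avoid.
sh-4.4# grep -E '^(Rss|Pss|LazyFree):' /proc/1392/smaps_rollup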

Comment 1 Peter Hunt 2021-07-28 13:29:56 UTC
I have some suspicions about where this regression came from, but it would be helpful to confirm them. Can you try this test with the latest 4.8 nightly (4.8.2 should work as well)?

Comment 6 Peter Hunt 2021-08-25 14:15:14 UTC
I believe the CRI-O portion of this bug will be addressed with the attached PR. I will need help verifying that, though.
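
(One possible way to verify, a hypothetical sketch not taken from the PR or the benchmark tooling: sample CRI-O's RSS on a worker node during the node-density-heavy run and keep the peak value.)

# Samples RSS (KiB) once a minute for an hour and prints the peak.
# Assumes pidof is available in the debug shell; otherwise substitute the crio PID.
sh-4.4# for i in $(seq 1 60); do ps -o rss= -p "$(pidof crio)"; sleep 60; done | sort -n | tail -1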

Comment 9 MinLi 2021-09-02 10:10:32 UTC
Hi Raul Sevilla,
Can you help check the data from a 4.9 nightly build of the last two days, such as 4.9.0-0.nightly-2021-09-01-193941 or 4.9.0-0.nightly-2021-08-31-171539?
Thanks

Comment 10 Sunil Choudhary 2021-09-02 10:42:05 UTC
I did a comparison between the 4.8 and 4.9 nightlies. Memory usage is in a similar range.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-09-01-001821   True        False         63m     Cluster version is 4.8.0-0.nightly-2021-09-01-001821

$ oc get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
ip-10-0-130-99.ap-south-1.compute.internal    Ready    worker   86m   v1.21.1+9807387
ip-10-0-135-192.ap-south-1.compute.internal   Ready    master   96m   v1.21.1+9807387
ip-10-0-170-110.ap-south-1.compute.internal   Ready    master   96m   v1.21.1+9807387
ip-10-0-180-143.ap-south-1.compute.internal   Ready    worker   87m   v1.21.1+9807387
ip-10-0-218-23.ap-south-1.compute.internal    Ready    master   96m   v1.21.1+9807387
ip-10-0-219-43.ap-south-1.compute.internal    Ready    worker   87m   v1.21.1+9807387

$ oc debug node/ip-10-0-130-99.ap-south-1.compute.internal
Starting pod/ip-10-0-130-99ap-south-1computeinternal-debug ...
...

sh-4.4# ps -p 1317 -o pid,rss,vsz,cmd
    PID   RSS    VSZ CMD
   1317 110240 2023968 /usr/bin/crio

sh-4.4# ps -p 1354 -o pid,rss,vsz,cmd
    PID   RSS    VSZ CMD
   1354 160648 1940516 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --co


$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         125m    Cluster version is 4.9.0-0.nightly-2021-09-01-193941

$ oc get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-0-134-113.ap-south-1.compute.internal   Ready    worker   157m   v1.22.0-rc.0+bbcc9ae
ip-10-0-136-105.ap-south-1.compute.internal   Ready    master   168m   v1.22.0-rc.0+bbcc9ae
ip-10-0-170-97.ap-south-1.compute.internal    Ready    master   168m   v1.22.0-rc.0+bbcc9ae
ip-10-0-183-52.ap-south-1.compute.internal    Ready    worker   157m   v1.22.0-rc.0+bbcc9ae
ip-10-0-193-186.ap-south-1.compute.internal   Ready    worker   156m   v1.22.0-rc.0+bbcc9ae
ip-10-0-201-41.ap-south-1.compute.internal    Ready    master   168m   v1.22.0-rc.0+bbcc9ae

$ oc debug node/ip-10-0-134-113.ap-south-1.compute.internal
Starting pod/ip-10-0-134-113ap-south-1computeinternal-debug ...
...

sh-4.4# ps -p 1295 -o pid,rss,vsz,cmd 
    PID   RSS    VSZ CMD
   1295 111392 2028600 /usr/bin/crio

sh-4.4# ps -p 1324 -o pid,rss,vsz,cmd 
    PID   RSS    VSZ CMD
   1324 210332 2057152 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfi
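
(For future comparisons, both daemons can be captured in one pass; a small convenience sketch that assumes pidof is available in the debug shell.)

sh-4.4# for p in $(pidof crio kubelet); do ps -p "$p" -o pid=,rss=,comm=; done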

Comment 11 Raul Sevilla 2021-09-03 12:00:37 UTC
(In reply to Peter Hunt from comment #6)
> I believe the CRI-O portion of this bug will be addressed with the attached
> PR. I will need help verifying that, though.

I have done some scale tests on a cluster using the PR you mention, and I've seen some improvement (the top CRI-O consumer was at 1.132 GiB); however, it is still much higher than what was reported in 4.8.

On the other hand, while investigating this issue I realized that we made some changes to our workload (node-density-heavy) on 14 April: we added an extra liveness probe to some of the pods deployed by this workload, which explains the increase in resource usage in the kubelet -> https://github.com/cloud-bulldozer/benchmark-operator/commit/550ac7ec4107181f951cb31aa6a0a0138a884f78. I have done some tests without this code change and the kubelet resident set size is similar to the 4.8 results.
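
(As a hypothetical spot-check of that workload change, assuming jq is available and the benchmark pods are still deployed, counting the pods that define a liveness probe makes the before/after difference visible.)

# Count pods (all namespaces) whose containers define a livenessProbe.
$ oc get pods -A -o json | jq '[.items[] | select(any(.spec.containers[]; .livenessProbe != null))] | length'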

I'm still investigating at what point (release) this RSS increase in CRI-O happened.

Comment 14 errata-xmlrpc 2021-10-18 17:42:18 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 15 Peter Hunt 2021-11-02 17:30:06 UTC
*** Bug 2014432 has been marked as a duplicate of this bug. ***

