Hi,

During our performance benchmarks we've observed a significant increase in the resident set size (RSS) memory usage of the kubelet and CRI-O components.

The 4.8 report shows a maximum RSS of 667 MiB for CRI-O and 572 MiB for the kubelet:
http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1617632003313&to=1617634683027&var-Datasource=ripsaw-kube-burner-internal-production&var-sdn=openshift-sdn&var-job=node-density-heavy&var-uuid=809a6654-96dd-42ea-b899-50c71836ae7f&var-master=All&var-worker=ip-10-0-129-34.us-west-2.compute.internal&var-infra=ip-10-0-132-49.us-west-2.compute.internal&var-namespace=All&theme=light

The 4.9.0-0.nightly-2021-07-21-021707 report shows 1255 MiB for CRI-O and 961 MiB for the kubelet (which is much higher than the default SYSTEM_RESERVED_MEMORY of 1 GiB configured in the kubelet):
http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1627381755305&to=1627383650780&var-Datasource=ripsaw-kube-burner-public-production&var-sdn=All&var-job=node-density-heavy&var-uuid=a38939a7-cb8f-4466-9d16-9a9dc3d94d87&var-master=All&var-worker=ip-10-0-129-34.us-west-2.compute.internal&var-infra=ip-10-0-138-187.us-west-2.compute.internal&var-namespace=All

The scale test executed was node-density-heavy on a cluster with 25 worker nodes. More information about this test is available at https://github.com/cloud-bulldozer/e2e-benchmarking/tree/master/workloads/kube-burner#node-density-and-node-density-heavy-variables

After some investigation I found the following. The CRI-O RSS reported on one of the worker nodes was 982484 KiB:

sh-4.4# ps -p 1392 -o pid,rss,vsz,cmd
   PID    RSS     VSZ CMD
  1392 982484 8128132 /usr/bin/crio

# Analyze the heap with pprof
sh-4.4# curl --unix-socket /var/run/crio/crio.sock http://localhost/debug/pprof/heap > heap
sh-4.4# go tool pprof heap
File: crio
Build ID: a31a3de0d4273355834c5a76b6d82a1f63f20d2e
Type: inuse_space
Time: Jul 27, 2021 at 2:55pm (UTC)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top
Showing nodes accounting for 279.99MB, 91.79% of 305.02MB total
Dropped 126 nodes (cum <= 1.53MB)
Showing top 10 nodes out of 91
      flat  flat%   sum%        cum   cum%
  158.51MB 51.97% 51.97%   158.51MB 51.97%  fmt.Sprintf
   60.71MB 19.90% 71.87%    60.71MB 19.90%  github.com/cri-o/cri-o/vendor/github.com/opencontainers/runtime-tools/generate.(*Generator).addEnv
      12MB  3.93% 75.80%       12MB  3.93%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.watchpoint.Add
   11.50MB  3.77% 79.57%    11.50MB  3.77%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.newnode (inline)
    8.14MB  2.67% 82.24%    19.64MB  6.44%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.node.addchild
    7.50MB  2.46% 84.70%     7.50MB  2.46%  strings.(*Builder).grow
    6.50MB  2.13% 86.83%       13MB  4.26%  github.com/cri-o/cri-o/vendor/github.com/tchap/go-patricia/patricia.NewTrie
    6.50MB  2.13% 88.97%     6.50MB  2.13%  github.com/cri-o/cri-o/vendor/github.com/tchap/go-patricia/patricia.newSparseChildList (inline)
    4.50MB  1.48% 90.44%     4.50MB  1.48%  path.Join
    4.13MB  1.35% 91.79%     4.13MB  1.35%  github.com/cri-o/cri-o/vendor/github.com/rjeczalik/notify.(*nonrecursiveTree).walkWatchpoint

According to the heap profile above, the in-use heap of CRI-O was 305.02 MiB, yet the OS reports an RSS of more than 959 MiB. It seems like this difference is not being released back to the OS.
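(Not part of the original report, but for anyone reproducing this comparison.) The same pprof endpoint can also be queried in text mode (heap?debug=1), which appends the runtime.MemStats counters to the profile. Comparing HeapInuse, HeapIdle and HeapReleased against the VmRSS reported by the kernel shows whether the gap is heap the Go runtime is still holding on to or memory it has already handed back. A minimal sketch, assuming the same crio socket and PID 1392 as in the transcript above:

# Kernel's view of the crio process (PID from the transcript above)
sh-4.4# grep VmRSS /proc/1392/status
# Go runtime's view: the debug=1 heap profile ends with a runtime.MemStats dump
sh-4.4# curl -s --unix-socket /var/run/crio/crio.sock 'http://localhost/debug/pprof/heap?debug=1' | grep -E '# (HeapSys|HeapInuse|HeapIdle|HeapReleased) ='

Roughly: if HeapIdle is large while HeapReleased is small, the runtime is sitting on freed heap; if HeapReleased is large but VmRSS stays high, the remaining RSS is coming from outside the in-use Go heap.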
(The same behaviour also happens with the kubelet.)

The environment variables this CRI-O process is using are the following:

sh-4.4# cat /proc/1392/environ | tr '\0' '\n'
GODEBUG=x509ignoreCN=0,madvdontneed=1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
NOTIFY_SOCKET=/run/systemd/notify
INVOCATION_ID=86d42f7cd2d14bcd890927d2ce43f977
JOURNAL_STREAM=9:32032
GOTRACEBACK=crash
ENABLE_PROFILE_UNIX_SOCKET=true
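For context (my addition, not from the original comment): GODEBUG=madvdontneed=1 makes the Go runtime return freed heap to the kernel with MADV_DONTNEED instead of MADV_FREE, so releases should show up in RSS promptly. A quick cross-check, assuming PID 1392 from above and a kernel that exposes smaps_rollup, is to look at the LazyFree counter (MADV_FREE'd pages that still count towards RSS until the kernel reclaims them):

# Confirm the madvise mode the runtime was started with
sh-4.4# tr '\0' '\n' < /proc/1392/environ | grep GODEBUG
# With madvdontneed=1, LazyFree should stay near zero and Rss should reflect runtime releases
sh-4.4# grep -E '^(Rss|LazyFree):' /proc/1392/smaps_rollup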
I have some suspicions about where this regression came from, but it would be helpful to confirm them. Can you try this test with the latest 4.8 nightly? (4.8.2 should work as well.)
Hey Peter, find the 4.8.0-0.nightly-2021-06-02-025513 data here:
http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1622640729984&to=1622644648232&var-Datasource=ripsaw-kube-burner-public-production&var-sdn=openshift-sdn&var-job=node-density-heavy&var-uuid=8b2f7c71-e1c9-4114-a5a9-c72d4707258e&var-master=All&var-worker=ip-10-0-129-99.us-west-2.compute.internal&var-infra=ip-10-0-152-100.us-west-2.compute.internal&var-namespace=All

Memory usage is high there as well.

The 4.8.0-0.nightly-2021-05-10-092939 results do not show such a big RSS memory usage:
http://grafana.rdu2.scalelab.redhat.com:3000/d/hIBqKNvMz/kube-burner-report?orgId=1&from=1621009275453&to=1621013141814&var-Datasource=ripsaw-kube-burner-public-production&var-sdn=openshift-sdn&var-job=node-density-heavy&var-uuid=7ccb7d75-7a94-44e7-a1eb-32e994bf2426&var-master=All&var-worker=ip-10-0-130-169.us-west-2.compute.internal&var-infra=ip-10-0-137-125.us-west-2.compute.internal&var-namespace=All
I believe the CRI-O portion of this bug will be addressed with the attached PR. I will need help verifying that, though.
Hi Raul Sevilla, can you help check the data for a 4.9 nightly build from the last two days, such as 4.9.0-0.nightly-2021-09-01-193941 or 4.9.0-0.nightly-2021-08-31-171539? Thanks.
I did a comparison between 4.8 and 4.9 nightlies. Memory usage is in a similar range.

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.8.0-0.nightly-2021-09-01-001821   True        False         63m     Cluster version is 4.8.0-0.nightly-2021-09-01-001821

$ oc get nodes
NAME                                          STATUS   ROLES    AGE   VERSION
ip-10-0-130-99.ap-south-1.compute.internal    Ready    worker   86m   v1.21.1+9807387
ip-10-0-135-192.ap-south-1.compute.internal   Ready    master   96m   v1.21.1+9807387
ip-10-0-170-110.ap-south-1.compute.internal   Ready    master   96m   v1.21.1+9807387
ip-10-0-180-143.ap-south-1.compute.internal   Ready    worker   87m   v1.21.1+9807387
ip-10-0-218-23.ap-south-1.compute.internal    Ready    master   96m   v1.21.1+9807387
ip-10-0-219-43.ap-south-1.compute.internal    Ready    worker   87m   v1.21.1+9807387

$ oc debug node/ip-10-0-130-99.ap-south-1.compute.internal
Starting pod/ip-10-0-130-99ap-south-1computeinternal-debug ...
...
sh-4.4# ps -p 1317 -o pid,rss,vsz,cmd
   PID    RSS     VSZ CMD
  1317 110240 2023968 /usr/bin/crio
sh-4.4# ps -p 1354 -o pid,rss,vsz,cmd
   PID    RSS     VSZ CMD
  1354 160648 1940516 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfig --container-runtime=remote --co

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-09-01-193941   True        False         125m    Cluster version is 4.9.0-0.nightly-2021-09-01-193941

$ oc get nodes
NAME                                          STATUS   ROLES    AGE    VERSION
ip-10-0-134-113.ap-south-1.compute.internal   Ready    worker   157m   v1.22.0-rc.0+bbcc9ae
ip-10-0-136-105.ap-south-1.compute.internal   Ready    master   168m   v1.22.0-rc.0+bbcc9ae
ip-10-0-170-97.ap-south-1.compute.internal    Ready    master   168m   v1.22.0-rc.0+bbcc9ae
ip-10-0-183-52.ap-south-1.compute.internal    Ready    worker   157m   v1.22.0-rc.0+bbcc9ae
ip-10-0-193-186.ap-south-1.compute.internal   Ready    worker   156m   v1.22.0-rc.0+bbcc9ae
ip-10-0-201-41.ap-south-1.compute.internal    Ready    master   168m   v1.22.0-rc.0+bbcc9ae

$ oc debug node/ip-10-0-134-113.ap-south-1.compute.internal
Starting pod/ip-10-0-134-113ap-south-1computeinternal-debug ...
...
sh-4.4# ps -p 1295 -o pid,rss,vsz,cmd
   PID    RSS     VSZ CMD
  1295 111392 2028600 /usr/bin/crio
sh-4.4# ps -p 1324 -o pid,rss,vsz,cmd
   PID    RSS     VSZ CMD
  1324 210332 2057152 kubelet --config=/etc/kubernetes/kubelet.conf --bootstrap-kubeconfig=/etc/kubernetes/kubeconfig --kubeconfig=/var/lib/kubelet/kubeconfi
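A single ps snapshot on a mostly idle cluster may miss the regression, which in the scale runs only shows up under node-density-heavy load. A small sketch (my own hypothetical helper, not part of the test suite) that samples crio and kubelet RSS every 30 seconds from the node debug shell while the workload runs:

# Sample RSS (KiB) of crio and kubelet every 30 seconds; stop with Ctrl-C
sh-4.4# while true; do
>   echo "$(date +%T) crio_rss_kib=$(ps -o rss= -p "$(pgrep -xo crio)") kubelet_rss_kib=$(ps -o rss= -p "$(pgrep -xo kubelet)")"
>   sleep 30
> done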
(In reply to Peter Hunt from comment #6)
> I believe the CRI-O portion of this bug will be addressed with the attached
> PR. I will need help verifying that, though.

I have done some scale tests with a cluster that uses the PR you mention. I've seen some improvement (the top CRI-O consumer reached 1.132 GiB), but it is still much higher than what was reported in 4.8.

On the other hand, while investigating this issue I realized that we made some changes to our workload (node-density-heavy) on 14 April. Those changes added an extra liveness probe to some of the pods deployed by the workload, which explains the increase in the kubelet's resource usage: https://github.com/cloud-bulldozer/benchmark-operator/commit/550ac7ec4107181f951cb31aa6a0a0138a884f78. I have run some tests without this code change, and the kubelet's resident set size is similar to the 4.8 results.

I'm still investigating at which point (release) this RSS increase in CRI-O happened.
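One way to narrow down the release where the CRI-O increase was introduced (a suggestion on my side, not something already done in this bug) is to capture a heap profile from a known-good nightly and another from a bad one at the same point in the node-density-heavy run, then diff them with pprof's -diff_base flag; allocation sites with positive values in the diff are the ones that grew between the two builds. The file names below are placeholders for profiles collected the same way as the heap profile in the first comment:

# heap.good / heap.bad are placeholder heap dumps from the two nightlies
$ go tool pprof -diff_base=heap.good heap.bad
(pprof) top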
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
*** Bug 2014432 has been marked as a duplicate of this bug. ***