Description of problem:
On a 4.10 OpenShift cluster with ~250 worker nodes, we launched around 35-60 pods per node. All the ovn-controller instances across the cluster seem to be using a reasonable amount of memory except on two nodes. Once the pods are launched they are not deleted, and the cluster is in a steady state.
Over time, the memory of ovn-controller on those two nodes keeps increasing without bound, with no activity happening in the cluster. I suspect a memory leak here.
Looking at the memory stats on one of the problem nodes we see:
[root@worker000-fc640 ~]# ovn-appctl -t ovn-controller memory/show
lflow-cache-entries-cache-expr:16030 lflow-cache-entries-cache-matches:9621 lflow-cache-size-KB:65688 ofctrl_desired_flow_usage-KB:37379 ofctrl_installed_flow_usage-KB:28470 ofctrl_sb_flow_ref_usage-KB:9361
Comparing it to a node which is NOT exhibiting this leak, I don't spot many differences.
Node without the leak:
sh-4.4# ovn-appctl -t ovn-controller memory/show
lflow-cache-entries-cache-expr:15804 lflow-cache-entries-cache-matches:9547 lflow-cache-size-KB:64684 ofctrl_desired_flow_usage-KB:37304 ofctrl_installed_flow_usage-KB:28425 ofctrl_sb_flow_ref_usage-KB:9326
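To confirm the growth is unbounded rather than one-time cache warm-up, it helps to sample process RSS alongside the lflow-cache counters over time. A minimal sketch (the loop, interval, and output format are illustrative, not from this report; run it on the node or inside the ovn-controller container):

```shell
# Sample ovn-controller RSS and lflow-cache stats every 5 minutes.
while true; do
    ts=$(date -u +%FT%TZ)
    rss_kb=$(awk '/VmRSS/ {print $2}' "/proc/$(pidof ovn-controller)/status")
    stats=$(ovn-appctl -t ovn-controller memory/show)
    echo "$ts rss-KB:$rss_kb $stats"
    sleep 300
done
```

If RSS keeps climbing while lflow-cache-size-KB and the ofctrl usage counters stay roughly flat (as in the two outputs above), the growth is in memory that memory/show does not account for, which points away from the lflow cache.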
Version-Release number of selected component (if applicable):
[kni@e16-h12-b02-fc640 web-burner]$ oc rsh -c ovn-controller ovnkube-node-qj8dq
sh-4.4# rpm -qa | grep ovn
How reproducible:
Only reproducible on some nodes
Steps to Reproduce:
1. Deploy a large cluster
2. Launch a few pods
3. Remain at steady state and watch ovn-controller memory grow without bound on some nodes
Actual results:
ovn-controller memory on some nodes keeps growing without bounds, indicating a memory leak
Expected results:
Memory should stay within reasonable bounds and not grow at steady state
Placed DBs and provided Numan access to those.
Created attachment 1830071 [details]
DBs and conf.db of worker node
perf record output shows the pinctrl0 thread is hot on the CPU:
Event count (approx.): 6723064170
# Overhead Command Shared Object Symbol
# ........ .............. ................... ...........................................
1.97% ovn_pinctrl0 libpthread-2.28.so [.] __pthread_rwlock_wrlock
1.84% ovn_pinctrl0 libpthread-2.28.so [.] __pthread_rwlock_rdlock
1.77% ovn_pinctrl0 libpthread-2.28.so [.] __pthread_rwlock_unlock
1.75% ovn_pinctrl0 [kernel.kallsyms] [k] copy_user_enhanced_fast_string
1.62% ovn_pinctrl0 libc-2.28.so [.] _int_malloc
1.53% ovn_pinctrl0 [kernel.kallsyms] [k] avc_has_perm
1.16% ovn_pinctrl0 [kernel.kallsyms] [k] _raw_spin_lock
1.09% ovn_pinctrl0 libc-2.28.so [.] malloc
0.96% ovn_pinctrl0 libc-2.28.so [.] __memmove_avx_unaligned_erms
0.96% ovn_pinctrl0 libc-2.28.so [.] _int_free
0.80% ovn_pinctrl0 ovn-controller [.] 0x00000000000b87c1
0.75% ovn_pinctrl0 [kernel.kallsyms] [k] copy_user_generic_unrolled
0.71% ovn_pinctrl0 libc-2.28.so [.] __memset_avx2_unaligned_erms
0.69% ovn_pinctrl0 libpthread-2.28.so [.] __pthread_enable_asynccancel
0.69% ovn_pinctrl0 libc-2.28.so [.] __memcmp_avx2_movbe
0.67% ovn_pinctrl0 ovn-controller [.] 0x000000000011c8fd
0.65% ovn_pinctrl0 [kernel.kallsyms] [k] find_vma
0.61% ovn_pinctrl0 ovn-controller [.] 0x00000000000469bb
0.61% ovn_pinctrl0 [kernel.kallsyms] [k] skb_set_owner_w
0.60% ovn_pinctrl0 ovn-controller [.] 0x00000000000bdfbb
0.58% ovn_pinctrl0 ovn-controller [.] 0x00000000000b87a6
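The bare offsets in the ovn-controller rows (e.g. 0x00000000000b87c1) can usually be symbolized by installing the matching debuginfo package before reporting. A sketch of how output like the above is typically collected and resolved (commands are illustrative; the exact debuginfo package name depends on the installed OVN build):

```shell
# Record ~30s of on-CPU samples for all ovn-controller threads,
# including ovn_pinctrl0, with call graphs.
perf record -g -p "$(pidof ovn-controller)" -- sleep 30

# Install debuginfo so ovn-controller frames resolve to function
# names instead of raw offsets, e.g. on RHEL-based nodes:
#   dnf debuginfo-install ovn
perf report --stdio --sort comm,dso,symbol
```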
As discussed above, the solution is to use OVN's meter functionality to rate-limit packet-in traffic to ovn-controller, configured by ovn-kubernetes. This should be done for BFD and chk-pkt-len at least.
Opened Upstream PR: https://github.com/ovn-org/ovn-kubernetes/pull/2752
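For reference, OVN meters are created through the northbound database; a hedged sketch of the building block involved (the meter name, rate, and burst below are illustrative, and how the meter gets attached to the BFD/packet-in paths depends on the OVN version and on what ovn-kubernetes configures):

```shell
# Create a meter that drops packet-in traffic exceeding
# 100 packets/s with a burst of 50 (numbers are examples only).
ovn-nbctl meter-add packet-in-limit drop 100 pktps 50

# Inspect the configured meters.
ovn-nbctl list meter
```

The point of the drop action is that excess packet-ins are discarded in the datapath instead of being queued to the ovn_pinctrl0 thread, which is what the perf profile above shows saturating the CPU.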
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Important: OpenShift Container Platform 4.11.0 bug fix and security update), and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.