Description of problem:

1. On a 10 node cluster with no user projects, just OOTB pods/services, ovn-controller uses ~115 MB of RSS memory.
2. On a 10 node cluster with 100 user projects containing 200 pods and services (20 per node for each), ovn-controller uses ~150 MB of RSS.
3. On a 100 node cluster with no user projects, just OOTB pods/services, ovn-controller uses ~760 MB of RSS memory, roughly 7x step #1.
4. On a 100 node cluster with 1000 user projects containing 2000 pods and services (20 per node for each, as in #2), ovn-controller uses 3.3 GB of RSS, roughly 20x step #2.

The implication is that larger instance sizes are required to run the exact same per-node workload on OVN clusters with more nodes. In the test above on AWS, m5.large instances worked fine in a 10 node cluster for 20 pods + 20 services per node, but went NotReady and OOMed due to ovn-controller memory growth for the same workload in a 100 node cluster. Instance sizes had to be doubled to run the workload successfully.

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2020-12-04-013308

How reproducible:
Always

Steps to Reproduce:
1. Create a 10 node cluster in AWS with 3 m5.4xlarge masters and 10 m5.large nodes.
2. Create 100 projects, each with 2 deployments, each with 1 pod + 1 svc (20 pods + 20 services per node); a sketch of one way to generate this workload follows this report. Note the ovn-controller memory size on a few nodes.
3. Scale the cluster up to 100 m5.large nodes.
4. Create 1000 projects, each with 2 deployments, each with 1 pod + 1 svc (20 pods + 20 services per node).

Actual results:
Nodes will start going NotReady and become unresponsive. Nodes that are still responsive will show ovn-controller memory usage in excess of 3.2 GB.

Expected results:
ovn-controller memory usage on a node grows in proportion to the workload on the node, not the number of nodes in the cluster. A node that can handle 20 pods and services at 10 node scale can handle the same workload at 100 node scale without ovn-controller requiring 20x the memory.

Additional info:
Let me know if must-gather would help or what other logs you might need.
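For reference, a minimal sketch of one way the per-project workload could be generated with oc; the project names, deployment names, and container image are placeholders, not the exact tooling used in this test:

# Sketch only: N projects, each with 2 single-replica deployments and one service per deployment.
N=100        # use 1000 for the 100 node test
for i in $(seq 1 "$N"); do
  oc new-project "scale-test-$i"
  for d in a b; do
    oc -n "scale-test-$i" create deployment "app-$d" --image=registry.example.com/sample-app:latest
    oc -n "scale-test-$i" expose deployment "app-$d" --port=8080
  done
done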
Deleting the 1000 projects (2000 pods/services) caused ovn-controller usage to go up to 4.5-5 GiB RSS.

Re-creating the projects caused it to go up again, to ~6.2 GB.
Mike, can you attach the southbound DB and the OVS flow dumps (ovs-ofctl dump-flows br-int) when the ovn-controller memory usage grows beyond 700 MB RSS?
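A sketch of one way to collect that data; the namespace, container names, and pod names below are assumptions based on a typical OpenShift OVN-Kubernetes layout and may differ by release:

# OVS flow dump from the node whose ovn-controller is growing (pod name is a placeholder)
oc -n openshift-ovn-kubernetes exec <ovnkube-node-pod> -c ovn-controller -- ovs-ofctl dump-flows br-int > /tmp/br-int-flows.txt

# Southbound DB logical flows, dumped from an ovnkube-master pod (container name "sbdb" is an assumption)
oc -n openshift-ovn-kubernetes exec <ovnkube-master-pod> -c sbdb -- ovn-sbctl list Logical_Flow > /tmp/sb-lflows.txt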
When you notice the huge memory usage in ovn-controller, can you please run the command below and see if it reduces the memory?

ovs-vsctl set open . external_ids:ovn-enable-lflow-cache=false

Please give it some time, a minute or two. When you run this command, ovn-controller disables the logical flow cache and recomputes all the logical flows. Thanks
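A sketch of how this could be applied and verified on one node; the namespace follows the usual OpenShift OVN-Kubernetes layout, the pod name is a placeholder, and the ps loop is just one way to watch RSS:

# From the ovn-controller container of the affected node's ovnkube-node pod
oc -n openshift-ovn-kubernetes rsh -c ovn-controller <ovnkube-node-pod>
ovs-vsctl set open . external_ids:ovn-enable-lflow-cache=false
ovs-vsctl get open . external_ids:ovn-enable-lflow-cache    # should now print "false"
# Watch ovn-controller RSS (in KiB) settle over the next minute or two
while true; do ps -o rss= -C ovn-controller; sleep 10; done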
Will repro today and gather info requested in comment 2 and comment 3
If disabling the cache addresses this issue, I think we can close this BZ. I raised another BZ to add an option to configure the cache limit: https://bugzilla.redhat.com/show_bug.cgi?id=1906033
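For context, later upstream OVN exposes cache limits through ovn-controller external_ids; the option names below are the ones upstream eventually used and may not match what the linked BZ delivers for 4.7, so treat them as an assumption:

# Assumed knobs from later upstream OVN releases, shown only to illustrate the direction:
ovs-vsctl set open . external_ids:ovn-limit-lflow-cache=100000          # cap number of cached lflow entries
ovs-vsctl set open . external_ids:ovn-memlimit-lflow-cache-kb=524288    # cap cache memory (KB)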
Tried running ovs-vsctl set open . external_ids:ovn-enable-lflow-cache=false again after deleting the projects/pods/svcs, and ovn-controller RSS remained unchanged. I am running the above command on a node I am watching; let me know if I should be doing it somewhere else.
Hi Numan,

As I was doing some tests in a 100 node cluster with the RPMs you mentioned, I also took a look at this BZ. As you suggested, I configured "ovs-vsctl set open . external_ids:ovn-enable-lflow-cache=false" in only one ovn-controller pod and restarted it, then I generated some objects (500 services + 500 namespaces + 2500 pods):

root@ip-172-31-71-55: /tmp # oc get node | grep -c worker
100
root@ip-172-31-71-55: /tmp # oc get ns | wc -l
551
root@ip-172-31-71-55: /tmp # oc get pod -A | wc -l
3849
root@ip-172-31-71-55: /tmp # oc get svc -A | wc -l
566

# This is the pod with lflow caching disabled
root@ip-172-31-71-55: ~/e2e-benchmarking/workloads/kube-burner # oc rsh ovnkube-node-wvmg6
Defaulting container name to ovn-controller.
sh-4.4# ovs-vsctl get open . external_ids:ovn-enable-lflow-cache
"false"

# Pods with the lowest memory usage
root@ip-172-31-71-55: ~/e2e-benchmarking/workloads/kube-burner # oc adm top pods -l app=ovnkube-node | sort -k3 -r | tail -10
ovnkube-node-vzfx7   4m   755Mi
ovnkube-node-t7bns   3m   755Mi
ovnkube-node-q2fh9   4m   754Mi
ovnkube-node-m88km   4m   754Mi
ovnkube-node-qh9wl   4m   753Mi
ovnkube-node-plzjj   3m   753Mi
ovnkube-node-knj6x   3m   753Mi
ovnkube-node-86s8j   4m   753Mi
ovnkube-node-xqs5s   2m   751Mi
ovnkube-node-wvmg6   2m   395Mi

Container breakdown:

root@ip-172-31-71-55: ~/e2e-benchmarking/workloads/kube-burner # oc adm top pods ovnkube-node-wvmg6 --containers
POD                  NAME              CPU(cores)   MEMORY(bytes)
ovnkube-node-wvmg6   kube-rbac-proxy   0m           16Mi
ovnkube-node-wvmg6   ovn-controller    0m           331Mi
ovnkube-node-wvmg6   ovnkube-node      4m           47Mi

And the second-lowest pod from the list above, for comparison:

root@ip-172-31-71-55: ~/e2e-benchmarking/workloads/kube-burner # oc adm top pods ovnkube-node-xqs5s --containers
POD                  NAME              CPU(cores)   MEMORY(bytes)
ovnkube-node-xqs5s   kube-rbac-proxy   0m           15Mi
ovnkube-node-xqs5s   ovn-controller    0m           683Mi
ovnkube-node-xqs5s   ovnkube-node      4m           51Mi

I can confirm a memory usage reduction after disabling lflow caching. However, we still have to quantify side effects such as higher CPU usage and higher latency from ovn-controller.
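If the setting needed to be applied cluster-wide for a broader comparison, a loop like the following could be used; the openshift-ovn-kubernetes namespace and the app=ovnkube-node label match what is shown above, but the loop itself is only a sketch:

# Disable the lflow cache in every ovn-controller container (sketch only)
for pod in $(oc -n openshift-ovn-kubernetes get pods -l app=ovnkube-node -o name); do
  oc -n openshift-ovn-kubernetes exec "$pod" -c ovn-controller -- ovs-vsctl set open . external_ids:ovn-enable-lflow-cache=false
done
# Then compare per-container memory across nodes
oc adm top pods -n openshift-ovn-kubernetes -l app=ovnkube-node --containers | grep ovn-controller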
Closing this BZ, as https://bugzilla.redhat.com/show_bug.cgi?id=1906033, which adds support for limiting the lflow cache, will handle the memory usage issue.