Description of problem:
ovn-controller across all the worker nodes is experiencing high memory utilization.

Size of the deployment and objects:
root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get services -A | wc -l
3005
root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get nodes | grep Ready | wc -l
107
root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get pods -A | wc -l
14738

ovn-controller utilization after the test [1].

Version-Release number of selected component (if applicable):
OCP4.6-Nightly

How reproducible:
N/A

Steps to Reproduce:
1. 100-worker-node OCP 4.6 cluster
2. Mastervertical 1000 test (1000 projects)

Actual results:
[1] https://snapshot.raintank.io/dashboard/snapshot/A9m9EVRSvBedYqPbNPWMjbRyQAclzH0Q?orgId=2

Expected results:

Additional info:
pprof data: https://coreos.slack.com/archives/CU9HKBZKJ/p1601486618158900
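For future re-tests, a quick way to see per-node ovn-controller memory without pprof is something like the following (a sketch; the openshift-ovn-kubernetes namespace and ovn-controller container name are assumptions about the OVN-Kubernetes layout on this cluster):

# per-container memory for ovn-controller on every node (requires metrics-server/oc adm top)
oc adm top pods -n openshift-ovn-kubernetes --containers | grep ovn-controller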
This is a good candidate for a 4.6.z backport once resolved.
@avishnoi is this related to the reject ACL flow explosion?
(In reply to Dan Williams from comment #2) > @avishnoi is this related to the reject ACL flow explosion? Partly, but Numan's patches reducing the number of flows would also help here. Once we have all of those patches plus the ACL patches in our nightly, we need to re-test and check ovn-controller memory consumption. Currently I believe the high memory consumption is caused by the number of flows installed on each worker node (around 2M).
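To re-check the ~2M-flows-per-node figure once the flow-reduction and ACL patches land, something like the loop below can count br-int OpenFlow flows per worker node (a sketch; it assumes OVS is running on the host and that oc debug node access works on this cluster):

# count OpenFlow flows on br-int for each worker node
for n in $(oc get nodes -l node-role.kubernetes.io/worker -o name); do
  echo -n "$n: "
  oc debug "$n" -- chroot /host ovs-ofctl dump-aggregate br-int 2>/dev/null | grep -o 'flow_count=[0-9]*'
done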
SBDB and OpenFlow flow reduction are tracked in bug 1859924. Duping this bug to that one; if you see this again after 1859924 is resolved, please reopen. *** This bug has been marked as a duplicate of bug 1859924 ***