Bug 1884049 - [ovn-controller] memory utilization high across all worker nodes
Summary: [ovn-controller] memory utilization high across all worker nodes
Keywords:
Status: CLOSED DUPLICATE of bug 1859924
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.6
Hardware: All
OS: All
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Anil Vishnoi
QA Contact: Anurag saxena
URL:
Whiteboard: aos-scalability-46
Depends On:
Blocks:
 
Reported: 2020-09-30 20:33 UTC by Joe Talerico
Modified: 2020-11-18 20:02 UTC
CC: 5 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-11-18 20:02:23 UTC
Target Upstream Version:
Embargoed:



Description Joe Talerico 2020-09-30 20:33:37 UTC
Description of problem:
ovn-controller across all the worker nodes is experiencing high memory utilization.

Size of the deployment and objects :

root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get services -A | wc -l
3005
root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get nodes | grep Ready | wc -l
107
root@ip-172-31-68-73: ~/e2e-benchmarking/workloads/network-perf # oc get pods -A | wc -l
14738

ovn-controller memory utilization after the test is shown in the dashboard snapshot [1].
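
As a sanity check outside the Grafana snapshot, ovn-controller RSS can also be inspected directly on a worker node (a minimal sketch, assuming oc debug access to the nodes; <worker-node> is a placeholder):

# Resident set size (kB) of the ovn-controller process on one node.
oc debug node/<worker-node> -- chroot /host ps -o pid,rss,comm -C ovn-controller

# Per-pod view across the OVN namespace, if pod metrics are available.
oc adm top pods -n openshift-ovn-kubernetes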

Version-Release number of selected component (if applicable):
OCP4.6-Nightly

How reproducible:
N/A

Steps to Reproduce:
1. Deploy a 100-worker-node OCP 4.6 cluster.
2. Run the MasterVertical 1000 test (1000 projects); a rough approximation is sketched below.
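
An illustrative approximation of that object scale (not the actual MasterVertical workload, which runs through the cluster-loader harness and creates many more object types) can be driven with plain oc commands; the image is just a placeholder for any long-running container:

# Creates 1000 projects, each with one deployment and one service, to approximate
# the object count the real test generates.
for i in $(seq 1 1000); do
  oc new-project "scale-test-${i}"
  oc -n "scale-test-${i}" create deployment pause --image=k8s.gcr.io/pause:3.2
  oc -n "scale-test-${i}" expose deployment pause --port=8080
done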


Actual results:
[1] https://snapshot.raintank.io/dashboard/snapshot/A9m9EVRSvBedYqPbNPWMjbRyQAclzH0Q?orgId=2

Expected results:


Additional info:
pprof data https://coreos.slack.com/archives/CU9HKBZKJ/p1601486618158900

Comment 1 Ben Bennett 2020-10-01 15:46:48 UTC
This is a good candidate for a 4.6.z backport once resolved.

Comment 2 Dan Williams 2020-10-12 15:22:02 UTC
@avishnoi is this related to the reject ACL flow explosion?

Comment 3 Anil Vishnoi 2020-11-11 06:46:35 UTC
(In reply to Dan Williams from comment #2)
> @avishnoi is this related to the reject ACL flow explosion?

Part of it, but Numan's patches reducing the number of flows should help here as well. Once we have all those patches and the ACL patches in our nightly, we need to test this again and re-check the ovn-controller memory consumption. Currently I believe the high memory consumption is due to the number of flows installed on each individual worker node (around 2M).
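
For reference, a hedged way to confirm that per-node flow count (assuming OVS is running on the host; <worker-node> is a placeholder):

# Count the OpenFlow flows programmed on br-int on one worker node.
oc debug node/<worker-node> -- chroot /host ovs-ofctl -O OpenFlow13 dump-flows br-int | wc -l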

Comment 4 Tim Rozet 2020-11-18 20:02:23 UTC
SBDB and OpenFlow reduction are part of bug 1859924. Marking this bug as a duplicate of that one; if you see this again after 1859924 is resolved, please reopen.

*** This bug has been marked as a duplicate of bug 1859924 ***

