Bug 2189981

Summary: [OCP 4.14] northd high CPU usage in OVN-IC
Product: Red Hat Enterprise Linux Fast Datapath
Component: OVN
Version: FDP 23.K
Reporter: anil venkata <vkommadi>
Assignee: OVN Team <ovnteam>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, dcbw, jiji, mmichels
Status: NEW
Severity: unspecified
Priority: high
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description anil venkata 2023-04-26 16:33:22 UTC
Description of problem:
While running perf workloads on an OCP 4.14 environment with the OVN-IC patches, northd on each node consumes a high amount of CPU.

This environment has 24 nodes. I ran the following tests on it:
1. node-density-heavy - creates 245 pods per node in a single namespace.
2. cluster-density - creates 500 namespaces, each with 1 build, 1 image, 5 deployments, 5 services, and 1 route. Workload churning was enabled, meaning 10% of the created objects were deleted and recreated at 10-minute intervals for 1 hour (rough object counts below).
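
To put the cluster-density churn in perspective, here is a quick back-of-the-envelope count of the objects involved, using only the numbers from the workload description above. Exactly which objects count toward the 10% depends on the workload implementation, so treat this as a ballpark only:

    # Rough object-count math for the cluster-density run described above.
    namespaces = 500
    objects_per_namespace = 1 + 1 + 5 + 5 + 1   # build, image, deployments, services, route
    total_objects = namespaces * objects_per_namespace

    churn_fraction = 0.10                        # 10% of created objects churned
    interval_minutes = 10
    duration_minutes = 60
    intervals = duration_minutes // interval_minutes

    churned_per_interval = int(total_objects * churn_fraction)
    total_churn_ops = churned_per_interval * intervals   # delete + recreate cycles

    print(f"total objects: {total_objects}")              # 6500
    print(f"churned per 10 min: {churned_per_interval}")  # 650
    print(f"churn cycles over 1h: {total_churn_ops}")     # 3900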

The slides at https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit?usp=sharing
give a detailed breakdown of the CPU and memory consumption of the OVN containers and pods.
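
The numbers below come from the perf dashboards behind those slides. As a minimal sketch of how similar mean/max figures could be sampled directly from the cluster (assuming metrics-server/`kubectl top` is available and the OVN pods run in the usual openshift-ovn-kubernetes namespace; this is not the tooling that produced the data in this report):

    #!/usr/bin/env python3
    # Sketch: periodically sample per-container CPU (millicores) for the OVN
    # pods with `kubectl top pod --containers` and report mean/max per container.
    import subprocess, time
    from collections import defaultdict

    NAMESPACE = "openshift-ovn-kubernetes"   # assumption: default OCP namespace
    samples = defaultdict(list)

    for _ in range(60):                      # ~10 minutes at 10 s intervals
        out = subprocess.run(
            ["kubectl", "top", "pod", "--containers", "-n", NAMESPACE,
             "--no-headers"],
            capture_output=True, text=True, check=True).stdout
        for line in out.splitlines():
            pod, container, cpu, mem = line.split()[:4]
            samples[container].append(int(cpu.rstrip("m")))  # "412m" -> 412
        time.sleep(10)

    for container, cpus in sorted(samples.items()):
        print(f"{container}: mean {sum(cpus)/len(cpus):.0f}m, max {max(cpus)}m")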

CPU & memory usage of OVN pods during the node-density-heavy test:
ovnkube-node pod on legacy OVN CPU usage: mean 10%, max 60%
ovnkube-local pod on OVN-IC CPU usage: mean 30%, max 130%
Memory: OVN-IC pod 700 MiB vs ovnkube-node pod 330 MiB

northd container on each node CPU usage: mean 40%, max 110%
northd container on each node memory: only 70 MiB
ovnkube-network-controller-manager container memory: 300 MiB,
whereas the ovnkube-node container in legacy OVN uses 330 MiB

CPU & memory usage during the cluster-density test:
ovnkube-node pod on legacy OVN CPU usage: mean 6%, max 60%
ovnkube-local pod on OVN-IC CPU usage: mean 30%, max 130%
Memory: OVN-IC pod 780 MiB vs ovnkube-node pod 360 MiB

northd container on each node CPU usage (OVN-IC env): mean 22%, max 100%

So northd is using about one CPU core on each node. This needs optimization.
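
One way to see where that core goes (a sketch, not what produced the data above): ovn-northd exposes its role and per-stage stopwatch counters over its control socket, which can be polled from inside the northd container. This assumes `oc exec` access, that the container is named "northd" as in the pods above, and that ovn-appctl can reach the ovn-northd target with its default rundir; the pod name below is hypothetical:

    #!/usr/bin/env python3
    # Sketch: dump ovn-northd's status and stopwatch statistics from inside a
    # northd container, to see which processing stages dominate the CPU time.
    import subprocess

    POD = "ovnkube-node-xxxxx"            # hypothetical; substitute a real pod name
    NS = "openshift-ovn-kubernetes"       # assumption: default OCP namespace

    def northd_appctl(*args):
        cmd = ["oc", "exec", "-n", NS, POD, "-c", "northd", "--",
               "ovn-appctl", "-t", "ovn-northd", *args]
        return subprocess.run(cmd, capture_output=True, text=True).stdout

    print(northd_appctl("status"))           # active / standby / paused
    print(northd_appctl("stopwatch/show"))   # per-stage timing statistics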

Comment 2 anil venkata 2023-05-09 13:25:22 UTC
Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and OVN image quay.io/itssurya/dev-images:ic-scale-v0 for testing the OVN-IC changes with single-threaded northd. We see the northd container using more than 1 core in this testing as well: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0
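
For the single-thread run, the lflow-build thread count can be checked (and forced) at runtime. This is only a sketch, assuming the parallel-build unixctl commands are present in the ovn-northd build shipped in that image and that the script runs inside the northd container; check the ovn-northd(8) man page for the exact command names in that version:

    #!/usr/bin/env python3
    # Sketch: confirm how many logical-flow build threads ovn-northd is running
    # with, and pin it to 1. Meant to be run inside the northd container.
    import subprocess

    def appctl(*args):
        return subprocess.run(["ovn-appctl", "-t", "ovn-northd", *args],
                              capture_output=True, text=True).stdout.strip()

    print("threads:", appctl("parallel-build/get-n-threads"))
    appctl("parallel-build/set-n-threads", "1")   # force single-threaded build
    print("threads:", appctl("parallel-build/get-n-threads"))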

Comment 3 Dan Williams 2023-05-09 14:12:21 UTC
(In reply to anil venkata from comment #2)
> Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and
> OVN image quay.io/itssurya/dev-images:ic-scale-v0 for testing ovn ic changes
> with northd single thread. We see northd container using more than 1 core in
> this testing as well 
> https://docs.google.com/presentation/d/16NMDoF52gUb-
> MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Yeah, (for now) we'd still expect it to use about a core. But with 1 thread does it use *less* than it did before? And did the P99 times change and if so how?

Comment 4 anil venkata 2023-05-16 17:43:11 UTC
We observed similar CPU usage and latency in both the single-threaded and multi-threaded northd environments: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_11
So there is no improvement from the single-threaded northd testing.