Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.
The FDP team is no longer accepting new bugs in Bugzilla. Please report your issues under FDP project in Jira. Thanks.

Bug 2189981

Summary: [OCP 4.14] northd high CPU usage in OVN-IC
Product: Red Hat Enterprise Linux Fast Datapath
Component: OVN
Version: FDP 23.K
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: high
Reporter: anil venkata <vkommadi>
Assignee: OVN Team <ovnteam>
QA Contact: Jianlin Shi <jishi>
CC: ctrautma, dcbw, jiji, mmichels
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Target Release: ---
Last Closed: 2023-10-13 14:23:42 UTC
Type: Bug

Description anil venkata 2023-04-26 16:33:22 UTC
Description of problem:
While running perf workloads on an OCP 4.14 environment with the OVN-IC patches, northd on each node is consuming high CPU.

This environment has 24 nodes. I ran the following tests on it:
1. node-density-heavy - creates 245 pods per node in a single namespace.
2. cluster-density - creates 500 namespaces, each with 1 build, 1 image, 5 deployments, 5 services, and 1 route. Workload churning was enabled, meaning that 10% of the created objects were deleted and recreated at 10-minute intervals for 1 hour.

The slides at https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit?usp=sharing
give a detailed description of the CPU & memory consumption of the OVN containers and pods.

CPU & memory usage of OVN pods during the node-density-heavy test -
ovnkube-node pod on legacy OVN CPU usage - mean 10%, max 60%
ovnkube-local pod on OVN-IC CPU usage - mean 30%, max 130%
Memory: OVN-IC pod 700 MiB vs ovnkube-node pod 330 MiB

northd container on each node CPU usage - mean: 40%, max: 110%
northd container on each node memory is only 70 MiB
ovnkube-network-controller-manager memory is 300 MiB,
whereas the ovnkube-node container in legacy OVN uses 330 MiB

CPU & memory usage during the cluster-density test -
ovnkube-node pod on legacy OVN CPU usage - mean 6%, max 60%
ovnkube-local pod on OVN-IC CPU usage - mean 30%, max 130%
Memory: OVN-IC pod 780 MiB vs ovnkube-node pod 360 MiB

northd container on each node CPU usage (OVN-IC env) - mean: 22%, max: 100%

So northd is using about one full CPU core on each node. This needs optimization.
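As a hedged illustration of how per-node northd usage like the above can be spot-checked (the namespace and the sampling interval are assumptions, not taken from this report):

```shell
# Pod/container CPU & memory as reported by the cluster metrics API
# (namespace is an assumption; adjust for the actual deployment):
oc adm top pods -n openshift-ovn-kubernetes --containers

# Or, directly on a node, sample the ovn-northd process every 5 seconds:
pidstat -u -p "$(pidof ovn-northd)" 5
```

This is a diagnostic sketch only; the numbers in this bug came from the workload runs described above, not from these commands.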

Comment 2 anil venkata 2023-05-09 13:25:22 UTC
Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and OVN image quay.io/itssurya/dev-images:ic-scale-v0 to test the OVN-IC changes with single-threaded northd. We see the northd container using more than 1 core in this test as well: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Comment 3 Dan Williams 2023-05-09 14:12:21 UTC
(In reply to anil venkata from comment #2)
> Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and
> OVN image quay.io/itssurya/dev-images:ic-scale-v0 for testing ovn ic changes
> with northd single thread. We see northd container using more than 1 core in
> this testing as well 
> https://docs.google.com/presentation/d/16NMDoF52gUb-
> MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Yeah, (for now) we'd still expect it to use about a core. But with 1 thread, does it use *less* than it did before? And did the P99 times change, and if so, how?

Comment 4 anil venkata 2023-05-16 17:43:11 UTC
We observed similar CPU usage and latency in both the single-threaded and multi-threaded northd environments: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_11
So there was no improvement from the single-threaded northd testing.

Comment 5 Mark Michelson 2023-10-13 14:23:42 UTC
I was given authority to close this issue since it should be addressed by a few additions to ovn-northd:

* The northd-backoff-interval-ms option was added, which introduces a delay between engine runs. This reduces CPU load by allowing northd to handle more NBDB changes in a single execution instead of running many times for many small updates.
* Incremental processing has been added to much of ovn-northd, speeding up the processing of certain classes of NBDB changes.
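As a hedged sketch of the first item: the backoff interval is configured through the northbound database's NB_Global options column, which ovn-northd reads; the 200 ms value below is purely illustrative, not a recommendation from this bug.

```shell
# Illustrative value (200 ms); see the ovn-nb documentation for the
# option's exact semantics and any cap applied by ovn-northd.
ovn-nbctl set NB_Global . options:northd-backoff-interval-ms=200
```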