Bug 2189981 - [OCP 4.14] northd high CPU usage in OVN-IC
Summary: [OCP 4.14] northd high CPU usage in OVN-IC
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: OVN
Version: FDP 23.K
Hardware: Unspecified
OS: Unspecified
high
unspecified
Target Milestone: ---
: ---
Assignee: OVN Team
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-26 16:33 UTC by anil venkata
Modified: 2023-07-13 07:25 UTC
4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links
System | ID | Private | Priority | Status | Summary | Last Updated
Red Hat Issue Tracker | FD-2836 | 0 | None | None | None | 2023-04-26 16:37:18 UTC

Description anil venkata 2023-04-26 16:33:22 UTC
Description of problem:
While running performance workloads on an OCP 4.14 environment with the OVN-IC patches, northd on each node consumes a lot of CPU.

This environment has 24 nodes. I ran the following tests on it:
1. node-density-heavy: creates 245 pods per node in a single namespace.
2. cluster-density: creates 500 namespaces, each with 1 build, 1 image, 5 deployments, 5 services and 1 route. Workload churning was enabled, meaning that 10% of the created objects were deleted and recreated at 10-minute intervals for 1 h.
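For scale, a back-of-the-envelope sizing of the two workloads, using only the numbers above. It is a sketch under stated assumptions: it counts only the object types listed (pods created by the deployments are excluded, because the replica count is not given), and it assumes one churn round at the end of each 10-minute interval, i.e. 6 rounds in the hour, each touching 10% of the objects. The function names are illustrative, not part of any test tooling.

```python
# Rough sizing of the node-density-heavy and cluster-density workloads.
# Assumptions: only the listed object types are counted (no deployment pods),
# and churn runs once per 10-minute interval over 60 minutes.

NODES = 24
PODS_PER_NODE = 245          # node-density-heavy, single namespace

NAMESPACES = 500             # cluster-density
OBJECTS_PER_NAMESPACE = {"build": 1, "image": 1, "deployment": 5,
                         "service": 5, "route": 1}
CHURN_PERCENT = 10
CHURN_INTERVAL_MIN = 10
CHURN_DURATION_MIN = 60


def node_density_pods(nodes: int, pods_per_node: int) -> int:
    """Total pods created by node-density-heavy across the cluster."""
    return nodes * pods_per_node


def cluster_density_objects(namespaces: int, per_ns: dict) -> int:
    """Total listed objects created by cluster-density (pods excluded)."""
    return namespaces * sum(per_ns.values())


def churn_rounds(duration_min: int, interval_min: int) -> int:
    """Churn rounds, assuming one round at the end of each interval."""
    return duration_min // interval_min


def objects_per_round(total: int, percent: int) -> int:
    """Objects deleted and recreated in a single churn round."""
    return total * percent // 100


if __name__ == "__main__":
    pods = node_density_pods(NODES, PODS_PER_NODE)
    objs = cluster_density_objects(NAMESPACES, OBJECTS_PER_NAMESPACE)
    rounds = churn_rounds(CHURN_DURATION_MIN, CHURN_INTERVAL_MIN)
    per_round = objects_per_round(objs, CHURN_PERCENT)
    print(f"node-density-heavy pods: {pods}")
    print(f"cluster-density objects: {objs}")
    print(f"churn: {rounds} rounds x {per_round} objects = {rounds * per_round}")
```

Under these assumptions this gives 5880 pods for node-density-heavy, 6500 listed objects for cluster-density, and 6 churn rounds of 650 objects each.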

The slides at https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit?usp=sharing
describe the CPU and memory consumption of the OVN containers and pods in detail.

CPU and memory usage of OVN pods during the node-density-heavy test:
ovnkube-node pod on legacy OVN, CPU usage: mean 10%, max 60%
ovnkube-local pod on OVN-IC, CPU usage: mean 30%, max 130%
Memory: OVN-IC pod 700 MiB vs. ovnkube-node pod 330 MiB

northd container on each node, CPU usage: mean 40%, max 110%
northd container on each node, memory: only 70 MiB
ovnkube-network-controller-manager, memory: 300 MiB,
whereas the ovnkube-node container in legacy OVN uses 330 MiB

CPU and memory usage during the cluster-density test:
ovnkube-node pod on legacy OVN, CPU usage: mean 6%, max 60%
ovnkube-local pod on OVN-IC, CPU usage: mean 30%, max 130%
Memory: OVN-IC pod 780 MiB vs. ovnkube-node pod 360 MiB

northd container on each node (OVN-IC env), CPU usage: mean 22%, max 100%

So northd uses up to about one full CPU core on each node. This needs optimization.
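The "one core" reading above treats 100% in the container CPU metric as one fully busy core. Under that assumption, the northd figures convert to cores as sketched below. The cluster-wide numbers simply multiply the per-node values by the 24 nodes; for the peaks this is only an upper bound, since the per-node maxima need not occur at the same time. Helper names are illustrative.

```python
# Convert the per-node northd CPU figures from this report into cores.
# Assumption: 100% in the container CPU metric == one fully busy core.

NODES = 24

# (mean %, max %) of the northd container on each node, per test
NORTHD_CPU = {
    "node-density-heavy": (40, 110),
    "cluster-density": (22, 100),
}


def percent_to_cores(percent: float) -> float:
    """100% CPU corresponds to one core."""
    return percent / 100.0


def cluster_cores(percent: float, nodes: int = NODES) -> float:
    """Cluster-wide cores if every node's northd ran at this level at once."""
    return percent_to_cores(percent) * nodes


for test, (mean_pct, max_pct) in NORTHD_CPU.items():
    print(
        f"{test}: per node mean {percent_to_cores(mean_pct):.2f} / "
        f"max {percent_to_cores(max_pct):.2f} cores; "
        f"cluster-wide mean ~{cluster_cores(mean_pct):.1f} cores, "
        f"peak upper bound {cluster_cores(max_pct):.1f} cores"
    )
```

For node-density-heavy this works out to roughly 9.6 cores of northd CPU across the cluster on average, with a peak upper bound of about 26.4 cores.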

Comment 2 anil venkata 2023-05-09 13:25:22 UTC
Tested the OVN-IC changes with single-threaded northd, using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and OVN image quay.io/itssurya/dev-images:ic-scale-v0. The northd container uses more than 1 core in this test as well: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Comment 3 Dan Williams 2023-05-09 14:12:21 UTC
(In reply to anil venkata from comment #2)
> Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and
> OVN image quay.io/itssurya/dev-images:ic-scale-v0 for testing ovn ic changes
> with northd single thread. We see northd container using more than 1 core in
> this testing as well 
> https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Yeah, for now we'd still expect it to use about a core. But with 1 thread, does it use *less* than it did before? And did the P99 times change, and if so, how?

Comment 4 anil venkata 2023-05-16 17:43:11 UTC
We observed similar CPU usage and latency with both single-threaded and multi-threaded northd: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_11
So running northd with a single thread brought no improvement.

