Description of problem:
While running perf workloads on an OCP 4.14 environment with OVN-IC patches, northd on each node consumes high CPU. This environment has 24 nodes.

I ran the following tests on this environment:
1. node-density-heavy - creates 245 pods per node in a single namespace.
2. cluster-density - creates 500 namespaces, each with 1 build, 1 image, 5 deployments, 5 services, and 1 route. Workload churning was enabled, meaning that 10% of the created objects were deleted and recreated at 10 minute intervals for 1h.

The slides at https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit?usp=sharing have a detailed description of the CPU & memory consumption of OVN containers and pods.

CPU & memory usage of OVN pods during the node-density-heavy test -
ovnkube-node pod on legacy OVN, CPU usage - mean 10%, max 60%
ovnkube-local pod on OVN-IC, CPU usage - mean 30%, max 130%
OVN-IC pod memory 700 MiB vs ovnkube-node pod 330 MiB
northd container on each node, CPU usage - mean 40%, max 110%
northd container on each node, memory - only 70 MiB
ovnkube-network-controller-manager memory 300 MiB, whereas the ovnkube-node container in legacy OVN uses 330 MiB

CPU & memory usage during the cluster-density test -
ovnkube-node pod on legacy OVN, CPU usage - mean 6%, max 60%
ovnkube-local pod on OVN-IC, CPU usage - mean 30%, max 130%
OVN-IC pod memory 780 MiB vs ovnkube-node pod 360 MiB
northd container on each node, CPU usage (OVN-IC env) - mean 22%, max 100%

So northd is using about 1 CPU core on each node at peak. This needs optimization.
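For a rough sense of the load these two workloads put on the cluster, the object counts implied by the description can be worked out as below. This is only a back-of-the-envelope sketch: the per-node and per-namespace figures come from the description, but it assumes all 24 nodes host workload pods (the report does not say how many are workers), counts only the object types listed, and assumes one churn cycle per 10 minute interval. The function names are hypothetical and not part of any test tooling.

```python
# Figures from the bug description.
NODES = 24                 # assumption: every node hosts workload pods (upper bound)
PODS_PER_NODE = 245        # node-density-heavy
NAMESPACES = 500           # cluster-density
OBJECTS_PER_NAMESPACE = {  # only the object types listed in the description
    "build": 1,
    "image": 1,
    "deployment": 5,
    "service": 5,
    "route": 1,
}
CHURN_PERCENT = 10         # share of created objects deleted and recreated
CHURN_INTERVAL_MIN = 10
CHURN_DURATION_MIN = 60


def node_density_pods(nodes=NODES, pods_per_node=PODS_PER_NODE):
    """Upper bound on pods created by node-density-heavy."""
    return nodes * pods_per_node


def cluster_density_objects(namespaces=NAMESPACES, per_ns=OBJECTS_PER_NAMESPACE):
    """Listed objects created by cluster-density across all namespaces."""
    return namespaces * sum(per_ns.values())


def churned_per_cycle(total, percent=CHURN_PERCENT):
    """Objects deleted and recreated in one churn cycle."""
    return total * percent // 100


def churn_cycles(duration_min=CHURN_DURATION_MIN, interval_min=CHURN_INTERVAL_MIN):
    """Assumed number of churn cycles: one per interval over the duration."""
    return duration_min // interval_min


if __name__ == "__main__":
    total = cluster_density_objects()
    print("node-density-heavy pods (upper bound):", node_density_pods())
    print("cluster-density listed objects:", total)
    print("objects churned per cycle:", churned_per_cycle(total))
    print("churn cycles in the run:", churn_cycles())
```

These counts exclude the pods that the deployments and builds themselves create, so the real number of objects seen by OVN during cluster-density is higher.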
Tested with CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and OVN image quay.io/itssurya/dev-images:ic-scale-v0 to check the OVN-IC changes with single-threaded northd. The northd container uses more than 1 core in this test as well: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0
(In reply to anil venkata from comment #2)
> Tested using CNO image quay.io/itssurya/dev-images:ic-cno-hack-1thread and
> OVN image quay.io/itssurya/dev-images:ic-scale-v0 for testing ovn ic changes
> with northd single thread. We see northd container using more than 1 core in
> this testing as well
> https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_0

Yeah, (for now) we'd still expect it to use about a core. But with 1 thread does it use *less* than it did before? And did the P99 times change and if so how?
We observed similar CPU usage and latency in both the single-threaded and multi-threaded northd environments: https://docs.google.com/presentation/d/16NMDoF52gUb-MybbAtSs16_888nLbshQYCU1Zw3WuvE/edit#slide=id.g2415c785c7f_0_11
So running northd with a single thread brought no improvement.