Bug 2067342

Summary: Pod latency spikes are observed when there is a compaction/leadership transfer
Product: OpenShift Container Platform
Reporter: Mohit Sheth <msheth>
Component: Networking
Assignee: Jaime Caamaño Ruiz <jcaamano>
Networking sub component: ovn-kubernetes
QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DEFERRED
Docs Contact:
Severity: medium
Priority: medium
CC: dcbw, rsevilla, sdodson, surya
Version: 4.11   
Target Milestone: ---   
Target Release: 4.13.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: perfscale-ovn
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2023-03-09 01:15:41 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2069108    
Bug Blocks:    

Description Mohit Sheth 2022-03-23 20:14:57 UTC
Description of problem:
When running node-density on a 120-node cluster, we observe spikes in pod ready latency. These spikes coincide with a southbound DB compaction. During the compaction, ovn-controller is unable to connect to a leader.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-20-160505

How reproducible:
Always

Steps to Reproduce:
1. Run node-density-light on a 120-node cluster

Actual results:
Pod ready latency spikes, which push up the P99 pod ready latency

Expected results:
A steady pod ready latency during the test
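To illustrate why short-lived spikes move the P99 even when most pods are unaffected, here is a minimal sketch using a nearest-rank percentile. The latency values are hypothetical and illustrative only (not measurements from this bug), and `percentile` is a hypothetical helper, not part of node-density or kube-burner.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile (hypothetical helper): the smallest sample
    such that at least p% of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    # Integer multiply before dividing keeps the rank exact for whole-number p.
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical pod ready latencies in seconds (illustrative, not measured).
steady = [2.0] * 1000
# Assume 2% of pods land in a compaction window and see a much higher latency.
with_spikes = [2.0] * 980 + [30.0] * 20

print(percentile(steady, 99))       # 2.0
print(percentile(with_spikes, 50))  # 2.0  (median unaffected)
print(percentile(with_spikes, 99))  # 30.0 (P99 dominated by the spikes)
```

Once more than 1% of pods fall into a slow window, the P99 reports the spike value rather than the steady-state latency, which matches the expectation of a steady latency being violated only at the tail.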

Comment 2 Dan Williams 2022-11-16 17:29:38 UTC
This should be fixed, or made much better, by the parallel compaction in OVS 3.0. Unfortunately, OVS 3.0 is only built for RHEL9 so can't be included in OpenShift 4.12. If we wanted it built for RHEL8, we'd have to drop one of the OVS streams we currently use in OpenShift to free up QE capacity.

Comment 3 Jaime Caamaño Ruiz 2022-11-17 15:24:34 UTC
Tracking https://issues.redhat.com/browse/ART-5075, where RHEL9 is pulled in for OCP 4.13. Once this is done, we can bump to OVS 3.0.

Comment 4 Jaime Caamaño Ruiz 2023-02-21 14:11:18 UTC
There is an OpenSSL 3 performance regression in RHEL9 that we should keep an eye on. It might affect OVN control plane connectivity:

https://bugzilla.redhat.com/show_bug.cgi?id=2168224

Comment 6 Shiftzilla 2023-03-09 01:15:41 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9185