Bug 2067342 - Pod latency spikes are observed when there is a compaction/leadership transfer
Summary: Pod latency spikes are observed when there is a compaction/leadership transfer
Keywords:
Status: CLOSED DEFERRED
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.13.0
Assignee: Jaime Caamaño Ruiz
QA Contact: Anurag saxena
URL:
Whiteboard: perfscale-ovn
Depends On: 2069108
Blocks:
Reported: 2022-03-23 20:14 UTC by Mohit Sheth
Modified: 2023-03-09 01:15 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-03-09 01:15:41 UTC
Target Upstream Version:
Embargoed:


Description Mohit Sheth 2022-03-23 20:14:57 UTC
Description of problem:
When running node-density on a 120-node cluster, we see spikes in pod ready latency. These spikes coincide with southbound DB compaction. During compaction, ovn-controller is not able to connect to a leader.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-20-160505

How reproducible:
Always

Steps to Reproduce:
1. Run node-density-light on a 120-node cluster

Actual results:
Pod ready latency spikes which cause the P99 to go up

Expected results:
A steady pod ready latency during the test
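
To illustrate why a short compaction window is enough to move the P99 reported under "Actual results", here is a minimal sketch using synthetic data. All numbers are hypothetical (pod count, latency ranges, and the fraction of pods hit by a spike are assumptions, not measurements from this cluster), and the nearest-rank percentile used here is not necessarily how the benchmark tooling computes P99.

```python
import math
import random
import statistics


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def simulate(total_pods=1000, spike_fraction=0.02, seed=0):
    """Return (steady, spiked) pod ready latencies in seconds.

    Hypothetical model: every pod takes 2-4 s to become ready; in the
    spiked run, a small fraction of pods are delayed by an extra 20-40 s,
    standing in for pods created while no leader was reachable.
    """
    rng = random.Random(seed)
    steady = [rng.uniform(2.0, 4.0) for _ in range(total_pods)]
    spiked = list(steady)
    spike_count = int(total_pods * spike_fraction)
    for i in rng.sample(range(total_pods), spike_count):
        spiked[i] += rng.uniform(20.0, 40.0)
    return steady, spiked


if __name__ == "__main__":
    steady, spiked = simulate()
    for name, data in (("steady", steady), ("spiked", spiked)):
        p50 = statistics.median(data)
        p99 = percentile(data, 99)
        print(f"{name}: P50={p50:.2f}s P99={p99:.2f}s")
```

In this model only 2% of pods are delayed, so the median barely changes, but because more than 1% of samples sit in the spike, the P99 lands inside it.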

Comment 2 Dan Williams 2022-11-16 17:29:38 UTC
This should be fixed, or made much better, by the parallel compaction in OVS 3.0. Unfortunately, OVS 3.0 is only built for RHEL9 so can't be included in OpenShift 4.12. If we wanted it built for RHEL8, we'd have to drop one of the OVS streams we currently use in OpenShift to free up QE capacity.

Comment 3 Jaime Caamaño Ruiz 2022-11-17 15:24:34 UTC
Tracking https://issues.redhat.com/browse/ART-5075 where RHEL9 is pulled in for OCP 4.13. Once this is done we can bump to OVS 3.0

Comment 4 Jaime Caamaño Ruiz 2023-02-21 14:11:18 UTC
There is an OpenSSL 3 performance regression in RHEL9 that we should keep an eye on. It might affect OVN control plane connectivity:

https://bugzilla.redhat.com/show_bug.cgi?id=2168224

Comment 6 Shiftzilla 2023-03-09 01:15:41 UTC
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9185
