2067342 – Pod latency spikes are observed when there is a compaction/leadership transfer

Summary: Pod latency spikes are observed when there is a compaction/leadership transfer

Keywords:
Status:	CLOSED DEFERRED
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Networking
Sub Component:
Version:	4.11
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	4.13.0
Assignee:	Jaime Caamaño Ruiz
QA Contact:	Anurag saxena
Docs Contact:
URL:
Whiteboard:	perfscale-ovn
Depends On:	2069108
Blocks:

Reported:	2022-03-23 20:14 UTC by Mohit Sheth
Modified:	2023-03-09 01:15 UTC
CC List:	4 users
Fixed In Version:
Doc Type:
Doc Text:
Clone Of:
Environment:
Last Closed:	2023-03-09 01:15:41 UTC
Target Upstream Version:
Embargoed:

Description Mohit Sheth 2022-03-23 20:14:57 UTC

Description of problem:
When running node-density on a 120-node cluster, we see spikes in pod ready latency. These spikes correspond to a southbound DB compaction. During the compaction, ovn-controller is not able to connect to a leader.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-20-160505

How reproducible:
Always

Steps to Reproduce:
1. Run node-density-light on a 120-node cluster

Actual results:
Pod ready latency spikes, which push the P99 latency up

Expected results:
A steady pod ready latency during the test
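
To illustrate why a handful of spikes is enough to move the P99 while the median stays flat, here is a minimal sketch. The latency values are synthetic and purely illustrative (they are not measurements from this cluster), and the `percentile` helper is a hypothetical nearest-rank implementation, not the one used by the benchmark tooling.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic pod ready latencies in milliseconds (assumed values).
steady = [2000] * 1000
# Hypothetical run where 2% of pods start during a stall and take much longer.
spiky = [2000] * 980 + [15000] * 20

print("steady P50/P99:", percentile(steady, 50), percentile(steady, 99))
print("spiky  P50/P99:", percentile(spiky, 50), percentile(spiky, 99))
```

With these assumed numbers the median is identical in both runs, but the P99 of the second run lands on a spiked sample, because more than 1% of the pods were affected.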

Comment 2 Dan Williams 2022-11-16 17:29:38 UTC

This should be fixed, or made much better, by the parallel compaction in OVS 3.0. Unfortunately, OVS 3.0 is only built for RHEL9 so can't be included in OpenShift 4.12. If we wanted it built for RHEL8, we'd have to drop one of the OVS streams we currently use in OpenShift to free up QE capacity.

Comment 3 Jaime Caamaño Ruiz 2022-11-17 15:24:34 UTC

Tracking https://issues.redhat.com/browse/ART-5075, where RHEL9 is pulled in for OCP 4.13. Once this is done, we can bump to OVS 3.0.

Comment 4 Jaime Caamaño Ruiz 2023-02-21 14:11:18 UTC

There is an OpenSSL 3 performance regression in RHEL9 that we should keep an eye on. It might affect OVN control plane connectivity:

https://bugzilla.redhat.com/show_bug.cgi?id=2168224

Comment 6 Shiftzilla 2023-03-09 01:15:41 UTC

OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira.

https://issues.redhat.com/browse/OCPBUGS-9185