Bug 2108720 - 4.11 ovn etcd latencies exceed SLOs under large cluster-density tests
Keywords:
Status: CLOSED DUPLICATE of bug 2108679
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Networking
Version: 4.11
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: ---
Assignee: Jaime Caamaño Ruiz
QA Contact: Anurag Saxena
URL:
Whiteboard: perfscale-ovn
Depends On:
Blocks:
Reported: 2022-07-19 19:10 UTC by Andrew Collins
Modified: 2022-09-01 15:49 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2022-09-01 15:49:03 UTC
Target Upstream Version:
Embargoed:


Description Andrew Collins 2022-07-19 19:10:33 UTC
Description of problem: 
On an OpenShift 4.11.0-rc.1 cluster configured with OVN-Kubernetes, we observed a regression in etcd disk latencies from 4.10 to 4.11.

Compared with OpenShift SDN, the p99 etcd latencies are more than double those of an OpenShift SDN cluster installed with the same version and at the same scale.

Version-Release number of selected component (if applicable):
4.11.0-rc.1

How reproducible:
100%


Steps to Reproduce:
1. Install a 4.11.0-rc.1 cluster with r5.4xlarge masters
2. Scale cluster to 252 nodes
3. Apply cluster-density e2e-benchmarking workload with 4000 iterations 

Actual results:
etcd disk fsync and disk commit latencies have p99s of 22.5ms and 35.9ms, respectively.

Expected results:
etcd disk fsync and disk commit latencies stay within SLOs of 10ms and 20ms, respectively.
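
The gap between the actual and expected results can be made explicit with a small check. The following is a minimal sketch, not the tooling used for this report: the observed p99 values and SLO thresholds are copied from the results above, while the dictionary keys, the `slo_report` helper, and the PromQL strings are illustrative assumptions. The queries assume the standard etcd histogram metrics `etcd_disk_wal_fsync_duration_seconds_bucket` and `etcd_disk_backend_commit_duration_seconds_bucket` and a 5-minute rate window, which may differ from what the benchmark dashboards use.

```python
# SLO thresholds and observed p99 latencies in milliseconds, from this report.
SLO_MS = {"wal_fsync": 10.0, "backend_commit": 20.0}
OBSERVED_P99_MS = {"wal_fsync": 22.5, "backend_commit": 35.9}

# Assumed PromQL for reading the p99s from a cluster's Prometheus.
# The 5m window and the exact query shape are assumptions, not taken from the report.
PROMQL = {
    "wal_fsync": (
        "histogram_quantile(0.99, "
        "rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))"
    ),
    "backend_commit": (
        "histogram_quantile(0.99, "
        "rate(etcd_disk_backend_commit_duration_seconds_bucket[5m]))"
    ),
}


def slo_report(observed_ms, slo_ms):
    """Compare each observed p99 with its SLO (hypothetical helper)."""
    report = {}
    for name, limit in slo_ms.items():
        value = observed_ms[name]
        report[name] = {
            "p99_ms": value,
            "slo_ms": limit,
            "ratio": value / limit,        # how many times the SLO was used up
            "within_slo": value <= limit,  # True only if the SLO is met
        }
    return report


if __name__ == "__main__":
    for name, row in slo_report(OBSERVED_P99_MS, SLO_MS).items():
        status = "OK" if row["within_slo"] else "EXCEEDED"
        print(
            f"{name}: p99={row['p99_ms']}ms slo={row['slo_ms']}ms "
            f"ratio={row['ratio']:.2f} {status}"
        )
```

With the numbers reported here, both metrics exceed their SLOs: the fsync p99 is a little over twice its threshold, and the commit p99 is a little under twice its threshold.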

