Bug 2108720

Summary: 4.11 ovn etcd latencies exceed SLOs under large cluster-density tests
Product: OpenShift Container Platform Reporter: Andrew Collins <ancollin>
Component: Networking Assignee: Jaime Caamaño Ruiz <jcaamano>
Networking sub component: ovn-kubernetes QA Contact: Anurag saxena <anusaxen>
Status: CLOSED DUPLICATE Docs Contact:
Severity: unspecified    
Priority: unspecified CC: dcbw, jhopper, moddi, rravaiol, rsevilla, surya, wking
Version: 4.11   
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard: perfscale-ovn
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2022-09-01 15:49:03 UTC Type: Bug

Description Andrew Collins 2022-07-19 19:10:33 UTC
Description of problem: 
On an OpenShift 4.11.0-rc.1 cluster configured with OVN-Kubernetes, we observed a regression in etcd disk latencies from 4.10 to 4.11.

Compared with OpenShift SDN, the p99 etcd latencies are more than double those of an OpenShift SDN cluster installed with the same version and scale.

Version-Release number of selected component (if applicable):
4.11.0-rc.1

How reproducible:
100%


Steps to Reproduce:
1. Install a 4.11.0-rc.1 cluster with r5.4xlarge masters
2. Scale cluster to 252 nodes
3. Apply cluster-density e2e-benchmarking workload with 4000 iterations 

Actual results:
etcd disk fsync and disk commit latencies have p99s of 22.5ms and 35.9ms, respectively.

Expected results:
etcd disk fsync and disk commit latencies stay within SLOs of 10ms and 20ms, respectively.
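
For reference, the gap between the actual and expected results can be checked with a small script. This is only a sketch: the function and variable names are hypothetical, and the PromQL strings assume that "disk fsync" and "disk commit" refer to etcd's standard `etcd_disk_wal_fsync_duration_seconds` and `etcd_disk_backend_commit_duration_seconds` histograms, which this report does not state explicitly. The 2m rate window is likewise an arbitrary choice.

```python
# Sketch only: names below are illustrative and not taken from the bug report.

# Assumed PromQL for the two p99 latencies (standard etcd histogram metrics).
FSYNC_P99_QUERY = (
    "histogram_quantile(0.99, sum by (le, pod) "
    "(rate(etcd_disk_wal_fsync_duration_seconds_bucket[2m])))"
)
COMMIT_P99_QUERY = (
    "histogram_quantile(0.99, sum by (le, pod) "
    "(rate(etcd_disk_backend_commit_duration_seconds_bucket[2m])))"
)

# SLO thresholds and observed p99 values, both in milliseconds, as reported above.
SLO_MS = {"fsync": 10.0, "commit": 20.0}
OBSERVED_P99_MS = {"fsync": 22.5, "commit": 35.9}


def slo_report(observed_ms, slo_ms):
    """Compare each observed p99 latency against its SLO threshold."""
    report = {}
    for name, limit in slo_ms.items():
        value = observed_ms[name]
        report[name] = {
            "p99_ms": value,
            "slo_ms": limit,
            "ratio": value / limit,        # how many times the SLO was exceeded
            "within_slo": value <= limit,
        }
    return report


if __name__ == "__main__":
    for name, row in slo_report(OBSERVED_P99_MS, SLO_MS).items():
        status = "OK" if row["within_slo"] else "VIOLATED"
        print(f"{name}: p99={row['p99_ms']}ms slo={row['slo_ms']}ms "
              f"ratio={row['ratio']:.2f} {status}")
```

With the values from this report, both latencies violate their SLOs: fsync is roughly 2.25 times its threshold and commit roughly 1.8 times.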