Description of problem:
When running node-density on a 120 node cluster, we see some spikes in pod ready latency times. These spikes correspond to a southbound DB compaction. During this compaction time the ovn-controller is not able to connect to a leader.

Version-Release number of selected component (if applicable):
4.11.0-0.nightly-2022-03-20-160505

How reproducible:
Always

Steps to Reproduce:
1. Run node-density-light on a 120 node cluster

Actual results:
Pod ready latency spikes, which cause the P99 to go up

Expected results:
A steady pod ready latency during the test
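To illustrate why a few slow pods are enough to move the P99, here is a minimal sketch using a nearest-rank percentile. The latency values and the `percentile` helper are hypothetical and are not measurements or tooling from this test run; they only show the arithmetic.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical pod ready latencies in milliseconds (illustrative only).
steady = [2000] * 1000

# Same number of pods, but 20 of them (2%) became ready during a
# compaction window and took much longer.
with_spikes = [2000] * 980 + [15000] * 20

print("steady P99:     ", percentile(steady, 99))
print("with spikes P99:", percentile(with_spikes, 99))
print("with spikes P50:", percentile(with_spikes, 50))
```

In this made-up data set the median is unchanged, while the P99 lands on the slow pods because more than 1% of the samples fall inside the spike.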
This should be fixed, or made much better, by the parallel compaction in OVS 3.0. Unfortunately, OVS 3.0 is only built for RHEL9 so can't be included in OpenShift 4.12. If we wanted it built for RHEL8, we'd have to drop one of the OVS streams we currently use in OpenShift to free up QE capacity.
Tracking https://issues.redhat.com/browse/ART-5075, where RHEL9 is pulled in for OCP 4.13. Once this is done, we can bump to OVS 3.0.
There is an openssl 3 performance regression in RHEL9 that we should keep an eye on. It might affect OVN control plane connectivity: https://bugzilla.redhat.com/show_bug.cgi?id=2168224
OpenShift has moved to Jira for its defect tracking! This bug can now be found in the OCPBUGS project in Jira. https://issues.redhat.com/browse/OCPBUGS-9185