Bug 2226874 - [OSP 16.2][OVN]Intermittent UDP latency on OSP site running ovn-2021-21.12.0-116.el8fdp.x86_64
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: ovn-2021
Version: RHEL 8.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Ales Musil
QA Contact: Jianlin Shi
URL:
Whiteboard:
Depends On:
Blocks: 2227901
Reported: 2023-07-26 18:26 UTC by Matt Flusche
Modified: 2023-08-17 14:10 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 2227901
Environment:
Last Closed: 2023-08-17 14:10:18 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker FD-3064 0 None None None 2023-07-26 18:30:06 UTC

Description Matt Flusche 2023-07-26 18:26:30 UTC
Description of problem:

This customer is running multiple sites (separate OSP deployments). The existing sites are operating normally and satisfy the 15ms latency requirements.  In this new deployment, there is an intermittent issue where ~1% of traffic is seeing higher latency - upwards of 250ms.

Originally it was suspected the issue was isolated to a single hypervisor; however, it is now reported to occur on multiple hosts and even between VMs on a single host.

It was expected that the sites were deployed with the same OSP versions; however, this new site is running a newer OVN version.

The existing deployments without issues are running: ovn-2021-21.12.0-103.el8fdp.x86_64

It is believed that the sites are running the same openvswitch versions (to be confirmed).
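
For illustration only, the version comparison between sites could be sketched as below. This assumes each site's package list has been saved as plain `name-version-release.arch` lines (for example from `rpm -qa`); the function names and the `example-pkg` entry are hypothetical and not taken from the customer environment.

```python
def parse_nvra(line):
    """Split 'name-version-release.arch' into (name, 'version-release.arch').

    Assumes the line has at least two hyphens; a package name may itself
    contain hyphens, so the split is done from the right.
    """
    name, version, release_arch = line.strip().rsplit("-", 2)
    return name, f"{version}-{release_arch}"


def package_delta(site_a_lines, site_b_lines):
    """Return {name: (version_on_a, version_on_b)} for packages that differ.

    A value of None means the package is missing on that site. If a package
    is installed in several versions, only the last one listed is compared.
    """
    a = dict(parse_nvra(l) for l in site_a_lines if l.strip())
    b = dict(parse_nvra(l) for l in site_b_lines if l.strip())
    delta = {}
    for name in sorted(set(a) | set(b)):
        if a.get(name) != b.get(name):
            delta[name] = (a.get(name), b.get(name))
    return delta


if __name__ == "__main__":
    # The ovn-2021 versions are the ones named in this bug;
    # example-pkg is a made-up placeholder.
    existing = ["ovn-2021-21.12.0-103.el8fdp.x86_64", "example-pkg-1.0-1.el8.x86_64"]
    new = ["ovn-2021-21.12.0-116.el8fdp.x86_64", "example-pkg-1.0-1.el8.x86_64"]
    for name, (old_v, new_v) in package_delta(existing, new).items():
        print(f"{name}: existing={old_v} new={new_v}")
```

Running the same comparison over the full package lists from a compute node at each site would show whether openvswitch, or anything else, also differs.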

The offending traffic is on the same L2 network.  These are VLAN provider networks.

In Progress:

- The consultant working on this deployment is confirming all of the differences between these sites.
- I've recommended that the new site be rolled back to the same versions as the existing sites to verify whether the issue follows the version change.
- The customer has a script that tests and reports these latency issues.  I've asked for tcpdumps at the source and destination hypervisors on the VM tap interfaces and external physical interfaces while reproducing the issue. 
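
The customer's test script is not attached to this bug. As a rough sketch of what such a probe might look like, the following sends UDP datagrams to an echo service and reports how many round trips exceed a threshold. Everything here is an assumption for illustration: the function names, the presence of a UDP echo service on the destination VM, and the use of round-trip time compared against the 15ms figure (the actual requirement may be one-way).

```python
import socket
import time


def probe_rtt_ms(host, port, count=100, timeout_s=1.0, payload=b"ping"):
    """Send `count` UDP datagrams to an echo service.

    Returns a list of round-trip times in milliseconds; None marks a probe
    that got no reply within `timeout_s`.
    """
    rtts = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for seq in range(count):
            msg = seq.to_bytes(4, "big") + payload
            start = time.monotonic()
            sock.sendto(msg, (host, port))
            try:
                while True:
                    data, _ = sock.recvfrom(2048)
                    if data[:4] == msg[:4]:  # skip late replies to earlier probes
                        break
                rtts.append((time.monotonic() - start) * 1000.0)
            except socket.timeout:
                rtts.append(None)
    return rtts


def summarize(rtts_ms, threshold_ms=15.0):
    """Count lost and slow probes; slow_pct is relative to answered probes."""
    answered = [r for r in rtts_ms if r is not None]
    slow = [r for r in answered if r > threshold_ms]
    return {
        "sent": len(rtts_ms),
        "lost": len(rtts_ms) - len(answered),
        "slow": len(slow),
        "slow_pct": 100.0 * len(slow) / len(answered) if answered else 0.0,
        "max_ms": max(answered) if answered else None,
    }
```

Logging a timestamp for each slow probe would make it easier to line the results up with the tcpdump captures requested above.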


This is an escalation to involve Neutron and OVN dev teams in this issue.


Version-Release number of selected component (if applicable):
OSP 16.2.4
ovn-2021-21.12.0-116.el8fdp.x86_64


How reproducible:
This specific deployment.


Steps to Reproduce:
1. As described above

Additional info:
Will add in private comments

Comment 24 Dan Williams 2023-08-01 15:37:00 UTC
I believe he means using the `coverage/show` request to ovn-appctl and ovs-appctl for various components.

ovn-appctl -t <pidfile of ovn-controller> coverage/show
ovs-appctl -t <pidfile of ovs-vswitchd> coverage/show
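
One way to use this output is to take a snapshot before and after reproducing the latency and compare the counters. The sketch below assumes the usual `coverage/show` layout, where each counter line starts with the counter name and ends with `total: N`; the helper names are hypothetical, and the sample counters in the comments are illustrative rather than taken from this deployment.

```python
import re
import subprocess

# A counter line is assumed to look like:
#   netlink_sent   0.0/sec   0.000/sec   0.0000/sec   total: 123
_COUNTER = re.compile(r"^(\S+)\s+.*\btotal:\s*(\d+)\s*$")


def run_coverage(argv):
    """Run a coverage/show command and return its text output.

    Example argv (daemon name as target, an assumption about the host setup):
    ["ovs-appctl", "-t", "ovs-vswitchd", "coverage/show"]
    """
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def parse_totals(text):
    """Return {counter_name: total} for every counter line in the output."""
    totals = {}
    for line in text.splitlines():
        m = _COUNTER.match(line.strip())
        if m:
            totals[m.group(1)] = int(m.group(2))
    return totals


def counter_delta(before, after):
    """Return {counter_name: increase} for counters that changed."""
    out = {}
    for name, total in after.items():
        diff = total - before.get(name, 0)
        if diff:
            out[name] = diff
    return out
```

Counters that jump sharply only during the slow periods would be the first candidates to investigate.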

