Description of problem:

Scenario:
- Two virtual machines for load generation and reception, running on different compute nodes.
- When both VMs (sender and receiver) are on the same compute node, 100% throughput capacity is reached for the larger packet sizes and packet drop is almost none.
- The traffic flow is unidirectional, i.e. from one VM (generation) towards the second VM (reception).
- The emulator thread policy is set to isolated, with the NICs aligned on the same NUMA node.
- The IsolCPUs used for the PMD threads and for the VMs are kept separate, so no PMD thread core is shared for handling interrupts.
- The CPUs handling OS processes are separate from the IsolCPUs.
- DPDK socket memory for NUMA node 1 was set to 8192 MB. This tuning gave the best results, but 100% throughput capacity is still not reached and packet loss is still present.
- BIOS settings appear to be aligned with our recommendations [1].

The customer is observing consistent packet drop at the physical DPDK ports, while the drop rate across different packet sizes is inconsistent, so the overall result is not as expected. After applying different tuning parameters the throughput improved considerably but is still below target, and the inconsistent packet drop across packet sizes (64, 1024, 2048 and 8192 bytes) is still observed.

The PMD threads are running on the same NUMA node, and the vCPUs allocated to the VMs are also from that NUMA node.

The following tunings were made (an illustrative command sketch follows after the Steps to Reproduce section):
1. Changing the Rx & Tx ring sizes of the physical DPDK ports (no effective improvement)
2. Changing the number of Rx & Tx queues (no effective improvement)
3. Isolating the emulator threads (throughput increased)
4. Moving the NIC alignment from a different NUMA node to the same NUMA node (throughput increased, with inconsistent packet loss across packet sizes)

From our end we tried some other approaches:
- The DPDK userspace bridge had its link up; it was then set to 'down' [2].
- A 2x improvement in forwarding rate was seen for packet sizes of 1024 bytes and above between the 1st and 2nd iteration.
- The same improvement was not seen for 64-byte packets.
- The bond mode on the compute nodes was changed from balance-tcp to balance-slb.

We do not see any saturation of CPU resources for the PMD threads.

--------------------------------------------------------------------------------------------------------------------------------------------
Compute 0:

sos_commands/openvswitch/ovs-appctl_dpif-netdev.pmd-rxq-show

pmd thread numa_id 1 core_id 3:
  isolated : false
  port: vhudcbc8718-7b  queue-id: 0  pmd usage: 28 %

Since this instance is running on NUMA 1, the PMD polling its port is also on NUMA 1 (core_id 3), and the instance is sending traffic over a port from the provider network mapped to the ovs-dpdk bridge on NUMA 1.

Compute 1:

sos_commands/openvswitch/ovs-appctl_dpif-netdev.pmd-rxq-show

pmd thread numa_id 1 core_id 3:
  isolated : false
  port: dpdk3  queue-id: 0  pmd usage: 16 %
--------------------------------------------------------------------------------------------------------------------------------------------

[1] https://access.redhat.com/documentation/en-us/red_hat_openstack_platform/13/html-single/network_functions_virtualization_planning_and_configuration_guide/index#review_bios_settings
[2] https://access.redhat.com/solutions/3381011

Version-Release number of selected component (if applicable):
- openvswitch-2.9.0-103.el7fdp.x86_64

How reproducible:
Repeatedly

Steps to Reproduce:
1.
2.
3.
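For reference, a minimal sketch of the OVS and Nova knobs behind the tunings listed in the description. This is an assumption of how they were applied, not taken from the case: the interface, bond, bridge, and flavor names below are placeholders (dpdk3 appears in the Compute 1 output but is used here only as an example), the NUMA 0 socket-memory value is illustrative, and in an OSP 13 deployment these settings would normally be driven through the director templates rather than set by hand.

  # 1. Rx/Tx descriptor ring sizes on a physical DPDK port
  ovs-vsctl set Interface dpdk3 options:n_rxq_desc=2048 options:n_txq_desc=2048

  # 2. Number of Rx queues polled by the PMD threads on that port
  ovs-vsctl set Interface dpdk3 options:n_rxq=2

  # 3. Emulator thread isolation, set per flavor
  openstack flavor set nfv-flavor --property hw:emulator_threads_policy=isolate

  # Per-NUMA DPDK socket memory; 8192 MB on NUMA node 1 as in the scenario above
  ovs-vsctl set Open_vSwitch . other_config:dpdk-socket-mem="4096,8192"

  # Bond mode change from balance-tcp to balance-slb
  ovs-vsctl set Port dpdkbond0 bond_mode=balance-slb

  # Userspace bridge link set to 'down' as per [2]
  ip link set dev br-link down

After each change to the DPDK port options, the rxq-to-PMD assignment can be confirmed with ovs-appctl dpif-netdev/pmd-rxq-show, as captured in the sosreport output above.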
Actual results:


Expected results:
The same results between different compute nodes as were obtained in the same-compute-node case.

Additional info:
The logs are on supportshell under /cases/02694622
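To cross-check the "no PMD saturation" observation on a live system, the standard ovs-appctl commands below can be used (the sosreport files quoted in the description are their static equivalent):

  # Current rxq-to-PMD distribution and per-queue PMD usage
  ovs-appctl dpif-netdev/pmd-rxq-show

  # Clear and re-sample PMD cycle counters to compare processing vs. idle cycles per core
  ovs-appctl dpif-netdev/pmd-stats-clear
  ovs-appctl dpif-netdev/pmd-stats-show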