Bug 1313040 - significant throughput drop between 1% and 0% loss RFC2544 tests for vhostuser
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: openvswitch-dpdk
Version: 7.4
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 7.3
Assigned To: Flavio Leitner
QA Contact: Red Hat Kernel QE team
Depends On:
Blocks: 1349523
Reported: 2016-02-29 14:28 EST by Andrew Theurer
Modified: 2016-09-29 10:53 EDT
CC List: 12 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-09-29 10:53:15 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments: None
Description Andrew Theurer 2016-02-29 14:28:04 EST
Description of problem:
ovs-dpdk with vhostuser can achieve very high throughput, but some packet loss is always present.  If we require no packet loss at all, the maximum throughput drops by roughly 60%.  For example, 1% loss throughput with single-queue vhostuser is 4.21 Mpps with 64-byte frame size, unidirectional traffic.  With 0% loss, it is 1.64 Mpps.

Version-Release number of selected component (if applicable):
openvswitch-dpdk 2.4 or 2.5

How reproducible:
Easily, with the right equipment

Steps to Reproduce:
1. create a VM with 2 virtio-net devices which are backed by vhostuser/ovs-dpdk
2. create two ovs bridges, each with 1 dpdk (10Gb) port and 1 dpdkvhostuser (virtio) port
3. run testpmd in the VM, forwarding between the two virtio/vhostuser interfaces
4. with a packet generator that has 2 x 10Gb ports, send packets from the 1st 10Gb port to the 10Gb interface of the first ovs-bridge, and receive traffic on the 2nd 10Gb port from the 2nd ovs-bridge (a configuration sketch follows these steps)
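
A rough sketch of such a setup is below; the bridge, port and socket names, core mask, and exact testpmd options are illustrative only and depend on the OVS/DPDK versions in use:

  # Host: two netdev bridges, each with one physical DPDK port and one
  # vhost-user port (OVS 2.4/2.5 expects physical DPDK ports to be named
  # dpdk0, dpdk1, ...; the other names here are placeholders):
  ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
  ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
  ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuser
  ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
  ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
  ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser

  # Guest: testpmd io-forwards between its two virtio-net ports:
  testpmd -c 0x7 -n 4 -- --burst=64 --forward-mode=io --auto-start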

Actual results:
1% loss: 4.21 Mpps
0% loss: 1.64 Mpps


Expected results:

0% loss throughput only slightly lower than 1% loss throughput


Additional info:
Comment 2 Andrew Theurer 2016-05-30 20:17:42 EDT
We believe we have resolved this with tuning (a configuration sketch follows the list below), namely:

isolcpus, nohz_full, and rcu_nocbs in the host for all ovs-dpdk PMD threads and VM vcpu threads
isolcpus, nohz_full, and rcu_nocbs in the guest for all DPDK application PMD threads
sched:fifo-95 in the host for ovs-dpdk PMD threads and VM vcpu threads
sched:fifo-95 in the guest for DPDK application PMD threads
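
As a concrete illustration of the tuning above, a minimal sketch follows; the core list 2-9 and the thread ID are placeholders, and chrt is just one way to apply SCHED_FIFO (tuned profiles or libvirt settings can achieve the same):

  # Kernel command line on the host (and similarly in the guest), isolating
  # the cores that run the ovs-dpdk PMD threads and VM vcpu threads:
  isolcpus=2-9 nohz_full=2-9 rcu_nocbs=2-9

  # Set SCHED_FIFO priority 95 on each PMD thread / vcpu thread:
  chrt -f -p 95 <tid-of-pmd-or-vcpu-thread>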

However, using sched:fifo can cause a problem, described here: https://bugzilla.redhat.com/show_bug.cgi?id=1328890

The proposed solution to that problem is only for the RT kernel, so the RT kernel will likely be a requirement for high-throughput packet processing with zero packet loss.
Comment 3 Flavio Leitner 2016-08-22 13:44:13 EDT
(In reply to Andrew Theurer from comment #2)
> We believe we have resolved this with tuning, namely:

Thanks Andrew.  I found the same results even for the single-queue case.
Is there anything you consider worth fixing in OVS, given that those were actually events not under OVS's control?

Thanks,
fbl
Comment 4 Andrew Theurer 2016-08-26 11:58:04 EDT
Flavio, at this time I don't think there is anything to address in OVS since the mutex fix for RCU went in.  We have not, however, exercised all possible paths in OVS (for example, these tests do not use MAC learning mode), so eventually we may uncover something.  For example, an EMC miss or something else might cause extra latency from OVS.  Eventually we should document when OVS may not sustain zero packet loss.
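
For reference, a sketch of one way to watch for EMC misses during a run, using OVS's per-PMD statistics (counter names vary a bit between OVS releases):

  # Clear the per-PMD counters, run the traffic, then dump them; compare
  # "emc hits" against "megaflow hits" and "miss" to see how often packets
  # leave the exact-match cache path.
  ovs-appctl dpif-netdev/pmd-stats-clear
  ovs-appctl dpif-netdev/pmd-stats-show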

Going forward, in order to help understand if OVS is affecting packet loss, I would propose we run two types of tests: 

First is a non-OVS test with the RT kernel, in order to confirm the kernel and KVM are "good".  This would involve using testpmd in both the guest and the host (with vhostuser used by the host testpmd; a sketch follows below).  We are confident testpmd should not do anything to cause a latency spike; this allows us to conclude that any latency spike in this test comes from the kernel, KVM, or QEMU.

Second is a test using OVS instead of testpmd in the host.  This is run once the first test passes.  If this test fails, then we further investigate OVS.
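
For the first (non-OVS) test, a rough sketch of the host-side testpmd acting as the vhost-user backend is below.  It assumes the vhost PMD vdev syntax of DPDK releases from that timeframe (eth_vhost...); the PCI addresses, socket paths, core mask, and memory sizes are placeholders, and the physical-to-vhost port pairing may need adjusting interactively depending on how testpmd numbers the ports:

  # Host testpmd: owns the two 10Gb NICs (PCI addresses are placeholders) and
  # exposes two vhost-user sockets that the guest's virtio-net devices use.
  testpmd -c 0xe -n 4 --socket-mem 1024,0 \
      -w 0000:04:00.0 -w 0000:04:00.1 \
      --vdev 'eth_vhost0,iface=/tmp/vhost-user0' \
      --vdev 'eth_vhost1,iface=/tmp/vhost-user1' \
      -- --forward-mode=io -i
  # The guest runs testpmd in io mode between its two virtio-net ports, as in
  # the steps-to-reproduce above.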
Comment 5 Flavio Leitner 2016-09-29 10:53:15 EDT
Alright, based on comment #2 and comment #4 I am closing this one.  We can open specific bugs for any issues we uncover in the future.
Thanks!
fbl
