Red Hat Bugzilla – Bug 1313040
significant throughput drop between 1% and 0% loss RFC2544 tests for vhostuser
Last modified: 2016-09-29 10:53:15 EDT
Description of problem:
ovs-dpdk with vhostuser can achieve very high throughput, but some packet loss is always present. If we require no packet loss at all, the maximum throughput drops by about 60%. For example, the 1% loss throughput with single-queue vhostuser is 4.21 Mpps at 64-byte frame size, unidirectional traffic. With 0% loss, it is 1.64 Mpps.
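The size of the drop quoted above can be checked directly from the two measured rates (a minimal arithmetic sketch, not part of the original report):

```python
# RFC2544 results from the report, in Mpps
one_pct_loss = 4.21   # throughput at 1% allowed loss
zero_loss = 1.64      # throughput at 0% allowed loss

drop = (one_pct_loss - zero_loss) / one_pct_loss
print(f"{drop:.0%}")  # prints "61%", i.e. the ~60% drop described above
```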
Version-Release number of selected component (if applicable):
openvswitch-dpdk 2.4 or 2.5
How reproducible:
Easily, with the right equipment
Steps to Reproduce:
1. create a VM with 2 virtio-net devices which are handled by vhostuser/ovs-dpdk
2. create two ovs bridges with 1 dpdk (10Gb) and 1 dpdkvhostuser (virtio) ports each
3. run testpmd in the VM, forwarding between the two virtio/vhostuser interfaces
4. with a packet generator with 2 x 10Gb ports, send packets from the 1st 10Gb port to the 10Gb interface of the first ovs-bridge, and receive traffic on the 2nd 10Gb port from the 2nd ovs-bridge
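Steps 2 and 3 above can be sketched roughly as follows; the bridge names, port names, and testpmd core list are illustrative placeholders, not taken from the reporter's setup, and must match the local NIC/PCI layout:

```shell
# Host: two OVS bridges, each with one physical DPDK port and one
# dpdkvhostuser port (OVS 2.4/2.5-era port types)
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port br0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuser

ovs-vsctl add-br br1 -- set bridge br1 datapath_type=netdev
ovs-vsctl add-port br1 dpdk1 -- set Interface dpdk1 type=dpdk
ovs-vsctl add-port br1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser

# Guest: forward traffic between the two virtio/vhostuser interfaces
testpmd -l 1,2,3 -n 4 -- --forward-mode=io --auto-start
```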
Actual results:
1% loss: 4.21 Mpps
0% loss: 1.64 Mpps

Expected results:
0% loss throughput only slightly lower than 1% loss throughput
We believe we have resolved this with tuning, namely:
isolcpus, nohz_full, and rcu_nocbs in the host for all ovs-dpdk PMD threads and VM vcpu threads
isolcpus, nohz_full, and rcu_nocbs in the guest for all DPDK application PMD threads
sched:fifo-95 in the host for ovs-dpdk PMD threads and VM vcpu threads
sched:fifo-95 in the guest for DPDK application PMD threads
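The tuning above can be sketched as follows; the CPU list (2-7) and the thread-ID variables are placeholders for the actual PMD and vCPU cores/threads on a given host:

```shell
# Host (and guest) kernel command line: isolate the cores running
# ovs-dpdk PMD threads and VM vcpu threads from the scheduler,
# timer ticks, and RCU callbacks
#   isolcpus=2-7 nohz_full=2-7 rcu_nocbs=2-7

# Elevate the ovs-dpdk PMD threads and VM vcpu threads to SCHED_FIFO
# priority 95 ($PMD_TID / $VCPU_TID are thread IDs, found via e.g. "top -H")
chrt -f -p 95 $PMD_TID
chrt -f -p 95 $VCPU_TID
```

The same chrt step applies in the guest for the DPDK application PMD threads.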
However, using sched:fifo can cause a problem, described here: https://bugzilla.redhat.com/show_bug.cgi?id=1328890
The proposed solution to that problem applies only to the RT kernel, so the RT kernel will likely be a requirement for high-throughput packet processing with zero packet loss.
(In reply to Andrew Theurer from comment #2)
> We believe we have resolved this with tuning, namely:
Thanks Andrew. I found the same results even for single queue case.
Is there anything you consider worth fixing in OVS, given that those were events not under OVS control?
Flavio, at this time I don't think there is anything to address in OVS since the mutex fix for RCU went in. However, we have not exercised all possible paths in OVS (for example, these tests do not use MAC learning mode), so we may eventually uncover something: a miss of the EMC or something else might cause extra latency from OVS. Eventually we should document when OVS may not sustain zero packet loss.
Going forward, in order to help understand if OVS is affecting packet loss, I would propose we run two types of tests:
First is a non-OVS test with the RT kernel, in order to confirm the kernel and KVM are "good". This would involve using testpmd in both the guest and the host (with vhostuser used by the host testpmd). We are confident testpmd should not do anything to cause a latency spike; this allows us to conclude that any latency spike in this test is from the kernel, KVM, or QEMU.
Second is a test using OVS instead of testpmd in the host. This is run once the first test passes. If this test fails, then we further investigate OVS.
Alright, based on comment #2 and comment #4 I am closing this one. We can open specific bugs for any issues we uncover in the future.