Bug 1656643

Summary: Throughput for OVS-DPDK with bond_mode=balance-tcp is severely decreased compared to bond_mode=active-backup
Product: Red Hat OpenStack Reporter: Chris Fields <cfields>
Component: openvswitch Assignee: Eelco Chaudron <echaudro>
Status: CLOSED NOTABUG QA Contact: Yariv <yrachman>
Severity: medium Docs Contact:
Priority: low    
Version: 10.0 (Newton) CC: apevec, atelang, bperkins, cfields, chrisw, echaudro, fbaudin, hakhande, rhos-maint, sisadoun, tmmorin.orange, tredaelli
Target Milestone: --- Keywords: GSS-NFV-Escalation, Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-03 22:10:53 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chris Fields 2018-12-05 22:52:20 UTC
Created attachment 1511907 [details]
no_lacp_bond_active_passive.png

Description of problem: Compare the throughput in attachment lacp_ovs_bond_balance-tcp.png to no_lacp_bond_active_passive.png. LACP (balance-tcp) throughput is much lower. Customer is asking if this is an expected result.

How reproducible:
Reproducible by customer.

Steps to Reproduce:
1. Test throughput with bond_mode=active-backup
2. Test throughput with bond_mode=balance-tcp and lacp=active (an example of switching between the two bond configurations is sketched below)
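
For reference, on the OVS side the two bond configurations can be switched with something like the following (the bond port name "dpdkbond0" is an assumption; in an OSP 10 deployment the bond is normally defined via the os-net-config/TripleO templates rather than set by hand):

$ ovs-vsctl set port dpdkbond0 bond_mode=active-backup lacp=off
$ ovs-vsctl set port dpdkbond0 bond_mode=balance-tcp lacp=active
$ ovs-appctl bond/show dpdkbond0    # verify the resulting bond mode and LACP state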

Actual results:
See attachments mentioned above.

Expected results:
The ovs-dpdk balance-tcp bond is not expected to yield lower throughput than the active-backup bond.


Additional info:
RHEL: 7.5
OSP: 10
Openvswitch: 2.9.0-56.el7fdp
Libvirt: 3.9.0-14.el7_5.8
DPDK: 17.11-13.el7
Qemu: 2.10.0-21.el7_5.6
Kernel: 3.10.0-862.3.2.el7

Comment 8 Eelco Chaudron 2018-12-11 14:01:10 UTC
As not all data is in, I made some assumptions about your setup.
For example, that you are using the NORMAL rule and only change the UDP port numbers, NOT the MAC addresses, etc.

One thing I noticed is that when doing the bond0 LACP test you are sharing one of the cores between the VM port and the physical DPDK port. This might not be the case in the active-backup scenario.

pmd thread numa_id 0 core_id 4:
        isolated : false
        port: dpdk01            queue-id:  1    pmd usage: 15 %
        port: vhu478b7f2a-4b    queue-id:  0    pmd usage:  0 %
        port: vhu726dc467-89    queue-id:  0    pmd usage: 81 %
        port: vhu850712a5-d9    queue-id:  3    pmd usage:  0 %

pmd thread numa_id 0 core_id 3:
        isolated : false
        port: dpdk00            queue-id:  1    pmd usage: 15 %
        port: vhu29ffa347-50    queue-id:  0    pmd usage: 81 %
        port: vhua4204662-1b    queue-id:  2    pmd usage:  0 %

So you could try to move the interface that is consuming 81% of the PMD to another PMD to get a little more throughput.
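
A minimal sketch of how that rebalancing could be done, assuming the busy queue is queue 0 of vhu726dc467-89 and that core 5 is another core in your PMD mask (both are assumptions, adjust to your setup):

$ ovs-vsctl set interface vhu726dc467-89 other_config:pmd-rxq-affinity="0:5"
$ ovs-appctl dpif-netdev/pmd-rxq-show    # confirm the new rxq-to-PMD assignment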


However, packets processed in the active-backup scenario run through the datapath only once.
If you run "ovs-appctl dpctl/dump-flows" you can see what is happening (only dumping a single core):

flow-dump from pmd on cpu core: 1
recirc_id(0),in_port(2),packet_type(ns=0,id=0),eth(src=00:00:01:00:00:00,dst=00:00:02:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:112123088, bytes:6727385280, used:0.000s, actions:4
recirc_id(0),in_port(4),packet_type(ns=0,id=0),eth(src=00:00:02:00:00:00,dst=00:00:01:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:112122637, bytes:6727358220, used:0.000s, actions:2

Now if you do the same with balance-tcp:

$ ovs-appctl dpctl/dump-flows 
flow-dump from pmd on cpu core: 1
recirc_id(0x54),dp_hash(0xc3642aa6/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1497618, bytes:89857080, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x15747be3/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1514256, bytes:90855360, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x8258e772/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1706514, bytes:102390840, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x8b08e09a/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:4308054, bytes:258483240, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x34882cff/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:3526590, bytes:211595400, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x1a4cf616/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1491519, bytes:89491140, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0x2d878153/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:4309990, bytes:258599400, used:1.126s, actions:3
recirc_id(0x54),dp_hash(0xdba4c312/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:3496970, bytes:209818200, used:1.126s, actions:3
recirc_id(0),in_port(4),packet_type(ns=0,id=0),eth(src=00:00:02:00:00:00,dst=00:00:01:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:27875040, bytes:1672502400, used:1.126s, actions:hash(hash_l4(0)),recirc(0x54)
recirc_id(0x54),dp_hash(0x8557662c/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:1716680, bytes:103000800, used:1.126s, actions:3
recirc_id(0),in_port(3),packet_type(ns=0,id=0),eth(src=00:00:01:00:00:00,dst=00:00:02:00:00:00),eth_type(0x0800),ipv4(frag=no), packets:55749604, bytes:3344976240, used:1.126s, actions:4
recirc_id(0x54),dp_hash(0x97ab3635/0xff),in_port(4),packet_type(ns=0,id=0),eth_type(0x0800),ipv4(frag=no), packets:4306840, bytes:258410400, used:1.126s, actions:3


As you can see, multiple datapath flows get generated, and each packet is recirculated after the hash is computed, which also costs CPU cycles.
This results in lower throughput.
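
One way to see this extra per-packet cost directly is to compare the PMD statistics between the two runs (a sketch; the exact counter names can differ slightly between OVS versions):

$ ovs-appctl dpif-netdev/pmd-stats-clear
# (generate traffic for a while, then:)
$ ovs-appctl dpif-netdev/pmd-stats-show

Comparing "avg processing cycles per packet" between the active-backup and balance-tcp runs should show the cost of the extra hash and recirculation.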

Hope this answers your question about why there is a decrease in throughput.

Comment 9 Chris Fields 2018-12-12 16:44:26 UTC
Feedback from customer:

In my test, only the source UDP port was randomized and not the MAC address. The MAC addresses do not change.

Indeed, in the results initially attached to this case, affinity was not configured. However, the lack of affinity is not what is causing the significant drop in throughput. For example, I ran these experiments this week:

In active/passive mode, the throughput for 65536 flows (UDP, 64-byte packets) is 3.8 Mpps.
In LACP mode, balance-tcp, 1 rxq on the physical interfaces, affinity set, the throughput is 1.9 Mpps.
In LACP mode, balance-tcp, 2 rxq on the physical interfaces, affinity set, the throughput is 1.9 Mpps.
In LACP mode, balance-tcp, 2 rxq on the physical interfaces, NO affinity set, the throughput is 1.6 Mpps.

So, you are right that setting affinity will increase performance, but nowhere near what we have with active/passive. In all the LACP scenarios mentioned above, the PMD usage for the physical interface does not go higher than 30-40%.

The recirculation of packets in LACP mode could be the reason for the significant drop in performance.

Comment 13 Franck Baudin 2018-12-14 13:37:26 UTC
(In reply to Chris Fields from comment #9)
> Feedback from customer:
> 
> In my test, only the source UDP port was randomized and not the MAC address.
> The MAC addresses do not change.

As OVS is labeled a "virtual switch", it is natural to think that it only works at L2. But it does not, and as soon as you enable L3 and above features, performance drops; there is no magic, each and every lookup counts. For instance, having many IP addresses behind very few MAC addresses is not expected to affect performance, but in reality it puts pressure on the EMC (exact match cache) and performance suffers.
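
To illustrate the EMC point: the hit rate is visible in "ovs-appctl dpif-netdev/pmd-stats-show" (emc hits vs megaflow hits), and if many flows thrash the cache the insertion probability can be lowered, for example (a sketch, not a recommendation for this specific case):

$ ovs-vsctl set Open_vSwitch . other_config:emc-insert-inv-prob=20
# on average roughly 1 in 20 packets missing the EMC triggers an insertion; 0 disables EMC insertion entirely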

Bottom line: OVS is not just a "virtual switch"; it is more a "virtual switch and router relying on OpenFlow". This is why OVS performance depends on the enabled features; more insights in https://www.youtube.com/watch?v=YzD91dgyBgo&index=7&list=PLaJlRa-xItwD7ikTsrZOhju5xbE-QP9U1 and https://www.slideshare.net/LF_OpenvSwitch/lfovs17ovsdpdk-for-nfv-go-live-feedback