Bug 1411455

Summary: Performance decrease moving to OVS 2.6 and DPDK 16.11
Product: Red Hat Enterprise Linux 7
Component: openvswitch
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: unspecified
Reporter: Christian Trautman <ctrautma>
Assignee: Kevin Traynor <ktraynor>
QA Contact: Network QE <network-qe>
CC: atragler, bmichalo, ctrautma, fleitner, osabart, ovs-qe
Target Milestone: rc
Type: Bug
Last Closed: 2017-01-18 02:34:02 UTC
Attachments:
  cmdline steps
  VSPerf 2.5 ovs output
  VSPerf 2.6 ovs output

Description Christian Trautman 2017-01-09 18:51:25 UTC
Created attachment 1238855 [details]
cmdline steps

Description of problem: The latest Fast Datapath beta channel build of OVS 2.6.1 with DPDK 16.11 shows decreased performance compared to OVS 2.5 git22 with DPDK 16.07. The following decreases in performance were observed with a single-guest configuration.

pvp_tput         OVS 2.5 git22   OVS 2.6.1 / DPDK 16.11   change          CI baseline value
1 queue, 2 PMD:    5.1 mpps          4.7 mpps              decrease 7%        4 mpps
1 queue, 4 PMD:    9.4 mpps          7.8 mpps              decrease 17%       7.8 mpps
2 queue, 4 PMD:   10 mpps            9.5 mpps              decrease 5%        8 mpps
2 queue, 8 PMD:   18 mpps           14.8 mpps              decrease 17%      14 mpps
4 queue, 8 PMD:   19.5 mpps         17.5 mpps              decrease 10%      16 mpps


Version-Release number of selected component (if applicable):
kernel- 3.10.0-514.2.2.el7.x86_64
dpdk-16.11-2.el7fdb.x86_64.rpm
dpdk-tools-16.11-2.el7fdb.x86_64.rpm   
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64.rpm   

How reproducible: Always


Steps to Reproduce:
Set up an OVS netdev bridge with 2 DPDK and 2 vhostuser ports. Intel 520 10 Gb NICs are bound to the vfio-pci driver.
Set up a RHEL 7.3 guest running a DPDK 16.11 testpmd loopback with vfio-pci in no-IOMMU mode.
The attachment includes all command-line steps for the test case.
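
A minimal sketch of the bridge/port setup follows (assuming typical OVS-DPDK settings; the exact EAL arguments, masks, and socket-mem values used are in the attached cmdline steps):

# OVS 2.6-style DPDK init; OVS 2.5 instead passes DPDK EAL args on the ovs-vswitchd command line
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port ovsbr0 dpdk1 -- set Interface dpdk1 type=dpdk
ovs-vsctl add-port ovsbr0 vhost0 -- set Interface vhost0 type=dpdkvhostuser
ovs-vsctl add-port ovsbr0 vhost1 -- set Interface vhost1 type=dpdkvhostuser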

Generate 64-byte traffic with 1024k flows.

Actual results:
Performance has decreased from previous version.


Expected results:
Performance should match or exceed previous version.

Additional info:
Test cases are run with the OPNFV VSPerf project. I have also manually verified this outside of VSPerf.

Comment 1 Christian Trautman 2017-01-09 18:54:26 UTC
Created attachment 1238856 [details]
VSPerf 2.5 ovs output

Comment 2 Christian Trautman 2017-01-09 18:54:50 UTC
Created attachment 1238857 [details]
VSPerf 2.6 ovs output

Comment 3 Christian Trautman 2017-01-09 21:11:46 UTC
The perf team reports that they are seeing increased performance with this newer version. We are going to review our tests with them tomorrow and update the bug accordingly.

Comment 11 Christian Trautman 2017-01-18 01:33:16 UTC
I was finally able to do some raw performance testing today to verify that we are seeing better numbers running FDB 2.6.1 with DPDK 16.11 versus 2.5.22 with 16.07. I think the earlier results could be because we need to apply the workaround of resetting the PMD mask so that the PMDs are aligned to cores correctly.
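
The reapply workaround amounts to roughly the following (a sketch, not the exact commands used; the mask value depends on the host topology, and 0x440000440000 here covers cores 18, 22, 42, and 46, matching the 4-PMD pmd-rxq-show output below):

# clear the PMD mask, then set it again to force the rxq-to-PMD assignment to be redone
ovs-vsctl remove Open_vSwitch . other_config pmd-cpu-mask
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x440000440000
ovs-appctl dpif-netdev/pmd-rxq-show    # verify the resulting queue-to-core placement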

Steps to reproduce.

This is with Hyper-Threading enabled. Network QE does not typically do non-hyperthreaded testing at this time.

Set up an OVS bridge with 2 DPDK and 2 vhostuser ports.

[root@netqe22 ~]# ovs-vsctl show
3d7660ab-9e4e-44dc-960a-d6d80552e1a1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "vhost1"
            Interface "vhost1"
                type: dpdkvhostuser
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuser
    ovs_version: "2.5.0"

Set up 4 PMD threads on the NUMA node associated with the NIC under test.
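
For illustration, one way to do this (the PCI address is hypothetical; the mask corresponds to cores 18, 22, 42, and 46, matching the pmd-rxq-show output later in this comment):

# check which NUMA node the NIC sits on, then build the mask from cores / HT siblings on that node
cat /sys/bus/pci/devices/0000:04:00.0/numa_node
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x440000440000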

Using libvirt, launch a guest with 3 vCPUs.

[root@netqe22 ~]# virsh dumpxml guest30032
<domain type='kvm' id='4'>
  <name>guest30032</name>
  <uuid>37425e76-af6a-44a6-aba0-73434afe34c0</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>3</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <emulatorpin cpuset='8'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell-noTSX</model>
    <numa>
      <cell id='0' cpus='0' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/root/rhel7.3-1Q.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='vhostuser'>
      <mac address='52:54:00:11:8f:e7'/>
      <source type='unix' path='/var/run/openvswitch/vhost0' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:27:05:6a'/>
      <source type='unix' path='/var/run/openvswitch/vhost1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:bb:63:7b'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-guest30032/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c382,c749</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c382,c749</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

Set up testpmd inside the guest with vfio in no-IOMMU mode and start testpmd with 2 forwarding cores (nb-cores=2) and 1 TX queue and 1 RX queue.

Set fwd mode to io.
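
Roughly, the in-guest side looks like this (a sketch: the PCI addresses correspond to the vhostuser slots 0x08/0x09 in the guest XML above, while the core list, memory sizing, and tool paths are assumptions):

modprobe vfio enable_unsafe_noiommu_mode=1
modprobe vfio-pci
dpdk-devbind.py --bind=vfio-pci 0000:00:08.0 0000:00:09.0
testpmd -l 0,1,2 --socket-mem 1024 -- -i --nb-cores=2 --rxq=1 --txq=1
testpmd> set fwd io
testpmd> start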

Start traffic using a traffic generator. In my case I am using our Xena one-shot script to send 1024 flows in a bidirectional pattern.

With FDB 2.5.22 I get
Average port 1: 3491994.00 pps
Average port 0: 3387008.00 pps

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 18:
	port: vhost0	queue-id: 0
pmd thread numa_id 0 core_id 22:
	port: dpdk0	queue-id: 0
pmd thread numa_id 0 core_id 42:
	port: dpdk1	queue-id: 0
pmd thread numa_id 0 core_id 46:
	port: vhost1	queue-id: 0
[root@netqe22 ~]# ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 18:
	emc hits:123678783
	megaflow hits:0
	miss:1
	lost:0
	polling cycles:165708587754 (78.56%)
	processing cycles:45226500151 (21.44%)
	avg cycles per packet: 1705.51 (210935087905/123678784)
	avg processing cycles per packet: 365.68 (45226500151/123678784)
pmd thread numa_id 0 core_id 22:
	emc hits:119902268
	megaflow hits:0
	miss:1
	lost:0
	polling cycles:141675060695 (66.07%)
	processing cycles:72768887072 (33.93%)
	avg cycles per packet: 1788.49 (214443947767/119902269)
	avg processing cycles per packet: 606.90 (72768887072/119902269)
pmd thread numa_id 0 core_id 42:
	emc hits:123678783
	megaflow hits:0
	miss:1
	lost:0
	polling cycles:140211256645 (65.85%)
	processing cycles:72705910626 (34.15%)
	avg cycles per packet: 1721.53 (212917167271/123678784)
	avg processing cycles per packet: 587.86 (72705910626/123678784)
main thread:
	emc hits:0
	megaflow hits:0
	miss:0
	lost:0
	polling cycles:2450109 (100.00%)
	processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 46:
	emc hits:119902268
	megaflow hits:0
	miss:1
	lost:0
	polling cycles:165942791861 (79.03%)
	processing cycles:44042572998 (20.97%)
	avg cycles per packet: 1751.30 (209985364859/119902269)
	avg processing cycles per packet: 367.32 (44042572998/119902269)



With FDB 2.6.1 I am seeing
Average port 1: 4289999.00 pps
Average port 0: 4290038.00 pps

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 42:
	isolated : false
	port: vhost0	queue-id: 0
pmd thread numa_id 0 core_id 46:
	isolated : false
	port: vhost1	queue-id: 0
pmd thread numa_id 0 core_id 18:
	isolated : false
	port: dpdk0	queue-id: 0
pmd thread numa_id 0 core_id 22:
	isolated : false
	port: dpdk1	queue-id: 0

pmd thread numa_id 0 core_id 42:
	emc hits:151936627
	megaflow hits:0
	avg. subtable lookups per hit:0.00
	miss:1
	lost:0
	polling cycles:82233917980 (63.42%)
	processing cycles:47423596049 (36.58%)
	avg cycles per packet: 853.37 (129657514029/151936628)
	avg processing cycles per packet: 312.13 (47423596049/151936628)
pmd thread numa_id 0 core_id 46:
	emc hits:151865522
	megaflow hits:0
	avg. subtable lookups per hit:0.00
	miss:1
	lost:0
	polling cycles:84114769192 (63.94%)
	processing cycles:47442527038 (36.06%)
	avg cycles per packet: 866.27 (131557296230/151865523)
	avg processing cycles per packet: 312.40 (47442527038/151865523)
main thread:
	emc hits:0
	megaflow hits:0
	avg. subtable lookups per hit:0.00
	miss:0
	lost:0
	polling cycles:1375451 (100.00%)
	processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 18:
	emc hits:179958942
	megaflow hits:1
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:52015842606 (42.64%)
	processing cycles:69958979264 (57.36%)
	avg cycles per packet: 677.79 (121974821870/179958945)
	avg processing cycles per packet: 388.75 (69958979264/179958945)
pmd thread numa_id 0 core_id 22:
	emc hits:180375261
	megaflow hits:2
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:51724534877 (42.43%)
	processing cycles:70172284436 (57.57%)
	avg cycles per packet: 675.80 (121896819313/180375266)
	avg processing cycles per packet: 389.03 (70172284436/180375266)

Will post a 2-queue 8-PMD comparison shortly.

Comment 12 Christian Trautman 2017-01-18 02:34:02 UTC
2-queue 8-PMD also shows a performance increase moving from 2.5.22 to 2.6.1 when using the PMD mask reapply workaround.
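
The only changes from the 1-queue setup are the PMD mask (8 cores) and the per-port queue count, roughly as follows (OVS 2.6 per-interface syntax shown; the mask matches the cores in the output below, and the guest testpmd is started with --rxq=2 --txq=2):

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x550000550000    # cores 16,18,20,22,40,42,44,46
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2
# OVS 2.5 equivalent: ovs-vsctl set Open_vSwitch . other_config:n-dpdk-rxqs=2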

2.5.22 / DPDK 16.07
pmd thread numa_id 0 core_id 18:
	port: vhost0	queue-id: 1
pmd thread numa_id 0 core_id 40:
	port: vhost1	queue-id: 0
pmd thread numa_id 0 core_id 20:
	port: dpdk0	queue-id: 0
pmd thread numa_id 0 core_id 22:
	port: dpdk0	queue-id: 1
pmd thread numa_id 0 core_id 44:
	port: dpdk1	queue-id: 0
pmd thread numa_id 0 core_id 46:
	port: dpdk1	queue-id: 1
pmd thread numa_id 0 core_id 16:
	port: vhost0	queue-id: 0
pmd thread numa_id 0 core_id 42:
	port: vhost1	queue-id: 1

ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 18:
	emc hits:128326191
	megaflow hits:31
	miss:2
	lost:1
	polling cycles:95856494532 (66.44%)
	processing cycles:48426531113 (33.56%)
	avg cycles per packet: 1124.35 (144283025645/128326224)
	avg processing cycles per packet: 377.37 (48426531113/128326224)
pmd thread numa_id 0 core_id 40:
	emc hits:128264303
	megaflow hits:25
	miss:1
	lost:0
	polling cycles:95443686834 (66.10%)
	processing cycles:48960085811 (33.90%)
	avg cycles per packet: 1125.83 (144403772645/128264329)
	avg processing cycles per packet: 381.71 (48960085811/128264329)
pmd thread numa_id 0 core_id 20:
	emc hits:128264307
	megaflow hits:21
	miss:3
	lost:2
	polling cycles:71962243482 (49.86%)
	processing cycles:72377440665 (50.14%)
	avg cycles per packet: 1125.33 (144339684147/128264331)
	avg processing cycles per packet: 564.28 (72377440665/128264331)
main thread:
	emc hits:0
	megaflow hits:0
	miss:0
	lost:0
	polling cycles:1819681 (100.00%)
	processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 22:
	emc hits:128259051
	megaflow hits:30
	miss:1
	lost:0
	polling cycles:72047918133 (49.92%)
	processing cycles:72283270252 (50.08%)
	avg cycles per packet: 1125.31 (144331188385/128259082)
	avg processing cycles per packet: 563.57 (72283270252/128259082)
pmd thread numa_id 0 core_id 44:
	emc hits:128322446
	megaflow hits:27
	miss:2
	lost:1
	polling cycles:72227684140 (49.90%)
	processing cycles:72530676958 (50.10%)
	avg cycles per packet: 1128.08 (144758361098/128322475)
	avg processing cycles per packet: 565.22 (72530676958/128322475)
pmd thread numa_id 0 core_id 46:
	emc hits:128326193
	megaflow hits:30
	miss:1
	lost:0
	polling cycles:72264182667 (49.90%)
	processing cycles:72557157006 (50.10%)
	avg cycles per packet: 1128.54 (144821339673/128326224)
	avg processing cycles per packet: 565.41 (72557157006/128326224)
pmd thread numa_id 0 core_id 16:
	emc hits:128322456
	megaflow hits:17
	miss:1
	lost:0
	polling cycles:95678418512 (66.31%)
	processing cycles:48605971579 (33.69%)
	avg cycles per packet: 1124.39 (144284390091/128322474)
	avg processing cycles per packet: 378.78 (48605971579/128322474)
pmd thread numa_id 0 core_id 42:
	emc hits:128259061
	megaflow hits:20
	miss:1
	lost:0
	polling cycles:95592345496 (66.19%)
	processing cycles:48821384794 (33.81%)
	avg cycles per packet: 1125.95 (144413730290/128259082)
	avg processing cycles per packet: 380.65 (48821384794/128259082)

Average port 1: 7245579.00 pps
Average port 0: 7244854.00 pps

2.6.1 / DPDK 16.11
ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 42:
	isolated : false
	port: vhost0	queue-id: 0
pmd thread numa_id 0 core_id 22:
	isolated : false
	port: vhost0	queue-id: 1
pmd thread numa_id 0 core_id 44:
	isolated : false
	port: vhost1	queue-id: 0
pmd thread numa_id 0 core_id 16:
	isolated : false
	port: vhost1	queue-id: 1
pmd thread numa_id 0 core_id 46:
	isolated : false
	port: dpdk1	queue-id: 0
pmd thread numa_id 0 core_id 18:
	isolated : false
	port: dpdk1	queue-id: 1
pmd thread numa_id 0 core_id 20:
	isolated : false
	port: dpdk0	queue-id: 0
pmd thread numa_id 0 core_id 40:
	isolated : false
	port: dpdk0	queue-id: 1

pmd thread numa_id 0 core_id 42:
	emc hits:149987467
	megaflow hits:30
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:82146733208 (63.21%)
	processing cycles:47819711894 (36.79%)
	avg cycles per packet: 866.52 (129966445102/149987528)
	avg processing cycles per packet: 318.82 (47819711894/149987528)
pmd thread numa_id 0 core_id 22:
	emc hits:150291968
	megaflow hits:29
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:81980941281 (63.17%)
	processing cycles:47792237536 (36.83%)
	avg cycles per packet: 863.47 (129773178817/150292027)
	avg processing cycles per packet: 318.00 (47792237536/150292027)
pmd thread numa_id 0 core_id 44:
	emc hits:146240779
	megaflow hits:30
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:82910349745 (63.81%)
	processing cycles:47032900538 (36.19%)
	avg cycles per packet: 888.56 (129943250283/146240840)
	avg processing cycles per packet: 321.61 (47032900538/146240840)
pmd thread numa_id 0 core_id 16:
	emc hits:147609988
	megaflow hits:29
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:83351917891 (64.25%)
	processing cycles:46380146878 (35.75%)
	avg cycles per packet: 878.88 (129732064769/147610047)
	avg processing cycles per packet: 314.21 (46380146878/147610047)
pmd thread numa_id 0 core_id 46:
	emc hits:171873792
	megaflow hits:29
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:42356348738 (37.59%)
	processing cycles:70322255135 (62.41%)
	avg cycles per packet: 655.59 (112678603873/171873851)
	avg processing cycles per packet: 409.15 (70322255135/171873851)
main thread:
	emc hits:0
	megaflow hits:0
	avg. subtable lookups per hit:0.00
	miss:0
	lost:0
	polling cycles:1083740 (100.00%)
	processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 18:
	emc hits:171883339
	megaflow hits:30
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:41280353120 (36.97%)
	processing cycles:70386707637 (63.03%)
	avg cycles per packet: 649.67 (111667060757/171883400)
	avg processing cycles per packet: 409.50 (70386707637/171883400)
pmd thread numa_id 0 core_id 20:
	emc hits:168094517
	megaflow hits:29
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:42303513955 (37.50%)
	processing cycles:70494304966 (62.50%)
	avg cycles per packet: 671.04 (112797818921/168094576)
	avg processing cycles per packet: 419.37 (70494304966/168094576)
pmd thread numa_id 0 core_id 40:
	emc hits:168090357
	megaflow hits:30
	avg. subtable lookups per hit:1.00
	miss:1
	lost:0
	polling cycles:42078529397 (37.34%)
	processing cycles:70616453569 (62.66%)
	avg cycles per packet: 670.44 (112694982966/168090418)
	avg processing cycles per packet: 420.11 (70616453569/168090418)


Average port 1: 8466621.00 pps
Average port 0: 8275347.00 pps

From this testing I am going to conclude for now that the regression is related to the VSPerf testing, which could be a framework issue or something in our scripts. I will need to isolate the issue if it exists there. I am closing this bug for now and will reopen if needed.