Bug 1411455
Summary: Performance decrease moving to OVS 2.6 and DPDK 16.11

Product: Red Hat Enterprise Linux 7
Component: openvswitch
Version: 7.3
Hardware: x86_64
OS: Linux
Status: CLOSED WORKSFORME
Severity: high
Priority: unspecified
Reporter: Christian Trautman <ctrautma>
Assignee: Kevin Traynor <ktraynor>
QA Contact: Network QE <network-qe>
CC: atragler, bmichalo, ctrautma, fleitner, osabart, ovs-qe
Target Milestone: rc
Type: Bug
Last Closed: 2017-01-18 02:34:02 UTC
Attachments:
Created attachment 1238856 [details]
VSPerf 2.5 ovs output
Created attachment 1238857 [details]
VSPerf 2.6 ovs output
Perf team reports that they are seeing increased performance with this newer version. We are going to review our tests with them tomorrow and update the bug accordingly.

I was finally able to do some raw performance testing today to verify that we are seeing better numbers running FDB 2.6.1 with DPDK 16.11 versus 2.5.22 with 16.07. I think this could be because we need to apply the workaround of resetting the PMD mask to correctly align the PMDs to cores.

Steps to reproduce (with hyper-threading enabled; Network-QE does not typically do non-hyper-threading testing at this time):

Set up an OVS bridge with 2 dpdk and 2 vhostuser ports (a command sketch is included at the end of this comment):

[root@netqe22 ~]# ovs-vsctl show
3d7660ab-9e4e-44dc-960a-d6d80552e1a1
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "vhost1"
            Interface "vhost1"
                type: dpdkvhostuser
        Port "vhost0"
            Interface "vhost0"
                type: dpdkvhostuser
    ovs_version: "2.5.0"

Set up 4 PMD threads on the NUMA node associated with the NIC under test. Using libvirt, launch a guest with 3 vCPUs:

[root@netqe22 ~]# virsh dumpxml guest30032
<domain type='kvm' id='4'>
  <name>guest30032</name>
  <uuid>37425e76-af6a-44a6-aba0-73434afe34c0</uuid>
  <memory unit='KiB'>4194304</memory>
  <currentMemory unit='KiB'>4194304</currentMemory>
  <memoryBacking>
    <hugepages>
      <page size='1048576' unit='KiB' nodeset='0'/>
    </hugepages>
    <locked/>
  </memoryBacking>
  <vcpu placement='static'>3</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='4'/>
    <vcpupin vcpu='2' cpuset='6'/>
    <emulatorpin cpuset='8'/>
  </cputune>
  <numatune>
    <memory mode='strict' nodeset='0'/>
  </numatune>
  <resource>
    <partition>/machine</partition>
  </resource>
  <os>
    <type arch='x86_64' machine='pc-i440fx-rhel7.2.0'>hvm</type>
    <boot dev='hd'/>
  </os>
  <features>
    <acpi/>
    <apic/>
  </features>
  <cpu mode='custom' match='exact'>
    <model fallback='allow'>Haswell-noTSX</model>
    <numa>
      <cell id='0' cpus='0' memory='4194304' unit='KiB' memAccess='shared'/>
    </numa>
  </cpu>
  <clock offset='utc'>
    <timer name='rtc' tickpolicy='catchup'/>
    <timer name='pit' tickpolicy='delay'/>
    <timer name='hpet' present='no'/>
  </clock>
  <on_poweroff>destroy</on_poweroff>
  <on_reboot>restart</on_reboot>
  <on_crash>restart</on_crash>
  <pm>
    <suspend-to-mem enabled='no'/>
    <suspend-to-disk enabled='no'/>
  </pm>
  <devices>
    <emulator>/usr/libexec/qemu-kvm</emulator>
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/root/rhel7.3-1Q.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
    <controller type='usb' index='0' model='ich9-ehci1'>
      <alias name='usb'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x7'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci1'>
      <alias name='usb'/>
      <master startport='0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0' multifunction='on'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci2'>
      <alias name='usb'/>
      <master startport='2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x1'/>
    </controller>
    <controller type='usb' index='0' model='ich9-uhci3'>
      <alias name='usb'/>
      <master startport='4'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x2'/>
    </controller>
    <controller type='pci' index='0' model='pci-root'>
      <alias name='pci.0'/>
    </controller>
    <controller type='virtio-serial' index='0'>
      <alias name='virtio-serial0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </controller>
    <interface type='vhostuser'>
      <mac address='52:54:00:11:8f:e7'/>
      <source type='unix' path='/var/run/openvswitch/vhost0' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <alias name='net0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='52:54:00:27:05:6a'/>
      <source type='unix' path='/var/run/openvswitch/vhost1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <alias name='net1'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x09' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='52:54:00:bb:63:7b'/>
      <source bridge='virbr0'/>
      <target dev='vnet0'/>
      <model type='virtio'/>
      <alias name='net2'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
    </interface>
    <serial type='pty'>
      <source path='/dev/pts/2'/>
      <target port='0'/>
      <alias name='serial0'/>
    </serial>
    <console type='pty' tty='/dev/pts/2'>
      <source path='/dev/pts/2'/>
      <target type='serial' port='0'/>
      <alias name='serial0'/>
    </console>
    <channel type='unix'>
      <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-4-guest30032/org.qemu.guest_agent.0'/>
      <target type='virtio' name='org.qemu.guest_agent.0' state='disconnected'/>
      <alias name='channel0'/>
      <address type='virtio-serial' controller='0' bus='0' port='1'/>
    </channel>
    <input type='tablet' bus='usb'>
      <alias name='input0'/>
      <address type='usb' bus='0' port='1'/>
    </input>
    <input type='mouse' bus='ps2'>
      <alias name='input1'/>
    </input>
    <input type='keyboard' bus='ps2'>
      <alias name='input2'/>
    </input>
    <graphics type='vnc' port='5900' autoport='yes' listen='0.0.0.0'>
      <listen type='address' address='0.0.0.0'/>
    </graphics>
    <video>
      <model type='cirrus' vram='16384' heads='1' primary='yes'/>
      <alias name='video0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </video>
    <memballoon model='virtio'>
      <alias name='balloon0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
    </memballoon>
  </devices>
  <seclabel type='dynamic' model='selinux' relabel='yes'>
    <label>system_u:system_r:svirt_t:s0:c382,c749</label>
    <imagelabel>system_u:object_r:svirt_image_t:s0:c382,c749</imagelabel>
  </seclabel>
  <seclabel type='dynamic' model='dac' relabel='yes'>
    <label>+107:+107</label>
    <imagelabel>+107:+107</imagelabel>
  </seclabel>
</domain>

Set up testpmd inside the guest with vfio in no-iommu mode, start testpmd with 2 nb-cores and 1 txq and 1 rxq, and set the fwd mode to io.

Start traffic using a traffic generator. In my case I'm using our Xena one-shot script to send 1024 flows in a bidirectional pattern.
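For reference, a minimal sketch of the host-side bridge and PMD setup described above, assuming the NICs are already bound to vfio-pci; the authoritative commands are in the attached cmdline steps, and the CPU mask is simply the one implied by the core IDs (18, 22, 42, 46) seen in the pmd-rxq-show output below:

# On OVS 2.6, DPDK support must be initialized first:
ovs-vsctl set Open_vSwitch . other_config:dpdk-init=true

# Create a userspace (netdev) bridge with two DPDK and two vhost-user ports.
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port ovsbr0 dpdk1 -- set Interface dpdk1 type=dpdk
ovs-vsctl add-port ovsbr0 vhost0 -- set Interface vhost0 type=dpdkvhostuser
ovs-vsctl add-port ovsbr0 vhost1 -- set Interface vhost1 type=dpdkvhostuser

# Run 4 PMD threads on the NIC's NUMA node; bits 18, 22, 42 and 46 of the
# mask select the cores used in the 1-queue runs below.
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x440000440000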
With FDB 2.5.22 I get:

Average port 1: 3491994.00 pps
Average port 0: 3387008.00 pps

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 18:  port: vhost0  queue-id: 0
pmd thread numa_id 0 core_id 22:  port: dpdk0   queue-id: 0
pmd thread numa_id 0 core_id 42:  port: dpdk1   queue-id: 0
pmd thread numa_id 0 core_id 46:  port: vhost1  queue-id: 0

[root@netqe22 ~]# !843
ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 18:
    emc hits:123678783  megaflow hits:0  miss:1  lost:0
    polling cycles:165708587754 (78.56%)  processing cycles:45226500151 (21.44%)
    avg cycles per packet: 1705.51 (210935087905/123678784)
    avg processing cycles per packet: 365.68 (45226500151/123678784)
pmd thread numa_id 0 core_id 22:
    emc hits:119902268  megaflow hits:0  miss:1  lost:0
    polling cycles:141675060695 (66.07%)  processing cycles:72768887072 (33.93%)
    avg cycles per packet: 1788.49 (214443947767/119902269)
    avg processing cycles per packet: 606.90 (72768887072/119902269)
pmd thread numa_id 0 core_id 42:
    emc hits:123678783  megaflow hits:0  miss:1  lost:0
    polling cycles:140211256645 (65.85%)  processing cycles:72705910626 (34.15%)
    avg cycles per packet: 1721.53 (212917167271/123678784)
    avg processing cycles per packet: 587.86 (72705910626/123678784)
main thread:
    emc hits:0  megaflow hits:0  miss:0  lost:0
    polling cycles:2450109 (100.00%)  processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 46:
    emc hits:119902268  megaflow hits:0  miss:1  lost:0
    polling cycles:165942791861 (79.03%)  processing cycles:44042572998 (20.97%)
    avg cycles per packet: 1751.30 (209985364859/119902269)
    avg processing cycles per packet: 367.32 (44042572998/119902269)

With FDB 2.6.1 I am seeing:

Average port 1: 4289999.00 pps
Average port 0: 4290038.00 pps

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 42:  isolated: false  port: vhost0  queue-id: 0
pmd thread numa_id 0 core_id 46:  isolated: false  port: vhost1  queue-id: 0
pmd thread numa_id 0 core_id 18:  isolated: false  port: dpdk0   queue-id: 0
pmd thread numa_id 0 core_id 22:  isolated: false  port: dpdk1   queue-id: 0

pmd thread numa_id 0 core_id 42:
    emc hits:151936627  megaflow hits:0  avg. subtable lookups per hit:0.00  miss:1  lost:0
    polling cycles:82233917980 (63.42%)  processing cycles:47423596049 (36.58%)
    avg cycles per packet: 853.37 (129657514029/151936628)
    avg processing cycles per packet: 312.13 (47423596049/151936628)
pmd thread numa_id 0 core_id 46:
    emc hits:151865522  megaflow hits:0  avg. subtable lookups per hit:0.00  miss:1  lost:0
    polling cycles:84114769192 (63.94%)  processing cycles:47442527038 (36.06%)
    avg cycles per packet: 866.27 (131557296230/151865523)
    avg processing cycles per packet: 312.40 (47442527038/151865523)
main thread:
    emc hits:0  megaflow hits:0  avg. subtable lookups per hit:0.00  miss:0  lost:0
    polling cycles:1375451 (100.00%)  processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 18:
    emc hits:179958942  megaflow hits:1  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:52015842606 (42.64%)  processing cycles:69958979264 (57.36%)
    avg cycles per packet: 677.79 (121974821870/179958945)
    avg processing cycles per packet: 388.75 (69958979264/179958945)
pmd thread numa_id 0 core_id 22:
    emc hits:180375261  megaflow hits:2  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:51724534877 (42.43%)  processing cycles:70172284436 (57.57%)
    avg cycles per packet: 675.80 (121896819313/180375266)
    avg processing cycles per packet: 389.03 (70172284436/180375266)

Will post the 2-queue 8-PMD comparison shortly.
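A note on measurement method for the numbers above: pmd-stats-show counters accumulate from PMD start, so when comparing builds it helps to zero them once traffic reaches steady state and sample over a fixed window. A minimal sketch:

# Zero the PMD counters after traffic reaches steady state, then sample
# polling/processing cycles and the queue layout over a fixed interval.
ovs-appctl dpif-netdev/pmd-stats-clear
sleep 60
ovs-appctl dpif-netdev/pmd-stats-show
ovs-appctl dpif-netdev/pmd-rxq-show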
The 2-queue 8-PMD case also shows a performance increase moving from 2.5.22 to 2.6.1 when using the PMD reapply workaround (sketched at the end of this comment).

2.5.22 / dpdk 1607:

pmd thread numa_id 0 core_id 18:  port: vhost0  queue-id: 1
pmd thread numa_id 0 core_id 40:  port: vhost1  queue-id: 0
pmd thread numa_id 0 core_id 20:  port: dpdk0   queue-id: 0
pmd thread numa_id 0 core_id 22:  port: dpdk0   queue-id: 1
pmd thread numa_id 0 core_id 44:  port: dpdk1   queue-id: 0
pmd thread numa_id 0 core_id 46:  port: dpdk1   queue-id: 1
pmd thread numa_id 0 core_id 16:  port: vhost0  queue-id: 0
pmd thread numa_id 0 core_id 42:  port: vhost1  queue-id: 1

ovs-appctl dpif-netdev/pmd-stats-show
pmd thread numa_id 0 core_id 18:
    emc hits:128326191  megaflow hits:31  miss:2  lost:1
    polling cycles:95856494532 (66.44%)  processing cycles:48426531113 (33.56%)
    avg cycles per packet: 1124.35 (144283025645/128326224)
    avg processing cycles per packet: 377.37 (48426531113/128326224)
pmd thread numa_id 0 core_id 40:
    emc hits:128264303  megaflow hits:25  miss:1  lost:0
    polling cycles:95443686834 (66.10%)  processing cycles:48960085811 (33.90%)
    avg cycles per packet: 1125.83 (144403772645/128264329)
    avg processing cycles per packet: 381.71 (48960085811/128264329)
pmd thread numa_id 0 core_id 20:
    emc hits:128264307  megaflow hits:21  miss:3  lost:2
    polling cycles:71962243482 (49.86%)  processing cycles:72377440665 (50.14%)
    avg cycles per packet: 1125.33 (144339684147/128264331)
    avg processing cycles per packet: 564.28 (72377440665/128264331)
main thread:
    emc hits:0  megaflow hits:0  miss:0  lost:0
    polling cycles:1819681 (100.00%)  processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 22:
    emc hits:128259051  megaflow hits:30  miss:1  lost:0
    polling cycles:72047918133 (49.92%)  processing cycles:72283270252 (50.08%)
    avg cycles per packet: 1125.31 (144331188385/128259082)
    avg processing cycles per packet: 563.57 (72283270252/128259082)
pmd thread numa_id 0 core_id 44:
    emc hits:128322446  megaflow hits:27  miss:2  lost:1
    polling cycles:72227684140 (49.90%)  processing cycles:72530676958 (50.10%)
    avg cycles per packet: 1128.08 (144758361098/128322475)
    avg processing cycles per packet: 565.22 (72530676958/128322475)
pmd thread numa_id 0 core_id 46:
    emc hits:128326193  megaflow hits:30  miss:1  lost:0
    polling cycles:72264182667 (49.90%)  processing cycles:72557157006 (50.10%)
    avg cycles per packet: 1128.54 (144821339673/128326224)
    avg processing cycles per packet: 565.41 (72557157006/128326224)
pmd thread numa_id 0 core_id 16:
    emc hits:128322456  megaflow hits:17  miss:1  lost:0
    polling cycles:95678418512 (66.31%)  processing cycles:48605971579 (33.69%)
    avg cycles per packet: 1124.39 (144284390091/128322474)
    avg processing cycles per packet: 378.78 (48605971579/128322474)
pmd thread numa_id 0 core_id 42:
    emc hits:128259061  megaflow hits:20  miss:1  lost:0
    polling cycles:95592345496 (66.19%)  processing cycles:48821384794 (33.81%)
    avg cycles per packet: 1125.95 (144413730290/128259082)
    avg processing cycles per packet: 380.65 (48821384794/128259082)

Average port 1: 7245579.00 pps
Average port 0: 7244854.00 pps

2.6.1 / dpdk 1611:

ovs-appctl dpif-netdev/pmd-rxq-show
pmd thread numa_id 0 core_id 42:  isolated: false  port: vhost0  queue-id: 0
pmd thread numa_id 0 core_id 22:  isolated: false  port: vhost0  queue-id: 1
pmd thread numa_id 0 core_id 44:  isolated: false  port: vhost1  queue-id: 0
pmd thread numa_id 0 core_id 16:  isolated: false  port: vhost1  queue-id: 1
pmd thread numa_id 0 core_id 46:  isolated: false  port: dpdk1   queue-id: 0
pmd thread numa_id 0 core_id 18:  isolated: false  port: dpdk1   queue-id: 1
pmd thread numa_id 0 core_id 20:  isolated: false  port: dpdk0   queue-id: 0
pmd thread numa_id 0 core_id 40:  isolated: false  port: dpdk0   queue-id: 1

pmd thread numa_id 0 core_id 42:
    emc hits:149987467  megaflow hits:30  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:82146733208 (63.21%)  processing cycles:47819711894 (36.79%)
    avg cycles per packet: 866.52 (129966445102/149987528)
    avg processing cycles per packet: 318.82 (47819711894/149987528)
pmd thread numa_id 0 core_id 22:
    emc hits:150291968  megaflow hits:29  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:81980941281 (63.17%)  processing cycles:47792237536 (36.83%)
    avg cycles per packet: 863.47 (129773178817/150292027)
    avg processing cycles per packet: 318.00 (47792237536/150292027)
pmd thread numa_id 0 core_id 44:
    emc hits:146240779  megaflow hits:30  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:82910349745 (63.81%)  processing cycles:47032900538 (36.19%)
    avg cycles per packet: 888.56 (129943250283/146240840)
    avg processing cycles per packet: 321.61 (47032900538/146240840)
pmd thread numa_id 0 core_id 16:
    emc hits:147609988  megaflow hits:29  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:83351917891 (64.25%)  processing cycles:46380146878 (35.75%)
    avg cycles per packet: 878.88 (129732064769/147610047)
    avg processing cycles per packet: 314.21 (46380146878/147610047)
pmd thread numa_id 0 core_id 46:
    emc hits:171873792  megaflow hits:29  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:42356348738 (37.59%)  processing cycles:70322255135 (62.41%)
    avg cycles per packet: 655.59 (112678603873/171873851)
    avg processing cycles per packet: 409.15 (70322255135/171873851)
main thread:
    emc hits:0  megaflow hits:0  avg. subtable lookups per hit:0.00  miss:0  lost:0
    polling cycles:1083740 (100.00%)  processing cycles:0 (0.00%)
pmd thread numa_id 0 core_id 18:
    emc hits:171883339  megaflow hits:30  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:41280353120 (36.97%)  processing cycles:70386707637 (63.03%)
    avg cycles per packet: 649.67 (111667060757/171883400)
    avg processing cycles per packet: 409.50 (70386707637/171883400)
pmd thread numa_id 0 core_id 20:
    emc hits:168094517  megaflow hits:29  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:42303513955 (37.50%)  processing cycles:70494304966 (62.50%)
    avg cycles per packet: 671.04 (112797818921/168094576)
    avg processing cycles per packet: 419.37 (70494304966/168094576)
pmd thread numa_id 0 core_id 40:
    emc hits:168090357  megaflow hits:30  avg. subtable lookups per hit:1.00  miss:1  lost:0
    polling cycles:42078529397 (37.34%)  processing cycles:70616453569 (62.66%)
    avg cycles per packet: 670.44 (112694982966/168090418)
    avg processing cycles per packet: 420.11 (70616453569/168090418)

Average port 1: 8466621.00 pps
Average port 0: 8275347.00 pps

From this testing I am going to conclude for now that the reported regression is related to VSPerf testing, either a framework issue or something in our scripts. I will need to isolate the issue if it exists there. I am closing this bug for now and will reopen if needed.
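For clarity, the PMD reapply workaround referred to above means re-setting pmd-cpu-mask after the guest's vhost-user queues have come up, which forces the rx queues to be redistributed across the PMD threads. A hedged sketch of one way to do it (the temporary mask value is arbitrary):

# Save the current mask, briefly change it, then restore it so that the
# datapath redistributes rx queues now that all ports are active.
MASK=$(ovs-vsctl get Open_vSwitch . other_config:pmd-cpu-mask)
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask="$MASK"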
Created attachment 1238855 [details]
cmdline steps

Description of problem:
The latest fast datapath beta channel build, OVS 2.6.1 with DPDK 16.11, shows decreased performance relative to OVS 2.5-git22 with DPDK 16.07. The following performance decreases are seen with a single-guest configuration:

pvp_tput     | ovs 2.5 git22 | ovs 2.6.1 dpdk 16.11 | increase/decrease rate | CI baseline value
1queue 2pmd  | 5.1 mpps      | 4.7 mpps             | decrease 7%            | 4 mpps
1queue 4pmd  | 9.4 mpps      | 7.8 mpps             | decrease 17%           | 7.8 mpps
2queue 4pmd  | 10 mpps       | 9.5 mpps             | decrease 5%            | 8 mpps
2queue 8pmd  | 18 mpps       | 14.8 mpps            | decrease 17%           | 14 mpps
4queue 8pmd  | 19.5 mpps     | 17.5 mpps            | decrease 10%           | 16 mpps

Version-Release number of selected component (if applicable):
kernel-3.10.0-514.2.2.el7.x86_64
dpdk-16.11-2.el7fdb.x86_64.rpm
dpdk-tools-16.11-2.el7fdb.x86_64.rpm
openvswitch-2.6.1-3.git20161206.el7fdb.x86_64.rpm

How reproducible:
Always reproducible.

Steps to Reproduce:
1. Set up an OVS netdev bridge with 2 DPDK and 2 vhostuser ports. The Intel X520 10 GbE NICs are bound to the vfio-pci driver.
2. Set up a RHEL 7.3 guest running a DPDK 16.11 testpmd loopback with vfio-pci in no-IOMMU mode (a sketch follows this comment). The attachment includes all cmdline steps for the testcase.
3. Generate 64-byte traffic with 1024k flows.

Actual results:
Performance has decreased from the previous version.

Expected results:
Performance should match or exceed the previous version.

Additional info:
Testcases are run with the OPNFV VSPerf project. I have also manually verified this behavior outside of VSPerf.
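For completeness, a sketch of the guest-side testpmd loopback described above, assuming the two virtio-net devices sit at the PCI addresses from the domain XML (0000:00:08.0 and 0000:00:09.0), that the guest kernel supports VFIO no-IOMMU mode, and that the DPDK 16.11 tools are installed (tool names may differ by packaging); the authoritative invocation is in the attached cmdline steps:

# Inside the guest: load VFIO in unsafe no-IOMMU mode and bind the two
# virtio-net devices that back the vhost-user ports.
modprobe vfio enable_unsafe_noiommu_mode=1
modprobe vfio-pci
dpdk-devbind --bind=vfio-pci 0000:00:08.0 0000:00:09.0

# Forward between the two ports: 1 main core plus 2 forwarding cores
# (matching the guest's 3 vCPUs), 1 rx/tx queue per port, io mode.
testpmd -l 0,1,2 -n 4 -- --nb-cores=2 --rxq=1 --txq=1 --forward-mode=io --auto-start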