Bug 1710357

Summary: cx5: poor ovs hw offload performance
Product: Red Hat Enterprise Linux Fast Datapath
Component: openvswitch
Sub component: ovs-hw-offload
Status: CLOSED NOTABUG
Severity: high
Priority: high
Version: FDP 19.C
Keywords: Regression, TestBlocker
Reporter: Amit Supugade <asupugad>
Assignee: Alaa Hleihel (NVIDIA Mellanox) <ahleihel>
QA Contact: Amit Supugade <asupugad>
Docs Contact:
CC: ahleihel, atragler, ctrautma, fbaudin, mleitner, qding, rkhan
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-06-28 17:11:01 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Amit Supugade 2019-05-15 12:07:19 UTC
Description of problem:
Performance with OVS hw offload is much lower than expected.

Version-Release number of selected component (if applicable):
openvswitch-2.9.0-106.el7fdp.x86_64.rpm

How reproducible:
Always

Steps to Reproduce:
1. Create 2 VFs.
2. Add the 2 VFs and the PF to ovsbr.
3. Add flows that route traffic from the PF to VF1 and from VF2 to the PF.
4. Set up testpmd in the VM to loop traffic back from iface1 (VF1) to iface2 (VF2).
5. Run traffic with OVS hw offload enabled (a minimal setup sketch follows).
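
A minimal setup sketch, assuming an mlx5 PF named ens1f0 at PCI address 0000:03:00.0 with VF representors ens1f0_0/ens1f0_1; all names and addresses are placeholders, not the actual values from this system:

# create the VFs and switch the eswitch to switchdev mode
# (mlx5 requires unbinding the VFs before the mode change)
echo 2 > /sys/class/net/ens1f0/device/sriov_numvfs
echo 0000:03:00.2 > /sys/bus/pci/drivers/mlx5_core/unbind
echo 0000:03:00.3 > /sys/bus/pci/drivers/mlx5_core/unbind
devlink dev eswitch set pci/0000:03:00.0 mode switchdev
echo 0000:03:00.2 > /sys/bus/pci/drivers/mlx5_core/bind
echo 0000:03:00.3 > /sys/bus/pci/drivers/mlx5_core/bind

# enable hw offload, then build the bridge (PF + the two VF representors)
ovs-vsctl set Open_vSwitch . other_config:hw-offload=true
systemctl restart openvswitch
ovs-vsctl add-br ovsbr
ovs-vsctl add-port ovsbr ens1f0
ovs-vsctl add-port ovsbr ens1f0_0
ovs-vsctl add-port ovsbr ens1f0_1

# flows matching step 3: PF -> VF1 and VF2 -> PF
# (replace port names with ofport numbers if the release does not resolve names)
ovs-ofctl add-flow ovsbr "in_port=ens1f0,actions=output:ens1f0_0"
ovs-ofctl add-flow ovsbr "in_port=ens1f0_1,actions=output:ens1f0"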

Actual results:
For 100G: 8456670 fps

Expected results:
Performance should meet the PFT pass/fail criteria.
For 100G: 26899051 fps

Additional info:
Job links:
https://beaker.engineering.redhat.com/jobs/3535655
https://beaker.engineering.redhat.com/jobs/3535656

Comment 1 Amit Supugade 2019-05-16 14:49:51 UTC
Hi, 
Performance on 19.B for 100G CX5: 30628852 fps
Job link: https://beaker.engineering.redhat.com/jobs/3448569

Comment 3 Christian Trautman 2019-05-17 14:15:49 UTC
The issue is also present in 2.11.

Comment 4 Christian Trautman 2019-05-17 14:17:22 UTC
2.11 job link: https://beaker.engineering.redhat.com/jobs/3535660

8456670 fps

Comment 5 Marcelo Ricardo Leitner 2019-05-17 14:23:30 UTC
Interesting that the numbers are still higher than the sw datapath ones, and the skip_sw tests prove that the flows are offloaded.

What about the kernel version: was it the same in the tests with 19.B and 19.C?
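
For the record, one way to confirm that the flows are actually in hardware (a sketch; the representor name is a placeholder):

# datapath flows that were offloaded to the NIC
ovs-appctl dpctl/dump-flows type=offloaded
# the corresponding tc rules on a representor should show in_hw / skip_sw
tc -s filter show dev ens1f0_0 ingress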

Comment 6 Amit Supugade 2019-05-17 18:15:39 UTC
Kernel info:
19.B on 3.10.0-957.10.1.el7.x86_64
19.C on 3.10.0-957.el7.x86_64
I am trying out a few more combinations of kernel and ovs to see what results I get.

Comment 7 Marcelo Ricardo Leitner 2019-05-17 18:58:33 UTC
(In reply to Amit Supugade from comment #6)
> Kernel info:
> 19.B on 3.10.0-957.10.1.el7.x86_64
> 19.C on 3.10.0-957.el7.x86_64

Or the other way around? Or was 19.C really tested with an older kernel?

> I am trying out a few more combinations of kernel and ovs to see what
> results I get.

Cool, thanks.

Comment 8 Amit Supugade 2019-05-18 13:49:06 UTC
3.10.0-957.el7.x86_64 is the 7.6 GA kernel, so I ran the tests on 19C with this kernel.
Additional results:
19B on 3.10.0-957.el7.x86_64 = Pass
19C on 3.10.0-957.10.1.el7.x86_64 = Fail

So, in summary, 19B passed and 19C failed on both of the above kernels.

Comment 9 Alaa Hleihel (NVIDIA Mellanox) 2019-05-19 10:08:55 UTC
So using either kernel, OVS 19B always passes and OVS 19C always fails?
Meaning the issue depends only on the openvswitch version, regardless of the kernel version?

Just trying to understand on which component we should focus the debugging.

Also, please provide the exact RPM version numbers (for 19B and 19C); I am not sure how to map between them :)

Thanks
Alaa

Comment 10 Amit Supugade 2019-05-20 12:49:02 UTC
(In reply to Alaa Hleihel from comment #9)
> So using either kernel, OVS 19B always passes and OVS 19C always fails?
> Meaning the issue depends only on the openvswitch version, regardless of
> the kernel version?
correct
> 
> Just trying to understand on which component we should focus the debug.
> 
> Also, please provide the exact RPM version numbers (for 19B and 19C), I am
> not sure how to map between them :)
> 
19B- openvswitch-2.9.0-101.el7fdp.x86_64.rpm
19C- openvswitch-2.9.0-106.el7fdp.x86_64.rpm

> Thanks
> Alaa

Comment 11 Rashid Khan 2019-05-21 14:11:03 UTC
Hi Alaa, any updates?

Comment 12 Marcelo Ricardo Leitner 2019-05-21 15:21:14 UTC
Hi Amit, there are only 5 versions in there. Can you please try to bisect it to a specific release?

 %changelog
+* Thu Apr 18 2019 Lorenzo Bianconi <lorenzo.bianconi> - 2.9.0-106
+- Backport "OVN: fix DVR Floating IP support" (#1671776)
+
+* Tue Apr 09 2019 Timothy Redaelli <tredaelli> - 2.9.0-105
+- Fix missing dependencies for ovs-tcpdump (#1651232)
+
+* Fri Apr 05 2019 Timothy Redaelli <tredaelli> - 2.9.0-104
+- Add "Obsoletes: python-openvswitch < 2.9.0-57" to avoid yum to fail
+  on openstack during upgrade from .noarch to .arch (#1696340)
+
+* Tue Mar 26 2019 Numan Siddique <nusiddiq> - 2.9.0-103
+- Backport fixes for #1677616 (pinctrl thread) and fixes related to IPv6 RA.
+
+* Tue Mar 26 2019 Jakub Libosvar <libosvar> - 2.9.0-102
+- Backport "Add unixctl option for ovn-northd" (#1687480)
+
 * Thu Mar 14 2019 Timothy Redaelli <tredaelli> - 2.9.0-101
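
Something like the following should narrow it down (a sketch; it assumes the intermediate -10X builds are available in the configured repos):

# start from -106 and step down one build at a time, rerunning the
# throughput test after each downgrade
for v in 105 104 103 102; do
    yum downgrade -y openvswitch-2.9.0-${v}.el7fdp
    systemctl restart openvswitch
    # rerun the PF<->VF test here and note the first build that passes
done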

Comment 13 Alaa Hleihel (NVIDIA Mellanox) 2019-05-21 15:31:25 UTC
(In reply to Marcelo Ricardo Leitner from comment #12)
> Hi Amit, there are only 5 versions in there. Can you please try to bisect it
> to a specific release?
> 
>  %changelog
> +* Thu Apr 18 2019 Lorenzo Bianconi <lorenzo.bianconi> - 2.9.0-106
> +- Backport "OVN: fix DVR Floating IP support" (#1671776)
> +
> +* Tue Apr 09 2019 Timothy Redaelli <tredaelli> - 2.9.0-105
> +- Fix missing dependencies for ovs-tcpdump (#1651232)
> +
> +* Fri Apr 05 2019 Timothy Redaelli <tredaelli> - 2.9.0-104
> +- Add "Obsoletes: python-openvswitch < 2.9.0-57" to avoid yum to fail
> +  on openstack during upgrade from .noarch to .arch (#1696340)
> +
> +* Tue Mar 26 2019 Numan Siddique <nusiddiq> - 2.9.0-103
> +- Backport fixes for #1677616 (pinctrl thread) and fixes related to IPv6 RA.
> +
> +* Tue Mar 26 2019 Jakub Libosvar <libosvar> - 2.9.0-102
> +- Backport "Add unixctl option for ovn-northd" (#1687480)
> +
>  * Thu Mar 14 2019 Timothy Redaelli <tredaelli> - 2.9.0-101

Yes, please, that would be great.

I wanted to check the diff between the versions but had to jump back to another issue.

From a quick look, I am not sure how these patches could affect mlx5 only. Are other vendors not affected?

$ diff -ru 19B 19C | diffstat
 BUILD/openvswitch-2.9.0/lib/automake.mk                                 |    3 
 BUILD/openvswitch-2.9.0/lib/automake.mk.orig                            |only
 BUILD/openvswitch-2.9.0/lib/packets.h                                   |    3 
 BUILD/openvswitch-2.9.0/lib/unixctl.xml                                 |only
 BUILD/openvswitch-2.9.0/ovn/controller/pinctrl.c                        |  731 +++++++---
 BUILD/openvswitch-2.9.0/ovn/lib/actions.c                               |   20 
 BUILD/openvswitch-2.9.0/ovn/lib/ovn-l7.h                                |    3 
 BUILD/openvswitch-2.9.0/ovn/northd/ovn-northd.8.xml                     |   58 
 BUILD/openvswitch-2.9.0/ovn/northd/ovn-northd.c                         |  125 +
 BUILD/openvswitch-2.9.0/ovn/northd/ovn-northd.c.orig                    |  199 ++
 BUILD/openvswitch-2.9.0/tests/ovn-northd.at                             |   39 
 BUILD/openvswitch-2.9.0/tests/ovn.at                                    |   10 
 SOURCES/.0001-Add-unixctl-option-for-ovn-northd.patch.swp               |only
 SOURCES/0001-Add-unixctl-option-for-ovn-northd.patch                    |only
 SOURCES/0001-OVN-fix-DVR-Floating-IP-support.patch                      |only
 SOURCES/0001-ovn-pinctrl-Pass-struct-rconn-swconn-to-all-the-func.patch |only
 SOURCES/0002-ovn-controller-Add-a-new-thread-in-pinctrl-module-to.patch |only
 SOURCES/0003-OVN-Use-offset-instead-of-pointer-into-ofpbuf.patch        |only
 SOURCES/0004-OVN-Always-send-prefix-option-in-RAs.patch                 |only
 SOURCES/0005-OVN-Make-periodic-RAs-consistent-with-RA-responder.patch   |only
 SPECS/openvswitch.spec                                                  |   35 
 21 files changed, 1016 insertions(+), 210 deletions(-)

Comment 15 Amit Supugade 2019-05-22 13:34:14 UTC
Hi, 
Based on the results of the tests I ran, it looks like we get low performance only if I attach the VF to the VM using 'virsh attach-device <VM_NAME> <VF.xml>'. If I define the VM from XML, the performance is as expected.
19C with the VM defined from XML: https://beaker.engineering.redhat.com/jobs/3551606
If I use virsh attach-device, the test also fails on Netronome. I am running more tests on Netronome and will update the results here. Thanks!
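
For reference, this is roughly how the VF gets attached (a sketch; the VM name "master" and the VF PCI address are placeholders, not the actual values from this setup):

cat > VF.xml <<'EOF'
<interface type='hostdev' managed='yes'>
  <source>
    <address type='pci' domain='0x0000' bus='0x03' slot='0x00' function='0x2'/>
  </source>
</interface>
EOF
virsh attach-device master VF.xml --live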

Comment 16 Alaa Hleihel (NVIDIA Mellanox) 2019-05-22 14:23:23 UTC
(In reply to Amit Supugade from comment #15)
> Hi, 
> Based on the results of the tests I ran, it looks like we get low
> performance only if I attach the VF to the VM using 'virsh attach-device
> <VM_NAME> <VF.xml>'. If I define the VM from XML, the performance is as
> expected.
> 19C with the VM defined from XML:
> https://beaker.engineering.redhat.com/jobs/3551606
> If I use virsh attach-device, the test also fails on Netronome. I am
> running more tests on Netronome and will update the results here. Thanks!

Thanks a lot for the update, Amit!

This is interesting.
Is there also a difference between the outputs of "virsh dumpxml <VM_NAME>" after attaching the VF using the two methods?

Comment 18 Amit Supugade 2019-05-28 18:28:11 UTC
Hi, 
Below is the difference between the VM XMLs.
mlx_master.xml => gives the expected performance.

[root@netqe28 ~]# diff mlx_master.xml attach.xml 
1c1
< <domain type='kvm' id='3'>
---
> <domain type='kvm' id='1'>
3,11c3,5
<   <uuid>5fcb42cc-eb08-475a-b574-6f8aaa10bdd0</uuid>
<   <memory unit='KiB'>4194304</memory>
<   <currentMemory unit='KiB'>4194304</currentMemory>
<   <memoryBacking>
<     <hugepages>
<       <page size='1048576' unit='KiB' nodeset='0'/>
<     </hugepages>
<     <access mode='shared'/>
<   </memoryBacking>
---
>   <uuid>5318f61f-863a-4345-89d1-0f5d543a0042</uuid>
>   <memory unit='KiB'>8388608</memory>
>   <currentMemory unit='KiB'>8388608</currentMemory>
13,18d6
<   <cputune>
<     <vcpupin vcpu='0' cpuset='4'/>
<     <vcpupin vcpu='1' cpuset='6'/>
<     <vcpupin vcpu='2' cpuset='14'/>
<     <emulatorpin cpuset='4'/>
<   </cputune>
30,34c18,28
<   <cpu mode='host-passthrough'>
<     <feature policy='require' name='tsc-deadline'/>
<     <numa>
<       <cell id='0' cpus='0-2' memory='4194304' unit='KiB' memAccess='shared'/>
<     </numa>
---
>   <cpu mode='custom' match='exact' check='full'>
>     <model fallback='forbid'>Skylake-Client-IBRS</model>
>     <feature policy='require' name='avx512f'/>
>     <feature policy='require' name='avx512dq'/>
>     <feature policy='require' name='clwb'/>
>     <feature policy='require' name='avx512cd'/>
>     <feature policy='require' name='avx512bw'/>
>     <feature policy='require' name='avx512vl'/>
>     <feature policy='require' name='pdpe1gb'/>
>     <feature policy='require' name='hypervisor'/>
>     <feature policy='disable' name='arat'/>
85c79
<       <mac address='52:54:00:8a:78:09'/>
---
>       <mac address='52:54:00:3d:36:df'/>
90c84
<       <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
---
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
99c93
<       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
---
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x10' function='0x0'/>
108c102
<       <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
---
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x11' function='0x0'/>
130c124
<       <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-3-master/org.qemu.guest_agent.0'/>
---
>       <source mode='bind' path='/var/lib/libvirt/qemu/channel/target/domain-1-master/org.qemu.guest_agent.0'/>
143c137
<       <address type='pci' domain='0x0000' bus='0x00' slot='0x03' function='0x0'/>
---
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x07' function='0x0'/>
144a139,143
>     <rng model='virtio'>
>       <backend model='random'>/dev/urandom</backend>
>       <alias name='rng0'/>
>       <address type='pci' domain='0x0000' bus='0x00' slot='0x08' function='0x0'/>
>     </rng>
147,148c146,147
<     <label>system_u:system_r:svirt_t:s0:c718,c898</label>
<     <imagelabel>system_u:object_r:svirt_image_t:s0:c718,c898</imagelabel>
---
>     <label>system_u:system_r:svirt_t:s0:c569,c1017</label>
>     <imagelabel>system_u:object_r:svirt_image_t:s0:c569,c1017</imagelabel>
[root@netqe28 ~]#

Comment 19 Christian Trautman 2019-05-28 18:48:55 UTC
If I'm reading this correctly, attach.xml is missing the tunings that mlx_master.xml has:

<   <cputune>
<     <vcpupin vcpu='0' cpuset='4'/>
<     <vcpupin vcpu='1' cpuset='6'/>
<     <vcpupin vcpu='2' cpuset='14'/>
<     <emulatorpin cpuset='4'/>
<   </cputune>

If this is correct, it would explain the performance difference: the VM isn't being properly isolated, so its virtual CPUs take context switches all over the place.

There are also other differences, such as hugepages and CPU mode, that would cause problems as well.

Honestly, I don't think this is a bug; it's more that additional tuning steps are needed after using the virsh attach method (one way to apply the pinning is sketched below).
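
A possible way to apply the missing pinning to the already-running VM (a sketch: the VM name "master" is an assumption, and the cpuset values are taken from mlx_master.xml above; the hugepages and CPU-mode differences still require redefining the domain):

virsh vcpupin master 0 4 --live
virsh vcpupin master 1 6 --live
virsh vcpupin master 2 14 --live
virsh emulatorpin master 4 --live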

Comment 20 Amit Supugade 2019-06-28 17:11:01 UTC
Performance is as expected with the correct tuning in the XML. Closing the bug.