Bug 2196789 - mlx5_core/ice driver: ovs dpdk pvp cross numa case gets lower performance than ovs dpdk same numa case
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch3.1
Version: FDP 23.C
Hardware: Unspecified
OS: Unspecified
Target Milestone: ---
Assignee: Kevin Traynor
QA Contact: ovs-qe
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-05-10 09:00 UTC by liting
Modified: 2023-07-06 03:34 UTC
CC: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Target Upstream Version:
Embargoed:




Links:
Red Hat Issue Tracker FD-2881 (last updated 2023-05-18 01:18:49 UTC)

Description liting 2023-05-10 09:00:28 UTC
Description of problem:
With the mlx5_core and ice drivers, the ovs dpdk pvp cross numa cases get noticeably lower throughput than the corresponding same numa cases.


Version-Release number of selected component (if applicable):
kernel 5.14.0-284.11.1.el9_2.x86_64
openvswitch3.1-3.1.0-14.el9fdp.x86_64

How reproducible:


Steps to Reproduce:
Run the ovs dpdk pvp cross numa and same numa cases with the 1q 2pmd, 1q 4pmd, 2q 4pmd, and 4q 8pmd configurations. The 4q 8pmd case is shown below as an example.
Kernel command line (/proc/cmdline):
BOOT_IMAGE=(hd0,msdos1)/vmlinuz-5.14.0-284.11.1.el9_2.x86_64 root=/dev/mapper/rhel_dell--per730--56-root ro crashkernel=1G-4G:192M,4G-64G:256M,64G-:512M resume=/dev/mapper/rhel_dell--per730--56-swap rd.lvm.lv=rhel_dell-per730-56/root rd.lvm.lv=rhel_dell-per730-56/swap console=ttyS0,115200n81 skew_tick=1 nohz=on nohz_full=2,26,4,28,6,30,8,32,10,34,12,36,14,38,16,40,18,42,20,44,22,46 rcu_nocbs=2,26,4,28,6,30,8,32,10,34,12,36,14,38,16,40,18,42,20,44,22,46 tuned.non_isolcpus=0000aaaa,abaaaaab intel_pstate=disable nosoftlockup default_hugepagesz=1G hugepagesz=1G hugepages=24 isolcpus=2,26,4,28,6,30,8,32,10,34,12,36,14,38,16,40,18,42,20,44,22,46 intel_iommu=on iommu=pt intel_idle.max_cstate=0 processor.max_cstate=0 intel_pstate=disable pci=realloc

1. Build the ovs dpdk pvp topology (4q 8pmd), with the PMDs using CPUs on numa0
b80b000f-20dc-4917-bce1-73bd607afe39
    Bridge ovsbr0
        datapath_type: netdev
        Port dpdk0
            Interface dpdk0
                type: dpdk
                options: {dpdk-devargs="0000:07:00.0", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
        Port vhost0
            Interface vhost0
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost0"}
        Port dpdk1
            Interface dpdk1
                type: dpdk
                options: {dpdk-devargs="0000:07:00.1", n_rxq="4", n_rxq_desc="1024", n_txq_desc="1024"}
        Port ovsbr0
            Interface ovsbr0
                type: internal
        Port vhost1
            Interface vhost1
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser/vhost1"}
    ovs_version: "3.1.1"
ovs config:
{dpdk-init="true", dpdk-lcore-mask="0x1", dpdk-socket-mem="4096", pmd-cpu-mask="550000550000", userspace-tso-enable="false", vhost-iommu-support="true"}
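As a cross-check, the pmd-cpu-mask above can be decoded back into a CPU list. This is a minimal sketch (the helper names are hypothetical, not part of OVS) showing that 0x550000550000 selects the eight numa0 CPUs that also appear in the isolcpus set on the kernel command line:

```python
# Hypothetical helpers: convert between a CPU id list and the hex mask
# format used by OVS's other_config:pmd-cpu-mask.

def cpus_to_mask(cpus):
    """Return the hex mask string for a list of CPU ids."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return f"{mask:x}"

def mask_to_cpus(mask_hex):
    """Return the sorted CPU ids encoded in a hex mask string."""
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if mask & (1 << bit)]

# The mask from the ovs config above decodes to 8 PMD CPUs:
print(mask_to_cpus("550000550000"))  # [16, 18, 20, 22, 40, 42, 44, 46]
print(cpus_to_mask([16, 18, 20, 22, 40, 42, 44, 46]))  # 550000550000
```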

For guest cpu on same numa:
  <cputune>
    <vcpupin vcpu='0' cpuset='2'/>
    <vcpupin vcpu='1' cpuset='26'/>
    <vcpupin vcpu='2' cpuset='4'/>
    <vcpupin vcpu='3' cpuset='28'/>
    <vcpupin vcpu='4' cpuset='6'/>
    <vcpupin vcpu='5' cpuset='30'/>
    <vcpupin vcpu='6' cpuset='8'/>
    <vcpupin vcpu='7' cpuset='32'/>
    <vcpupin vcpu='8' cpuset='10'/>
    <emulatorpin cpuset='0,24'/>
  </cputune>

For guest cpu on cross numa:
  <vcpu placement='static'>9</vcpu>
  <cputune>
    <vcpupin vcpu='0' cpuset='3'/>
    <vcpupin vcpu='1' cpuset='5'/>
    <vcpupin vcpu='2' cpuset='7'/>
    <vcpupin vcpu='3' cpuset='9'/>
    <vcpupin vcpu='4' cpuset='11'/>
    <vcpupin vcpu='5' cpuset='13'/>
    <vcpupin vcpu='6' cpuset='15'/>
    <vcpupin vcpu='7' cpuset='17'/>
    <vcpupin vcpu='8' cpuset='19'/>
    <emulatorpin cpuset='0,24'/>
  </cputune>
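On this host the even-numbered CPUs (including their 24+ siblings) appear to sit on NUMA node 0 and the odd-numbered CPUs on node 1, which is what makes the second pinning "cross numa" relative to the numa0 PMDs. A small sketch of that assumption (the node layout is inferred from the isolcpus list and the pinnings above, not queried from the host):

```python
# Assumed topology for this dell-per730 host: 2 sockets, 48 logical CPUs,
# even CPU ids on NUMA node 0, odd CPU ids on NUMA node 1 (inferred).

def numa_node(cpu):
    return cpu % 2

same_numa_pinning = [2, 26, 4, 28, 6, 30, 8, 32, 10]   # first <cputune> above
cross_numa_pinning = [3, 5, 7, 9, 11, 13, 15, 17, 19]  # second <cputune> above
pmd_cpus = [16, 18, 20, 22, 40, 42, 44, 46]            # from pmd-cpu-mask

# PMD and same-numa guest CPUs all land on node 0; the cross-numa guest on node 1.
print({numa_node(c) for c in pmd_cpus})            # {0}
print({numa_node(c) for c in same_numa_pinning})   # {0}
print({numa_node(c) for c in cross_numa_pinning})  # {1}
```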

2. Start testpmd inside the guest
dpdk-testpmd -l 0-8 -n 1 --socket-mem 1024 -- -i --forward-mode=io --burst=32 --rxd=8192 --txd=8192 --max-pkt-len=9600 --mbuf-size=9728 --nb-cores=8 --rxq=4 --txq=4 --mbcache=512  --auto-start

3. Send traffic with the T-Rex sender
./binary-search.py --traffic-generator=trex-txrx --frame-size=64 --num-flows=1024 --max-loss-pct=0 --search-runtime=10 --validation-runtime=60 --rate-tolerance=10 --runtime-tolerance=10 --rate=25 --rate-unit=% --duplicate-packet-failure=retry-to-fail --negative-packet-loss=retry-to-fail --warmup-trial --warmup-trial-runtime=10 --rate=25 --rate-unit=% --one-shot=0 --use-src-ip-flows=1 --use-dst-ip-flows=1 --use-src-mac-flows=1 --use-dst-mac-flows=1 --send-teaching-measurement --send-teaching-warmup --teaching-warmup-packet-type=generic --teaching-warmup-packet-rate=10000 --use-src-ip-flows=1 --use-dst-ip-flows=1 --use-src-mac-flows=1 --use-dst-mac-flows=0 --use-device-stats

Actual results:
25g cx6 dx 4q 64byte cross numa and same numa:
cross numa viommu case: 12.5mpps
cross numa noviommu case: 12.8mpps
same numa viommu case: 14.8mpps
same numa noviommu case: 14.2mpps
https://beaker.engineering.redhat.com/jobs/7831950
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78319/7831950/13873005/159923234/mlx5_25.html

cx6 dx 4q 64byte cross numa and same numa :
cross numa viommu case: 11.5mpps
cross numa noviommu case: 10.5mpps
same numa viommu case: 13mpps
same numa noviommu case: 16.2mpps
https://beaker.engineering.redhat.com/jobs/7829103
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78291/7829103/13868832/159892570/mlx5_25.html

cx6 dx 4q viommu case cross numa and same numa case:
cross numa: 12.5mpps
same numa: 14.2mpps
https://beaker.engineering.redhat.com/jobs/7828758
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78287/7828758/13868369/159889510/mlx5_25.html

bf2 4q viommu cross numa and same numa case:
cross numa: 13mpps
same numa: 16mpps
https://beaker.engineering.redhat.com/jobs/7828713
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78287/7828713/13868295/159888908/mlx5_25.html

bf2 64byte noviommu case, cross numa compared with same numa:
cross numa:
1q 2pmd: 3.4mpps
1q 4pmd: 7.5mpps
2q 4pmd: 8.9mpps
4q 8pmd: 15.1mpps
https://beaker.engineering.redhat.com/jobs/7827978
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78279/7827978/13867467/159885708/mlx5_25.html

same numa:
1q 2pmd: 3.6mpps
1q 4pmd: 7.5mpps
2q 4pmd: 9.9mpps
4q 8pmd: 16.6mpps
https://beaker.engineering.redhat.com/jobs/7827979
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78279/7827979/13867469/159885712/mlx5_25.html
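The gap can be quantified directly from the 4q 8pmd numbers above; a quick calculation (plain arithmetic on the reported mpps values, nothing measured beyond them) shows the cross-numa drop is well beyond a slight difference in most configurations:

```python
# Percent throughput drop of cross numa vs same numa, from the results above.

def drop_pct(same_numa_mpps, cross_numa_mpps):
    return round(100 * (same_numa_mpps - cross_numa_mpps) / same_numa_mpps, 1)

print(drop_pct(14.8, 12.5))  # cx6 dx 4q viommu:       15.5 %
print(drop_pct(16.2, 10.5))  # cx6 dx 4q noviommu:     35.2 %
print(drop_pct(16.0, 13.0))  # bf2 4q viommu:          18.8 %
print(drop_pct(16.6, 15.1))  # bf2 4q 8pmd noviommu:    9.0 %
```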


Expected results:
The cross numa 4q 8pmd cases should perform only slightly lower than the same numa cases, but they actually perform significantly lower.

Additional info:

Comment 1 liting 2023-05-18 01:43:22 UTC
This issue also exists on the ice driver. The 4q 8pmd same numa cases reached 14.2mpps and 17mpps, while the cross numa cases reached only 10.8mpps and 11.8mpps.
same numa case:
64byte 1q 2pmd noviommu vlan case: 3.1mpps
64byte 1q 4pmd noviommu vlan case: 6.8mpps
64byte 2q 4pmd noviommu vlan case: 7.8mpps
64byte 4q 8pmd noviommu vlan case: 14.2mpps
64byte 1q 2pmd viommu novlan case: 4mpps
64byte 1q 4pmd viommu novlan case: 8.2mpps
64byte 2q 4pmd viommu novlan case: 7.8mpps
64byte 4q 8pmd viommu novlan case: 17mpps
https://beaker.engineering.redhat.com/jobs/7851028
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78510/7851028/13903404/160162049/ice_25.html

cross numa case:
64byte 1q 2pmd noviommu vlan case: 3.7mpps
64byte 1q 4pmd noviommu vlan case: 6.5mpps
64byte 2q 4pmd noviommu vlan case: 6.9mpps
64byte 4q 8pmd noviommu vlan case: 10.8mpps
64byte 1q 2pmd viommu novlan case: 4.2mpps
64byte 1q 4pmd viommu novlan case: 7.8mpps
64byte 2q 4pmd viommu novlan case: 7.9mpps
64byte 4q 8pmd viommu novlan case: 11.8mpps
https://beaker.engineering.redhat.com/jobs/7856225
https://beaker-archive.hosts.prod.psi.bos.redhat.com/beaker-logs/2023/05/78562/7856225/13910888/160210038/ice_25.html

