Bug 2188899 - [nfv virt][pvp][cross numa] The vm's vhostuser interface throughput drops significantly after adding emulatorpin cfg [NEEDINFO]
Summary: [nfv virt][pvp][cross numa] The vm's vhostuser interface throughput drops significantly after adding emulatorpin cfg
Keywords:
Status: NEW
Alias: None
Product: Red Hat Enterprise Linux 9
Classification: Red Hat
Component: qemu-kvm
Version: 9.3
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Virtualization Maintenance
QA Contact: Yanghang Liu
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-23 08:02 UTC by Yanghang Liu
Modified: 2023-08-15 07:09 UTC (History)
10 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: ---
Target Upstream Version:
Embargoed:
yanghliu: needinfo? (lvivier)


Attachments


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHELPLAN-155452 0 None None None 2023-04-23 08:04:00 UTC

Description Yanghang Liu 2023-04-23 08:02:27 UTC
Description of problem:
 The VM's vhostuser interface throughput drops significantly after adding the emulatorpin configuration.

Version-Release number of selected component (if applicable):
5.14.0-301.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
libvirt-9.2.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. set up the host kernel options (CPU isolation, hugepages, IOMMU) and the cpu-partitioning tuned profile

# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel` 
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11"  >> /etc/tuned/cpu-partitioning-variables.conf  
# tuned-adm profile cpu-partitioning
# reboot
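
(A quick sanity check after the reboot, not part of the original steps: confirm the kernel arguments and the 1G default hugepage size took effect; both are standard procfs locations.)

# cat /proc/cmdline                  <-- should contain iommu=pt intel_iommu=on default_hugepagesz=1G
# grep Hugepagesize /proc/meminfo    <-- should report 1048576 kB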

2. start a dpdk-testpmd on the host

# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
# dpdk-devbind.py --bind=vfio-pci 0000:60:00.0
# dpdk-testpmd -l 15,31,29,27,25,23,21,19,17 --socket-mem 1024,1024 -n 4  --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=2,client=1,iommu-support=1' --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=2,client=1,iommu-support=1'  -b 0000:3b:00.0 -b 0000:3b:00.1  -d /usr/lib64/librte_net_vhost.so  -- --portmask=f -i --rxd=512 --txd=512 --rxq=2 --txq=2 --nb-cores=8 --forward-mode=io
   testpmd> set portlist 0,2,1,3
   testpmd> start
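
(An optional check, not part of the original steps: confirm the vfio-pci binding and the per-node 1G hugepage pools; both commands below are standard.)

# dpdk-devbind.py --status
# cat /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/nr_hugepages

(The three NICs bound above should appear in the DPDK-compatible driver section, and each node should report the 20 hugepages reserved above.)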

3. start a domain with vhost-user interfaces and <emulatorpin cpuset='25,27,29,31'/> 

<cputune>
  <vcpupin vcpu='0' cpuset='30'/>
  <vcpupin vcpu='1' cpuset='28'/>
  <vcpupin vcpu='2' cpuset='26'/>
  <vcpupin vcpu='3' cpuset='24'/>
  <vcpupin vcpu='4' cpuset='22'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <emulatorpin cpuset='25,27,29,31'/>
</cputune>
...
<interface type='vhostuser'>
  <mac address='88:66:da:5f:dd:12'/>
  <source type='unix' path='/tmp/vhost-user1' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
</interface>
<interface type='vhostuser'>
  <mac address='88:66:da:5f:dd:13'/>
  <source type='unix' path='/tmp/vhost-user2' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
</interface>

Note : the full domain xml is in the test log
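
(An optional check, not part of the original steps: once the domain is running, the pinning that libvirt actually applied can be read back with standard virsh subcommands; <domain> below is a placeholder for the domain name.)

# virsh vcpupin <domain>
# virsh emulatorpin <domain>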

4. set up the kernel options and the cpu-partitioning tuned profile in the domain
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel` 	
# echo "isolated_cores=1,2,3,4,5"  >> /etc/tuned/cpu-partitioning-variables.conf 
# tuned-adm profile cpu-partitioning
# reboot

5. start a dpdk-testpmd in the domain 
# echo 2 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:07:00.0
# dpdk-testpmd -l 1,2,3,4,5 -n 4  -d /usr/lib64/librte_net_virtio.so  -- --nb-cores=4 -i --disable-rss --rxd=512 --txd=512 --rxq=2 --txq=2 
  testpmd> start

6. run the MoonGen test
# ./build/MoonGen examples/opnfv-vsperf.lua > /tmp/throughput.log
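
(An optional check, not part of the original steps: while MoonGen is generating traffic, the guest-side forwarding counters can be watched from the testpmd prompt started in step 5; these are standard testpmd console commands.)

  testpmd> show port stats all
  testpmd> show fwd stats all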

7. check the Throughput

****************************************************
Packets_loss Frame_Size(Byte) Run_No     Throughput(Mpps)
           0               64      0      3.034078
****************************************************

8. repeat steps 1-7 above, but without <emulatorpin cpuset='25,27,29,31'/>

****************************************************
Packets_loss Frame_Size(Byte)   Run_No   Throughput(Mpps)
           0               64      0       21.127439
****************************************************

Actual results:
 The VM's vhostuser interface throughput drops by around 85% (from ~21.1 Mpps to ~3.0 Mpps) after adding the emulatorpin configuration

Expected results:
  No significant throughput drops

Additional info:
(1) The detailed test log with  emulatorpin cfg
http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_with_emulatorpin

(2) The detailed test log without  emulatorpin cfg
http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_without_emulatorpin

(3) related bug about emulatorpin xml
Bug 2154750 - [numatune][cputune] qemu-kvm: Setting CPU affinity failed: Invalid argument
Bug 2185039 - [numatune][cputune] qemu-kvm: Setting CPU affinity failed: Invalid argument [rhel-9.2.0.z]

Comment 1 Laurent Vivier 2023-04-26 06:59:36 UTC
Michal,

as you worked on the related bugs, is the configuration used in this BZ valid?
Is the performance drop expected?

Comment 2 Michal Privoznik 2023-04-26 14:44:44 UTC
I don't think it is expected. The linked bug(s) are about ThreadContext, i.e. how QEMU allocates the memory. The emulator thread isn't affected.

Yanghang, can you please share the QEMU cmd line in both cases? Also, what is the CPU topology? I'm wondering whether the CPU IDs in <emulatorpin/> are just sibling hyper-threads of those in <vcpupin/>, in which case the emulator thread can't really run while a vCPU is running. And maybe without <emulatorpin/> the kernel is free to schedule the emulator thread onto a different core.
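
(One way to check this from the host: lscpu -e prints the CPU-to-core/socket/NUMA mapping, and the sysfs topology files list each CPU's SMT siblings; cpu25 below is just an example taken from the <emulatorpin/> set.)

# lscpu -e=CPU,CORE,SOCKET,NODE
# cat /sys/devices/system/cpu/cpu25/topology/thread_siblings_list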

Comment 3 Yanghang Liu 2023-05-09 08:10:05 UTC
(In reply to Michal Privoznik from comment #2)
> I don't think it is expected. The linked bug(s) are about ThreadContext,
> i.e. how QEMU allocates the memory. The emulator thread isn't affected.
> 
> Yanghang, can you please share the QEMU cmd line in both cases? Also, what
> is the CPU topology? I'm wondering whether the CPU IDs in <emulatorpin/>
> are just sibling hyper-threads of those in <vcpupin/>, in which case the
> emulator thread can't really run while a vCPU is running. And maybe without
> <emulatorpin/> the kernel is free to schedule the emulator thread onto a
> different core.

Hi Michal,

Thanks for the confirmation. I have listed the related info in Comment 0; please let me know if I need to provide more info.

We can get the detailed test log as well as the full domain xml from: 

  (1) The detailed test log with  emulatorpin cfg
  http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_with_emulatorpin

  (2) The detailed test log without  emulatorpin cfg
  http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_without_emulatorpin

And the domain's CPU pinning configuration is:

   <cputune>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='24'/>
    <vcpupin vcpu='4' cpuset='22'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <emulatorpin cpuset='25,27,29,31'/>   <--- I run my tests with/without this cfg.
   </cputune>

The host cores that dpdk-testpmd is running on are 15,31,29,27,25,23,21,19,17

The related cmd line is : 
# dpdk-testpmd -l 15,31,29,27,25,23,21,19,17 --socket-mem 1024,1024 -n 4  --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=2,client=1,iommu-support=1' --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=2,client=1,iommu-support=1'  -b 0000:3b:00.0 -b 0000:3b:00.1  -d /usr/lib64/librte_net_vhost.so  -- --portmask=f -i --rxd=512 --txd=512 --rxq=2 --txq=2 --nb-cores=8 --forward-mode=io
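
(Since the test is explicitly cross-NUMA, the NUMA placement of the cores and of the NICs can be confirmed with standard sysfs/lscpu queries; the PCI address below is the first NIC bound to vfio-pci in step 2.)

# cat /sys/bus/pci/devices/0000:5e:00.0/numa_node
# lscpu | grep -i numa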

Comment 4 Yanghang Liu 2023-05-30 05:37:05 UTC
This issue can still be reproduced in:
  qemu-kvm-8.0.0-4.el9.x86_64
  libvirt-9.3.0-2.el9.x86_64
  5.14.0-319.el9.x86_64
  seabios-bin-1.16.1-1.el9.noarch




Check point:

  Test *with* <emulatorpin cpuset='25,27,29,31'/> cfg:
      
     Throughput(Mpps) : 3.132936

  Test *without* <emulatorpin cpuset='25,27,29,31'/> cfg:

     Throughput(Mpps) : 21.127461

Comment 5 Yanghang Liu 2023-08-10 02:33:46 UTC
This issue can still be reproduced in:

      host:
           qemu-kvm-8.0.0-9.el9.x86_64
           tuned-2.20.0-1.el9.noarch
           libvirt-9.5.0-5.el9.x86_64
           openvswitch3.1-3.1.0-42.el9fdp.x86_64
           dpdk-22.11-3.el9_2.x86_64
           edk2-ovmf-20230524-2.el9.noarch
      guest:
           5.14.0-346.el9.x86_64



Test log: http://10.73.72.41/log/2023-08-07_20:17/nfv_pvp_1q_cross_numa



Check point:


[1] The statistics of dpdk-testpmd in the VM: 

+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++

RX-packets: 12822137264    RX-dropped: 494648264     RX-total: 13316785528

TX-packets: 12548106232    TX-dropped: 274031032     TX-total: 12822137264

[2] The VM Throughput(Mpps) : 2.240211
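
(For reference, the drop rates implied by the counters above can be computed with plain shell arithmetic via bc: RX-dropped/RX-total comes to about 3.7% and TX-dropped/TX-total to about 2.1%.)

# echo "scale=4; 100*494648264/13316785528" | bc
# echo "scale=4; 100*274031032/12822137264" | bc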

Comment 6 Yanghang Liu 2023-08-15 07:09:46 UTC
Hi Laurent,

May I ask if you could help CC some developers to look at this bug?

From a QE point of view, we expect this BZ to be handled with priority because it has customer impact.

