
Bug 2188899

Summary: [nfv virt][pvp][cross numa] The vm's vhostuser interface throughput drops significantly after adding emulatorpin cfg
Product: Red Hat Enterprise Linux 9
Component: qemu-kvm
Sub component: Networking
Version: 9.3
Hardware: x86_64
OS: Linux
Reporter: Yanghang Liu <yanghliu>
Assignee: Virtualization Maintenance <virt-maint>
QA Contact: Yanghang Liu <yanghliu>
CC: chayang, coli, jinzhao, juzhang, lvivier, maxime.coquelin, mprivozn, virt-maint, yama, yanghliu
Status: CLOSED NOTABUG
Severity: medium
Priority: medium
Keywords: Triaged
Target Milestone: rc
Flags: pm-rhel: mirror+
Last Closed: 2023-09-14 06:41:31 UTC

Description Yanghang Liu 2023-04-23 08:02:27 UTC
Description of problem:
 The VM's vhost-user interface throughput drops significantly after adding an <emulatorpin> configuration.

Version-Release number of selected component (if applicable):
5.14.0-301.el9.x86_64
qemu-kvm-7.2.0-14.el9_2.x86_64
libvirt-9.2.0-1.el9.x86_64

How reproducible:
100%

Steps to Reproduce:
1. set up the host kernel options (CPU isolation, huge pages, IOMMU)

# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel` 
# echo "isolated_cores=2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,31,29,27,25,23,21,19,17,15,13,11"  >> /etc/tuned/cpu-partitioning-variables.conf  
# tuned-adm profile cpu-partitioning
# reboot
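After the reboot, the tuning can be sanity-checked. A minimal sketch, using standard Linux procfs/sysfs paths; the expected values are simply the ones configured in the commands above:

```shell
# Verify that the kernel command line, CPU isolation, and 1G hugepage
# settings from step 1 actually took effect after the reboot.
cat /proc/cmdline                         # expect iommu=pt intel_iommu=on default_hugepagesz=1G
cat /sys/devices/system/cpu/isolated      # expect the isolated_cores list set above
grep -i '^hugepagesize' /proc/meminfo     # expect 1048576 kB once 1G pages are the default
```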

2. start a dpdk-testpmd on the host

# echo 20 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# echo 20 > /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:5e:00.1
# dpdk-devbind.py --bind=vfio-pci 0000:60:00.0
# dpdk-testpmd -l 15,31,29,27,25,23,21,19,17 --socket-mem 1024,1024 -n 4  --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=2,client=1,iommu-support=1' --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=2,client=1,iommu-support=1'  -b 0000:3b:00.0 -b 0000:3b:00.1  -d /usr/lib64/librte_net_vhost.so  -- --portmask=f -i --rxd=512 --txd=512 --rxq=2 --txq=2 --nb-cores=8 --forward-mode=io
   testpmd> set portlist 0,2,1,3
   testpmd> start

3. start a domain with vhost-user interfaces and <emulatorpin cpuset='25,27,29,31'/> 

<cputune>
  <vcpupin vcpu='0' cpuset='30'/>
  <vcpupin vcpu='1' cpuset='28'/>
  <vcpupin vcpu='2' cpuset='26'/>
  <vcpupin vcpu='3' cpuset='24'/>
  <vcpupin vcpu='4' cpuset='22'/>
  <vcpupin vcpu='5' cpuset='20'/>
  <emulatorpin cpuset='25,27,29,31'/>
</cputune>
...
<interface type='vhostuser'>
  <mac address='88:66:da:5f:dd:12'/>
  <source type='unix' path='/tmp/vhost-user1' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
</interface>
<interface type='vhostuser'>
  <mac address='88:66:da:5f:dd:13'/>
  <source type='unix' path='/tmp/vhost-user2' mode='server'/>
  <model type='virtio'/>
  <driver name='vhost' queues='2' rx_queue_size='1024' iommu='on' ats='on'/>
</interface>

Note: the full domain XML is in the test log.

4. set up the kernel options in the domain
# grubby --args="iommu=pt intel_iommu=on default_hugepagesz=1G" --update-kernel=`grubby --default-kernel` 	
# echo "isolated_cores=1,2,3,4,5"  >> /etc/tuned/cpu-partitioning-variables.conf 
# tuned-adm profile cpu-partitioning
# reboot

5. start a dpdk-testpmd in the domain 
# echo 2 > /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind.py --bind=vfio-pci 0000:06:00.0
# dpdk-devbind.py --bind=vfio-pci 0000:07:00.0
# dpdk-testpmd -l 1,2,3,4,5 -n 4  -d /usr/lib64/librte_net_virtio.so  -- --nb-cores=4 -i --disable-rss --rxd=512 --txd=512 --rxq=2 --txq=2 
  testpmd> start

6. do Moongen tests
# ./build/MoonGen examples/opnfv-vsperf.lua > /tmp/throughput.log

7. check the Throughput

****************************************************
Packets_loss Frame_Size(Byte) Run_No     Throughput(Mpps)
           0               64      0      3.034078
****************************************************

8. repeat steps 1-7 above, but without <emulatorpin cpuset='25,27,29,31'/> 

****************************************************
Packets_loss Frame_Size(Byte)   Run_No   Throughput(Mpps)
           0               64      0       21.127439
****************************************************

Actual results:
 The VM's vhost-user interface throughput drops by around 85% (from 21.13 to 3.03 Mpps) after adding the <emulatorpin> configuration.

Expected results:
  No significant throughput drop.

Additional info:
(1) The detailed test log with  emulatorpin cfg
http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_with_emulatorpin

(2) The detailed test log without  emulatorpin cfg
http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_without_emulatorpin

(3) related bug about emulatorpin xml
Bug 2154750 - [numatune][cputune] qemu-kvm: Setting CPU affinity failed: Invalid argument
Bug 2185039 - [numatune][cputune] qemu-kvm: Setting CPU affinity failed: Invalid argument [rhel-9.2.0.z]

Comment 1 Laurent Vivier 2023-04-26 06:59:36 UTC
Michal,

as you worked on a related bug: is the configuration used in this BZ valid?
Is the performance drop expected?

Comment 2 Michal Privoznik 2023-04-26 14:44:44 UTC
I don't think it is expected. The linked bug(s) are about ThreadContext, i.e. how QEMU allocates the memory. The emulator thread isn't affected.

Yanghang, can you please share the QEMU command line in both cases? Also, what is the CPU topology? I'm wondering whether the CPU ids in <emulatorpin/> are just the sibling hyperthreads of those in <vcpupin/>, in which case the emulator thread can't really run while a vCPU is running. And maybe without <emulatorpin/> the kernel is free to schedule the emulator thread onto a different core.
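One quick way to check this hypothesis is to compare the sysfs sibling lists of the pinned CPUs: two logical CPUs that share a physical core report the same thread_siblings_list. A sketch, using the CPU ids from this BZ's <vcpupin/> and <emulatorpin/>:

```shell
# Check whether two logical CPUs are hyperthread siblings by comparing
# their thread_siblings_list entries in sysfs (standard Linux layout).
are_siblings() {
    a="/sys/devices/system/cpu/cpu$1/topology/thread_siblings_list"
    b="/sys/devices/system/cpu/cpu$2/topology/thread_siblings_list"
    [ -r "$a" ] && [ -r "$b" ] && [ "$(cat "$a")" = "$(cat "$b")" ]
}

# Compare each <emulatorpin/> CPU against each <vcpupin/> CPU from this BZ
for emu_cpu in 25 27 29 31; do
    for vcpu_cpu in 30 28 26 24 22 20; do
        if are_siblings "$emu_cpu" "$vcpu_cpu"; then
            echo "cpu$emu_cpu shares a core with cpu$vcpu_cpu"
        fi
    done
done
```

If the loop prints nothing, the emulator CPUs are not siblings of the vCPU pins and the overlap theory can be ruled out.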

Comment 3 Yanghang Liu 2023-05-09 08:10:05 UTC
(In reply to Michal Privoznik from comment #2)
> I don't think it is expected. The linked bug(s) are about ThreadContext,
> i.e. how QEMU allocates the memory. The emulator thread isn't affected.
> 
> Yanghang, can you please share the QEMU cmd line in both cases? Also, what
> is the CPU topology? I'm wondering whether those CPU ids from <emulatorpin/>
> aren't just another CPU thread to those in <vcpupin/>, in which case the
> emulator thread can't run really if a vCPU is running. And maybe without
> <emulatorpin/> kernel is free to schedule the emulator thread onto a
> different core.

Hi Michal,

Thanks for the confirmation.  I have listed the related info in Comment 0, please let me know if I need to provide more info.

We can get the detailed test log as well as the full domain xml from: 

  (1) The detailed test log with  emulatorpin cfg
  http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_with_emulatorpin

  (2) The detailed test log without  emulatorpin cfg
  http://10.73.72.41/log/2023-04-22_23:53/nfv_pvp_2q_cross_numa_without_emulatorpin

And the domain's CPU topology is like:

   <cputune>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='24'/>
    <vcpupin vcpu='4' cpuset='22'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <emulatorpin cpuset='25,27,29,31'/>   <--- I run my tests with/without this cfg.
   </cputune>

The list of host cores which dpdk-testpmd is running on is 15,31,29,27,25,23,21,19,17

The related cmd line is : 
# dpdk-testpmd -l 15,31,29,27,25,23,21,19,17 --socket-mem 1024,1024 -n 4  --vdev 'net_vhost0,iface=/tmp/vhost-user1,queues=2,client=1,iommu-support=1' --vdev 'net_vhost1,iface=/tmp/vhost-user2,queues=2,client=1,iommu-support=1'  -b 0000:3b:00.0 -b 0000:3b:00.1  -d /usr/lib64/librte_net_vhost.so  -- --portmask=f -i --rxd=512 --txd=512 --rxq=2 --txq=2 --nb-cores=8 --forward-mode=io

Comment 4 Yanghang Liu 2023-05-30 05:37:05 UTC
This issue can still be reproduced in:
  qemu-kvm-8.0.0-4.el9.x86_64
  libvirt-9.3.0-2.el9.x86_64
  5.14.0-319.el9.x86_64
  seabios-bin-1.16.1-1.el9.noarch




Check point:

  Test *with* <emulatorpin cpuset='25,27,29,31'/> cfg:
      
     Throughput(Mpps) : 3.132936

  Test *without* <emulatorpin cpuset='25,27,29,31'/> cfg:

     Throughput(Mpps) : 21.127461

Comment 5 Yanghang Liu 2023-08-10 02:33:46 UTC
This issue can still be reproduced in:

      host:
           qemu-kvm-8.0.0-9.el9.x86_64
           tuned-2.20.0-1.el9.noarch
           libvirt-9.5.0-5.el9.x86_64
           openvswitch3.1-3.1.0-42.el9fdp.x86_64
           dpdk-22.11-3.el9_2.x86_64
           edk2-ovmf-20230524-2.el9.noarch
      guest:
           5.14.0-346.el9.x86_64



Test log: http://10.73.72.41/log/2023-08-07_20:17/nfv_pvp_1q_cross_numa



Check point:


[1] The statistics of dpdk-testpmd in the VM: 

+++++++++++++++ Accumulated forward statistics for all ports+++++++++++++++

RX-packets: 12822137264    RX-dropped: 494648264     RX-total: 13316785528

TX-packets: 12548106232    TX-dropped: 274031032     TX-total: 12822137264

[2] The VM Throughput(Mpps) : 2.240211

Comment 6 Yanghang Liu 2023-08-15 07:09:46 UTC
Hi Laurent,

May I ask you to help CC some developers to look at this bug?

From a QE point of view, we expect this BZ to be handled with priority, because it has customer impact.

Comment 7 Laurent Vivier 2023-08-24 08:03:45 UTC
Yanghang,

what is the purpose of using emulatorpin if it drops the performance?
Why does the customer want to use it?

Could you provide the QEMU command line to reproduce the problem without libvirt?

Comment 8 Laurent Vivier 2023-08-30 09:35:19 UTC
Could you also provide the result of numactl -H on the host?

Comment 9 Yanghang Liu 2023-08-31 02:42:21 UTC
(In reply to Laurent Vivier from comment #8)
Hi Laurent,
 
> what is the purpose of using emulatorpin if it drops the performance?
> Why the customer wants to use it?

As far as I know, <emulatorpin> is a CPU tuning element and it can pin qemu-kvm emulator to physical CPUs.

Generally <emulatorpin> should optimize our VM's performance.

And <emulatorpin> is also a common configuration used by OSP NFV QE.
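For reference, the emulator pinning can also be inspected and changed at runtime with virsh, without editing the XML. A sketch only: the domain name "rhel9.3" is the one from this BZ's setup, and the commands are skipped where no libvirt client is installed:

```shell
# Inspect and change the emulator thread pinning of a running domain
# (assumes the libvirt client and a running guest named "rhel9.3").
if command -v virsh >/dev/null 2>&1; then
    virsh emulatorpin rhel9.3                      # show the current emulator pinning
    virsh emulatorpin rhel9.3 12,14,16,18 --live   # repin the emulator threads live
fi
true  # the commands above are illustrative; ignore their exit status here
```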

> Could you provide the QEMU command line to reproduce the problem without libvirt?

I have never tried to set up CPU pinning and emulator pinning at the QEMU layer before, as we always test NFV virt via libvirt.

Currently, I am wondering whether this issue is due to the VM's CPUs being too busy, and I will try updating my test configuration (e.g. increasing the VM's vCPU count, pinning the emulator to the housekeeping CPUs...).


The related qemu-kvm command line generated by libvirt when I reproduced this issue:
/usr/libexec/qemu-kvm \
-name guest=rhel9.3,debug-threads=on \
-S \
-object '{"qom-type":"secret","id":"masterKey0","format":"raw","file":"/var/lib/libvirt/qemu/domain-1-rhel9.3/master-key.aes"}' \
-machine pc-q35-rhel9.2.0,usb=off,vmport=off,kernel_irqchip=split,dump-guest-core=off \
-accel kvm \
-cpu Skylake-Server-IBRS,ss=on,vmx=on,pdcm=on,hypervisor=on,tsc-adjust=on,clflushopt=on,umip=on,pku=on,md-clear=on,stibp=on,arch-capabilities=on,ssbd=on,xsaves=on,ibpb=on,ibrs=on,amd-stibp=on,amd-ssbd=on,rsba=on,skip-l1dfl-vmentry=on,pschange-mc-no=on,tsc-deadline=on,pmu=off \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=3,dies=1,cores=1,threads=2 \
-object '{"qom-type":"memory-backend-file","id":"ram-node0","mem-path":"/dev/hugepages/libvirt/qemu/1-rhel9.3","share":true,"prealloc":true,"size":8589934592,"host-nodes":[0],"policy":"bind"}' \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-uuid 3a17e48a-e155-11ed-bfa4-20040fec000c \
-display none \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=22,server=on,wait=off \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-boot strict=on \
-device '{"driver":"intel-iommu","id":"iommu0","intremap":"on","caching-mode":true,"device-iotlb":true}' \
-device '{"driver":"pcie-root-port","port":16,"chassis":1,"id":"pci.1","bus":"pcie.0","multifunction":true,"addr":"0x2"}' \
-device '{"driver":"pcie-root-port","port":17,"chassis":2,"id":"pci.2","bus":"pcie.0","addr":"0x2.0x1"}' \
-device '{"driver":"pcie-root-port","port":18,"chassis":3,"id":"pci.3","bus":"pcie.0","addr":"0x2.0x2"}' \
-device '{"driver":"pcie-root-port","port":19,"chassis":4,"id":"pci.4","bus":"pcie.0","addr":"0x2.0x3"}' \
-device '{"driver":"pcie-root-port","port":20,"chassis":5,"id":"pci.5","bus":"pcie.0","addr":"0x2.0x4"}' \
-device '{"driver":"pcie-root-port","port":21,"chassis":6,"id":"pci.6","bus":"pcie.0","addr":"0x2.0x5"}' \
-device '{"driver":"pcie-root-port","port":22,"chassis":7,"id":"pci.7","bus":"pcie.0","addr":"0x2.0x6"}' \
-blockdev '{"driver":"file","filename":"/home/images_nfv-virt-rt-kvm/rhel9.3.qcow2","aio":"threads","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device '{"driver":"virtio-blk-pci","iommu_platform":true,"ats":true,"bus":"pci.2","addr":"0x0","drive":"libvirt-1-format","id":"virtio-disk0","bootindex":1,"write-cache":"on"}' \
-netdev '{"type":"tap","fd":"23","vhost":true,"vhostfd":"25","id":"hostnet0"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"netdev":"hostnet0","id":"net0","mac":"88:66:da:5f:dd:11","bus":"pci.1","addr":"0x0"}' \
-chardev socket,id=charnet1,path=/tmp/vhost-user1,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet1","queues":2,"id":"hostnet1"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet1","id":"net1","mac":"88:66:da:5f:dd:12","bus":"pci.6","addr":"0x0"}' \
-chardev socket,id=charnet2,path=/tmp/vhost-user2,server=on \
-netdev '{"type":"vhost-user","chardev":"charnet2","queues":2,"id":"hostnet2"}' \
-device '{"driver":"virtio-net-pci","iommu_platform":true,"ats":true,"mq":true,"vectors":6,"rx_queue_size":1024,"netdev":"hostnet2","id":"net2","mac":"88:66:da:5f:dd:13","bus":"pci.7","addr":"0x0"}' \
-chardev pty,id=charserial0 \
-device '{"driver":"isa-serial","chardev":"charserial0","id":"serial0","index":0}' \
-audiodev '{"id":"audio1","driver":"none"}' \
-global ICH9-LPC.noreboot=off \
-watchdog-action reset \
-device '{"driver":"virtio-balloon-pci","iommu_platform":true,"ats":true,"id":"balloon0","bus":"pci.4","addr":"0x0"}' \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on \


> Could you also provide the result of numactl -H on the host?

# numactl -H
available: 2 nodes (0-1)
node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44 46 48 50 52 54 56 58 60 62
node 0 size: 31616 MB
node 0 free: 9008 MB
node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45 47 49 51 53 55 57 59 61 63
node 1 size: 32191 MB
node 1 free: 11144 MB
node distances:
node   0   1 
  0:  10  21 
  1:  21  10

Comment 10 Laurent Vivier 2023-08-31 08:32:52 UTC
(In reply to Yanghang Liu from comment #9)
...
> # numactl -H
> available: 2 nodes (0-1)
> node 0 cpus: 0 2 4 6 8 10 12 14 16 18 20 22 24 26 28 30 32 34 36 38 40 42 44
> 46 48 50 52 54 56 58 60 62
> node 0 size: 31616 MB
> node 0 free: 9008 MB
> node 1 cpus: 1 3 5 7 9 11 13 15 17 19 21 23 25 27 29 31 33 35 37 39 41 43 45
> 47 49 51 53 55 57 59 61 63
> node 1 size: 32191 MB
> node 1 free: 11144 MB
> node distances:
> node   0   1 
>   0:  10  21 
>   1:  21  10

I think you should at least put the emulator cpuset on the same node as the vCPU ones.

Something like:

<cputune>
    <vcpupin vcpu='0' cpuset='30'/>
    <vcpupin vcpu='1' cpuset='28'/>
    <vcpupin vcpu='2' cpuset='26'/>
    <vcpupin vcpu='3' cpuset='24'/>
    <vcpupin vcpu='4' cpuset='22'/>
    <vcpupin vcpu='5' cpuset='20'/>
    <emulatorpin cpuset='12,14,16,18'/>
</cputune>

Could you try?
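The node placement behind this suggestion can be double-checked from sysfs. A sketch, using the standard Linux node layout and the CPU ids from this BZ: per the numactl output, even-numbered CPUs sit on node 0 and odd-numbered ones on node 1, so the old emulatorpin set (25,27,29,31) was on the remote node relative to the vCPUs:

```shell
# Report which NUMA node each CPU of interest belongs to, using the
# cpuN symlinks that appear under each node directory in sysfs.
node_of() {
    for n in /sys/devices/system/node/node*; do
        if [ -e "$n/cpu$1" ]; then
            basename "$n"
            return
        fi
    done
    echo "unknown"
}

# vCPU pins (even ids, node 0) followed by the old emulatorpin set (odd ids, node 1)
for cpu in 30 28 26 24 22 20 25 27 29 31; do
    echo "cpu$cpu -> $(node_of "$cpu")"
done
```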

Comment 12 Yanghang Liu 2023-09-05 09:03:22 UTC
Hi Laurent,

Thanks for the info :)  I have applied this CPU tuning to all my NFV virt cases now, and the test results look good to me.

I can get the expected VM throughput: 20.833937 Mpps


Test env:
5.14.0-362.1.1.el9_3.x86_64
qemu-kvm-8.0.0-13.el9.x86_64
tuned-2.20.0-1.el9.noarch
libvirt-9.5.0-6.el9.x86_64
python3-libvirt-9.3.0-1.el9.x86_64
openvswitch3.1-3.1.0-52.el9fdp.x86_64
dpdk-22.11-4.el9.x86_64
edk2-ovmf-20230524-3.el9.noarch
seabios-bin-1.16.1-1.el9.noarch
The host CPU number: 64(0-63)
The host NUMA:                    
  NUMA node(s):          2
  NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62
  NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47,49,51,53,55,57,59,61,63
The host isolated CPU list: 2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63
The host core-numa CPU list used for dpdk-testpmd: 39,41,43,45,47,49,51,53,57,59,61,63
The host CPU list pinned to the VM vCPUs: 22,32,30,28,26,24,34,36
The host CPU list used for emulatorpin: 3,5,7,9
The VM CPU count: 8 (I increased the VM's vCPU count from 6 to 8, in case the vCPUs were too busy)
The VM CPU count used for the VM dpdk-testpmd: 5

Let us wait for my automated regression test results; if all pass, I will close this bug as NOTABUG.
(Currently I need to wait for the verification of the test blocker Bug 2234390, and a round of my automated tests takes around 15 hours.)

Comment 13 Yanghang Liu 2023-09-14 06:41:31 UTC
Closing this bug as NOTABUG, as the performance is normal after tuning the CPU pinning.

****************************************************
Packets_loss Frame_Size(Byte) Run_No Throughput(Mpps)
           0               64      0    20.892077
****************************************************

The regression test result : PASS

Related log:
http://10.73.72.41/log/2023-09-12_20:09/
http://10.73.72.41/log/2023-09-06_16:46/