Bug 1949845

Summary: Guest hangs when running testpmd after enabling hyperthreading and setting QEMU thread priority
Product: Red Hat Enterprise Linux 8 Reporter: mhou <mhou>
Component: kernel-rt    Assignee: Virtualization Maintenance <virt-maint>
kernel-rt sub component: KVM QA Contact: Pei Zhang <pezhang>
Status: CLOSED NOTABUG Docs Contact:
Severity: low    
Priority: low CC: bhu, chayang, ctrautma, jinzhao, jlelli, juri.lelli, juzhang, kzhang, lcapitulino, mhou, mtosatti, virt-maint
Version: 8.2    Keywords: Triaged
Target Milestone: beta   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-04-27 09:49:48 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1932086    

Description mhou 2021-04-15 08:52:05 UTC
Description of problem:
The guest hangs while running testpmd. After the guest has started, set the vCPU and emulator CPU isolation (affinity), then set the FIFO priority of the vCPU and emulator threads.

Version-Release number of selected component (if applicable):
kernel version: 4.18.0-193.51.1.rt13.101.el8_2.x86_64
KVM version: virt:8.2/common
http://download.eng.pek2.redhat.com/rhel-8/rel-eng/ADVANCED-VIRT-8/latest-ADVANCED-VIRT-8.2.1-RHEL-8/compose/Advanced-virt/x86_64/os/
dpdk version: dpdk-19.11-3.el8.x86_64
ovs version: openvswitch2.15-2.15.0-1.el8fdp.x86_64

How reproducible:
4/4

Steps to Reproduce:
1. Create ovs database and start vswitch.
sudo /usr/bin/ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
sudo /usr/sbin/ovsdb-server --remote=punix:/var/run/openvswitch/db.sock --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile=/var/run/openvswitch/ovsdb-server.pid --overwrite-pidfile
sudo /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
sudo /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
sudo /usr/bin/ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem=1024,1024
/bin/bash -c "sudo -E /usr/sbin/ovs-vswitchd --pidfile=/var/run/openvswitch/ovs-vswitchd.pid --overwrite-pidfile --log-file=/tmp/vswitchd.log"
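
As an optional sanity check (not part of the original steps), DPDK initialization can be confirmed before creating the bridge; the fields queried below are assumed to be present in this OVS version:
sudo /usr/bin/ovs-vsctl get Open_vSwitch . dpdk_initialized   # expected to print "true" once ovs-vswitchd has completed DPDK init
sudo /usr/bin/ovs-vsctl get Open_vSwitch . dpdk_version       # reports the DPDK library OVS was built against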

2. Create bridge and configure flows.
sudo /usr/bin/ovs-vsctl --timeout 10 add-br br0 -- set bridge br0 datapath_type=netdev
sudo /usr/bin/ovs-vsctl --timeout 10 set Open_vSwitch . other_config:max-idle=30000
sudo /usr/bin/ovs-vsctl --timeout 10 set Open_vSwitch . other_config:pmd-cpu-mask=0xf000000000f0000
sudo /usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:12:00.0 options:n_rxq=4
sudo /usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:12:00.1 options:n_rxq=4
sudo /usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdkvhostuserclient0 -- set Interface dpdkvhostuserclient0 type=dpdkvhostuserclient -- set Interface dpdkvhostuserclient0 options:vhost-server-path=/var/run/openvswitch/dpdkvhostuserclient0
sudo /usr/bin/ovs-vsctl --timeout 10 add-port br0 dpdkvhostuserclient1 -- set Interface dpdkvhostuserclient1 type=dpdkvhostuserclient -- set Interface dpdkvhostuserclient1 options:vhost-server-path=/var/run/openvswitch/dpdkvhostuserclient1
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 del-flows br0
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=1,action=output:3
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=3,action=output:1
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=4,action=output:2
sudo /usr/bin/ovs-ofctl -O OpenFlow13 --timeout 10 add-flow br0 idle_timeout=0,in_port=2,action=output:4
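
To confirm the configuration (an optional check, not in the original report), the installed flows and the PMD-to-rxq assignment can be inspected:
sudo /usr/bin/ovs-ofctl -O OpenFlow13 dump-flows br0    # should list the four forwarding rules added above
sudo /usr/bin/ovs-appctl dpif-netdev/pmd-rxq-show       # shows which PMD cores poll each dpdk/vhost rx queue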

3. Start a guest
cd /root/vswitchperf
/bin/bash -c "sudo -E taskset -c 23,24,25,26,27,28,29,30,31 /usr/libexec/qemu-kvm -name test,debug-threads=on -m 8192 -machine q35,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=split -overcommit mem-lock=on -smp 9,sockets=9,cores=1,threads=1 -cpu host,migratable=on,tsc-deadline=on,pmu=off -drive if=ide,file=rhel8.3-vsperf-4Q-noviommu.qcow2 -boot c --enable-kvm -monitor unix:/tmp/vm0monitor,server,nowait -object memory-backend-file,id=mem,size=8192M,mem-path=/dev/hugepages,share=on,prealloc=yes,host-nodes=0,policy=bind -numa node,cpus=0-8,nodeid=0,memdev=mem -nographic -vnc :0 -name Client0 -global kvm-pit.lost_tick_policy=delay -no-hpet -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on -snapshot -net none -no-reboot -chardev socket,id=char0,path=/var/run/openvswitch/dpdkvhostuserclient0,server -netdev type=vhost-user,id=net1,chardev=char0,vhostforce,queues=4 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=net1,csum=off,mrg_rxbuf=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=10 -chardev socket,id=char1,path=/var/run/openvswitch/dpdkvhostuserclient1,server -netdev type=vhost-user,id=net2,chardev=char1,vhostforce,queues=4 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=net2,csum=off,mrg_rxbuf=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=10"

4. Set cpu isolation
# sudo socat - UNIX-CONNECT:/tmp/vm0monitor
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) info cpus
info cpus
* CPU #0: thread_id=78921
  CPU #1: thread_id=78922
  CPU #2: thread_id=78923
  CPU #3: thread_id=78924
  CPU #4: thread_id=78925
  CPU #5: thread_id=78926
  CPU #6: thread_id=78927
  CPU #7: thread_id=78928
  CPU #8: thread_id=78929
(qemu) 
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 1 78921
pid 78921's current affinity list: 23-31
pid 78921's new affinity list: 1
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 2 78922
pid 78922's current affinity list: 23-31
pid 78922's new affinity list: 2
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 3 78923
pid 78923's current affinity list: 23-31
pid 78923's new affinity list: 3
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 4 78924
pid 78924's current affinity list: 23-31
pid 78924's new affinity list: 4
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 5 78925
pid 78925's current affinity list: 23-31
pid 78925's new affinity list: 5
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 6 78926
pid 78926's current affinity list: 23-31
pid 78926's new affinity list: 6
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 7 78927
pid 78927's current affinity list: 23-31
pid 78927's new affinity list: 7
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 8 78928
pid 78928's current affinity list: 23-31
pid 78928's new affinity list: 8
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 9 78929
pid 78929's current affinity list: 23-31
pid 78929's new affinity list: 9
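
The pinning above follows a simple pattern (vCPU N is pinned to host CPU N+1). A minimal loop that reproduces it, assuming the nine vCPU thread IDs are consecutive starting at 78921 as reported by "info cpus":
for i in $(seq 0 8); do
    sudo taskset -c -p $((i + 1)) $((78921 + i))    # pin the thread of vCPU i to host CPU i+1
done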

5. Set the emulator and vCPU priority. 118249 is the PID of the main QEMU process.
sudo chrt -f -p -a 1 118249
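
Here -f selects SCHED_FIFO and -a applies the priority to all threads of the process, so every vCPU and emulator thread ends up at FIFO priority 1. A quick way to verify the result (a sketch, not part of the original report):
for tid in /proc/118249/task/*; do
    chrt -p "$(basename "$tid")"    # each thread should report SCHED_FIFO with priority 1
done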

6. After running this command and waiting a moment, the VM hangs.
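
The hang can also be observed from the host side (an illustrative check, not part of the original steps) by looking at the QEMU thread states, which match the sched_debug output shown under Additional info:
ps -eLo stat,tid,psr,rtprio,comm | grep qemu-kvm    # when hung, most vCPU threads sit in uninterruptible sleep (D) instead of S/R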

Actual results:
1. The guest hangs after vCPU isolation and vCPU & emulator priority tuning are applied.


Expected results:
1. The guest finishes vCPU isolation and vCPU & emulator priority tuning and keeps running normally.

Additional info:
1. The vCPU threads still accumulate run time on their CPUs, but the sum-exec of the emulator threads no longer increases (a simple sampling check is sketched after the output below).
# cat /proc/sched_debug | grep "cpu#\|qemu"
cpu#0, 2100.000 MHz
cpu#1, 2100.000 MHz
 D       qemu-kvm 78921         0.000000   1770119    98         0.000000     60871.913797         0.000000 /
cpu#2, 2100.000 MHz
 D       qemu-kvm 78922         0.000000   1678666    98         0.000000     26070.113814         0.000000 /
cpu#3, 2100.000 MHz
 D       qemu-kvm 78923        -2.903212   1678527    98         0.000000     26234.071899         0.000000 /
cpu#4, 2100.000 MHz
 D       qemu-kvm 78924         0.000000   1678534    98         0.000000     25999.436236         0.000000 /
cpu#5, 2100.000 MHz
 D       qemu-kvm 78925         0.000000   1678714    98         0.000000     26909.799548         0.000000 /
cpu#6, 2100.000 MHz
>R       qemu-kvm 78926         0.000000   1683176    98         0.000000     26831.691186         0.000000 /
cpu#7, 2100.000 MHz
 D       qemu-kvm 78927         0.000000   1678561    98         0.000000     25420.400857         0.000000 /
cpu#8, 2100.000 MHz
 D       qemu-kvm 78928         0.000000   1678507    98         0.000000     26131.429898         0.000000 /
cpu#9, 2100.000 MHz
 D       qemu-kvm 78929        -9.001038   1678555    98         0.000000     26061.694282         0.000000 /
cpu#10, 2100.000 MHz
cpu#11, 2100.000 MHz
cpu#12, 2100.000 MHz
cpu#13, 2100.000 MHz
cpu#14, 2100.000 MHz
cpu#15, 2100.000 MHz
cpu#16, 2100.000 MHz
cpu#17, 2100.000 MHz
cpu#18, 2100.000 MHz
cpu#19, 2100.000 MHz
cpu#20, 2100.000 MHz
cpu#21, 2100.000 MHz
cpu#22, 2100.000 MHz
cpu#23, 2100.000 MHz
 S       qemu-kvm 78905         0.000000     12298    98         0.000000       807.776228         0.000000 /
cpu#24, 2100.000 MHz
cpu#25, 2100.000 MHz
 S       qemu-kvm 78906       -12.000000       113    98         0.000000        10.778182         0.000000 /
cpu#26, 2100.000 MHz
cpu#27, 2100.000 MHz
cpu#28, 2100.000 MHz
cpu#29, 2100.000 MHz
 S       qemu-kvm 78931       -12.000000        29    98         0.000000         2.637698         0.000000 /
cpu#30, 2100.000 MHz
cpu#31, 2100.000 MHz
cpu#32, 2100.000 MHz
cpu#33, 2100.000 MHz
cpu#34, 2100.000 MHz
cpu#35, 2100.000 MHz
cpu#36, 2100.000 MHz
cpu#37, 2100.000 MHz
cpu#38, 2100.000 MHz
cpu#39, 2100.000 MHz
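
A simple way to confirm that the emulator threads' sum-exec is frozen (an add-on check, not part of the original report) is to sample the same grep twice a few seconds apart:
grep "cpu#\|qemu" /proc/sched_debug > /tmp/sched.1
sleep 5
grep "cpu#\|qemu" /proc/sched_debug > /tmp/sched.2
diff /tmp/sched.1 /tmp/sched.2    # the emulator-thread lines (78905/78906/78931 here) show no sum-exec change while the guest is hung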

2. This issue also reproduces on RHEL 8.3 with virt:8.3/common.

3. If I disable hyperthreading, do not set the emulator FIFO priority, and only set vCPU isolation and priority, the guest works fine.
....(same steps 1-3 as above)
 # sudo socat - UNIX-CONNECT:/tmp/vm0monitor
QEMU 4.2.0 monitor - type 'help' for more information
(qemu) info cpus
info cpus
* CPU #0: thread_id=79153
  CPU #1: thread_id=79154
  CPU #2: thread_id=79155
  CPU #3: thread_id=79156
  CPU #4: thread_id=79157
  CPU #5: thread_id=79158
  CPU #6: thread_id=79159
  CPU #7: thread_id=79160
  CPU #8: thread_id=79161
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 1 79153
pid 79153's current affinity list: 23-31
pid 79153's new affinity list: 1
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79153
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 2 79154
pid 79154's current affinity list: 23-31
pid 79154's new affinity list: 2
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79154
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 3 79155
pid 79155's current affinity list: 23-31
pid 79155's new affinity list: 3
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79155
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 4 79156
pid 79156's current affinity list: 23-31
pid 79156's new affinity list: 4
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79156
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 5 79157
pid 79157's current affinity list: 23-31
pid 79157's new affinity list: 5
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79157
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 6 79158
pid 79158's current affinity list: 23-31
pid 79158's new affinity list: 6
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79158
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 7 79159
pid 79159's current affinity list: 23-31
pid 79159's new affinity list: 7
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79159
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 8 79160
pid 79160's current affinity list: 23-31
pid 79160's new affinity list: 8
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79160
[root@hp-dl388g10-03 ~]# sudo taskset -c -p 9 79161
pid 79161's current affinity list: 23-31
pid 79161's new affinity list: 9
[root@hp-dl388g10-03 ~]# chrt -f -p 1 79161

# cat /proc/sched_debug | grep "cpu#\|qemu"
cpu#0, 2100.000 MHz
cpu#1, 2100.000 MHz
 S       qemu-kvm 79153         0.000000    383251    98         0.000000     42307.043412         0.000000 /
cpu#2, 2100.000 MHz
 S       qemu-kvm 79154        -7.872565      8104    98         0.000000       263.567837         0.000000 /
cpu#3, 2100.000 MHz
 S       qemu-kvm 79155       -11.277047      8084    98         0.000000       208.464967         0.000000 /
cpu#4, 2100.000 MHz
 S       qemu-kvm 79156         0.000000      8043    98         0.000000       192.646090         0.000000 /
cpu#5, 2100.000 MHz
 S       qemu-kvm 79157       -10.968355      8173    98         0.000000       349.761972         0.000000 /
cpu#6, 2100.000 MHz
 S       qemu-kvm 79158       -11.657248     12674    98         0.000000       988.799906         0.000000 /
cpu#7, 2100.000 MHz
 S       qemu-kvm 79159       -10.334056      8069    98         0.000000       186.500067         0.000000 /
cpu#8, 2100.000 MHz
 S       qemu-kvm 79160       -11.596942      8087    98         0.000000       181.955443         0.000000 /
cpu#9, 2100.000 MHz
 S       qemu-kvm 79161       -10.459634      8067    98         0.000000       177.825558         0.000000 /
cpu#10, 2100.000 MHz
cpu#11, 2100.000 MHz
cpu#12, 2100.000 MHz
cpu#13, 2100.000 MHz
cpu#14, 2100.000 MHz
cpu#15, 2100.000 MHz
cpu#16, 2100.000 MHz
cpu#17, 2100.000 MHz
cpu#18, 2100.000 MHz
cpu#19, 2100.000 MHz
cpu#20, 2100.000 MHz
cpu#21, 2100.000 MHz
cpu#22, 2100.000 MHz
cpu#23, 2100.000 MHz
cpu#24, 2100.000 MHz
 S       qemu-kvm 79138   1075601.924998       112   120         0.000000        10.678413         0.000000 /
cpu#25, 2100.000 MHz
 S       qemu-kvm 79137    767132.019673     11598   120         0.000000       731.591252         0.000000 /
cpu#26, 2100.000 MHz
cpu#27, 2100.000 MHz
cpu#28, 2100.000 MHz
 S       qemu-kvm 79163    302370.445237         2   120         0.000000         0.096315         0.000000 /
cpu#29, 2100.000 MHz
cpu#30, 2100.000 MHz
cpu#31, 2100.000 MHz
cpu#32, 2100.000 MHz
cpu#33, 2100.000 MHz
cpu#34, 2100.000 MHz
cpu#35, 2100.000 MHz
cpu#36, 2100.000 MHz
cpu#37, 2100.000 MHz
cpu#38, 2100.000 MHz
cpu#39, 2100.000 MHz

4. If I enable hyperthreading, do not set the emulator FIFO priority, and only set vCPU isolation and priority, the guest also hangs. (Juri is already on the test server and HT is already set; once he releases the server, I can provide test info.)

Comment 1 Pei Zhang 2021-04-15 10:54:08 UTC
Hello Minxi, 

Does this testing scenario work with libvirt and with HT disabled? Customers use kvm-rt through the libvirt layer, and I think we suggest that users disable HT for KVM-RT to guarantee performance. Thanks.

Best regards,

Pei

Comment 2 mhou 2021-04-15 11:20:11 UTC
Hello Pei

I think that when libvirt is used to start a guest, it will work fine as well; the issue only occurs when starting QEMU directly. From QEMU's perspective, this situation is encountered regardless of whether HT is turned off.

Comment 3 Luiz Capitulino 2021-04-15 20:56:29 UTC
(In reply to mhou from comment #2)
> Hello Pei
> 
> I think that when libvirt is used to start a guest, it will work fine as
> well; the issue only occurs when starting QEMU directly. From QEMU's
> perspective, this situation is encountered regardless of whether HT is
> turned off.

Do you mean that it works with libvirt? We only support running qemu via libvirt.

Comment 4 mhou 2021-04-16 09:50:22 UTC
Actually, I still can't find an easy way to convert the test qemu command [1] to an XML file. As per my talk with Pei, Pei has already done extensive tuning tests with libvirt, so I believe that when libvirt is used to start a guest, it will work fine as well.

test qemu cmd[1]
/bin/bash -c "sudo -E taskset -c 23,24,25,26,27,28,29,30,31 /usr/libexec/qemu-kvm -name test,debug-threads=on -m 8192 -machine q35,accel=kvm,usb=off,vmport=off,dump-guest-core=off,kernel_irqchip=split -overcommit mem-lock=on -smp 9,sockets=9,cores=1,threads=1 -cpu host,migratable=on,tsc-deadline=on,pmu=off -drive if=ide,file=rhel8.3-vsperf-4Q-noviommu.qcow2 -boot c --enable-kvm -monitor unix:/tmp/vm0monitor,server,nowait -object memory-backend-file,id=mem,size=8192M,mem-path=/dev/hugepages,share=on,prealloc=yes,host-nodes=0,policy=bind -numa node,cpus=0-8,nodeid=0,memdev=mem -nographic -vnc :0 -name Client0 -global kvm-pit.lost_tick_policy=delay -no-hpet -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny -msg timestamp=on -snapshot -net none -no-reboot -chardev socket,id=char0,path=/var/run/openvswitch/dpdkvhostuserclient0,server -netdev type=vhost-user,id=net1,chardev=char0,vhostforce,queues=4 -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=net1,csum=off,mrg_rxbuf=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=10 -chardev socket,id=char1,path=/var/run/openvswitch/dpdkvhostuserclient1,server -netdev type=vhost-user,id=net2,chardev=char1,vhostforce,queues=4 -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=net2,csum=off,mrg_rxbuf=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off,rx_queue_size=1024,mq=on,vectors=10"

Comment 5 Luiz Capitulino 2021-04-16 16:02:40 UTC
(In reply to mhou from comment #4)
> Actually, I still can't find an easy way to convert the test qemu command
> [1] to an XML file. As per my talk with Pei, Pei has already done extensive
> tuning tests with libvirt, so I believe that when libvirt is used to start
> a guest, it will work fine as well.

Hi Minxi,

Thanks a lot for this testing and for filing the BZ! It's important that we
know about possible bugs, especially crashes and hung tasks.

As it turns out, we only support running QEMU through libvirt. If you think
this will work via libvirt, would it make sense to close the BZ? Or, we could
keep it open but we'd need to convert your command-line to a working XML file
and try to reproduce with libvirt and a proper KVM-RT configuration.

PS: I'm changing priority to low since we have the expectation this will
work under libvirt.