Description of problem:
Boot a guest with vhost-user and vIOMMU, with packed ring enabled on the vhost-user virtio-net-pci devices. When testpmd is started in the guest, qemu crashes.

Version-Release number of selected component (if applicable):
4.18.0-167.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot ovs with 2 dpdkvhostuserclient ports, refer to [1]

# ovs-vsctl show
3d492a93-d752-4203-8b4f-04fa0a0a5127
    Bridge "ovsbr1"
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2"}
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
    Bridge "ovsbr0"
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2"}
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal

2. Boot qemu with vhost-user and vIOMMU, with packed ring enabled on the vhost-user virtio-net-pci devices; for the full command line refer to [2]

-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \

3. Start testpmd in the guest

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:06:00.0
# dpdk-devbind --bind=vfio-pci 0000:07:00.0
# /usr/bin/testpmd \
    -l 1,2,3,4,5 \
    -n 4 \
    -d /usr/lib64/librte_pmd_virtio.so \
    -w 0000:06:00.0 -w 0000:07:00.0 \
    --iova-mode pa \
    -- \
    --nb-cores=4 \
    -i \
    --disable-rss \
    --rxd=512 --txd=512 \
    --rxq=2 --txq=2

4. qemu crashes; ovs-vswitchd also crashes.

(qemu) qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 22.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to read msg header. Read 0 instead of 12. Original request 8.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 2 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 3 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.
Segmentation fault (core dumped)

# dmesg
...
[12378.241785] vhost-events[5802]: segfault at 2 ip 0000558d52492169 sp 00007f4812ffb6d0 error 6 in ovs-vswitchd[558d521d7000+61f000]
[12378.253525] Code: 0b 41 1c 66 89 02 ba 02 00 00 00 eb 87 0f 1f 40 00 83 c8 01 66 89 01 31 c0 48 83 c4 08 c3 0f 1f 00 48 8b 41 10 ba 01 00 00 00 <66> 89 50 02 31 c0 48 83 c4 08 c3 66 90 66 2e 0f 1f 84 00 00 00 00
[12378.624852] qemu-kvm[6052]: segfault at 198 ip 0000562ac564c6b0 sp 00007ffce7053f80 error 4 in qemu-kvm[562ac528d000+a46000]
[12378.636067] Code: 75 10 48 8b 05 29 94 b2 00 89 c0 64 48 89 03 0f ae f0 90 48 8b 45 00 31 c9 85 d2 48 89 e7 0f 95 c1 41 b8 01 00 00 00 4c 89 e2 <48> 8b b0 98 01 00 00 e8 64 c7 f4 ff 48 83 3c 24 00 0f 84 f9 00 00
...

Actual results:
qemu crash.

Expected results:
qemu should not crash.

Additional info:
1. Without vIOMMU, qemu works well.

Reference
[1]
# cat boot_ovs_client.sh
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl --if-exists del-br ovsbr1
ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1
ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x15554
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

echo "all done"

[2]
# cat qemu.sh
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/images_nfv-virt-rt-kvm/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-monitor stdio \
-vnc :2 \
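A generic way to dig further into the two segfaults above is to pull symbolized backtraces from the core dumps. This is a minimal sketch using standard RHEL 8 tooling, assuming systemd-coredump is collecting cores; the debuginfo package names are assumptions based on the versions listed above:

# install debug symbols for the crashing binaries
dnf debuginfo-install qemu-kvm openvswitch2.11

# list recorded core dumps and open the qemu one under gdb
coredumpctl list
coredumpctl gdb qemu-kvm

# inside gdb, dump all thread backtraces; the faulting thread should
# point into the vhost-user / IOTLB code paths
(gdb) thread apply all bt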
Update Open vSwitch version: openvswitch2.11-2.11.0-35.el8fdp.x86_64
If Open vSwitch is replaced with dpdk's testpmd as the vhost-user client, qemu works well.

dpdk version: dpdk-19.11-1.el8.x86_64

Steps:
1. Replace step 1 of the Description with booting dpdk's testpmd: qemu keeps working well and testpmd receives packets without problems.

/usr/bin/testpmd \
    -l 2,4,6,8,10,12,14,16,18 \
    --socket-mem 1024,1024 \
    -n 4 \
    -d /usr/lib64/librte_pmd_vhost.so \
    --vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
    --vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
    --iova-mode pa \
    -- \
    --portmask=f \
    -i \
    --rxd=512 --txd=512 \
    --rxq=2 --txq=2 \
    --nb-cores=8 \
    --forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 88822640   RX-missed: 0          RX-bytes:  5329364952
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 75692878   TX-errors: 0          TX-bytes:  4541579232

  Throughput (since last show)
  Rx-pps:       147041          Rx-bps:     70580080
  Tx-pps:       125307          Tx-bps:     60147664
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 75693848   RX-missed: 0          RX-bytes:  4541637432
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 88821732   TX-errors: 0          TX-bytes:  5329310472

  Throughput (since last show)
  Rx-pps:       125307          Rx-bps:     60147664
  Tx-pps:       147041          Tx-bps:     70580080
  ############################################################################
There might be multiple issues going on here, so let's try to split them up:

Looking at the code, qemu's implementation of vhost-user + multiqueue + iommu is likely to be utterly broken. It will create a slave channel per queue pair. When the second slave channel is created, the first one is closed by the vhost-user backend (which explains the "Failed to read from slave" errors). And when the first queue is started, SET_VRING_ADDR on queue pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair 2. That is most likely the cause of qemu's segfault. I'll work upstream to fix that.

If that's true:
- It should be as reproducible with testpmd as it is with OvS. Pei, can you double check "queues=2" is present in qemu's command line for the testpmd case?
- It should be as reproducible with or without "packed=on". Pei, can you please confirm this?

Now, that does not explain OvS's crash. Can you please attach some logs to try to figure out what's going on there?

Thanks
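To make the suspected failure mode concrete, here is a rough sketch of the message exchange described above. The message names are from the vhost-user protocol; the per-queue-pair behavior is the hypothesis being tested, not a captured trace:

qemu (master)                               backend (OvS / dpdk vhost library)
  -- VHOST_USER_SET_SLAVE_REQ_FD (qp 1) --> backend stores the slave fd
  -- VHOST_USER_SET_SLAVE_REQ_FD (qp 2) --> backend keeps a single slave fd,
                                            so qp 2's fd replaces qp 1's and
                                            the old channel is closed
                                            ("Failed to read from slave")
  -- VHOST_USER_SET_VRING_ADDR   (qp 1) --> ring address needs translation
  <-- VHOST_USER_SLAVE_IOTLB_MSG (miss) --  the miss arrives on the only
                                            remaining slave channel, i.e. the
                                            one qemu bound to qp 2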
Hi Adrián,

I've been trying to reproduce this issue many times (at first I could not reproduce it with libvirt, but it reproduced 100%). I finally found a way to reproduce it every time, with both qemu and libvirt. Here is the update:

1. vIOMMU + packed=on + memory host-nodes=1 are the 3 key points needed to reproduce the issue 100%. If any one of them is missing, the issue cannot be reproduced.

-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on,host-nodes=1,policy=bind \
-numa node,memdev=mem -mem-prealloc \

2. When I reported this bz, I didn't explicitly set memory host-nodes=1, but my setup was using host-nodes=1 by default. (When we set host-nodes=0, everything works well.)

Sorry for the late response. I wanted to provide solid testing results to avoid possible confusion, and this took quite a long time.
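Since host-nodes=1 turned out to be one of the triggers, it can be useful to check how hugepages are spread across the host NUMA nodes before starting the guest. A minimal sketch using standard sysfs paths and numactl; the 1G page-size directory is an assumption, adjust for your configured page size:

# per-node hugepage pools (assuming 1G pages; use hugepages-2048kB for 2M)
cat /sys/devices/system/node/node0/hugepages/hugepages-1048576kB/nr_hugepages
cat /sys/devices/system/node/node1/hugepages/hugepages-1048576kB/nr_hugepages

# host NUMA topology, to see which node the memory backend binds to
numactl --hardware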
(In reply to Pei Zhang from comment #8)
> Hi Adrián,
>
> I've been trying to reproduce this issue many times (at first I could not
> reproduce it with libvirt, but it reproduced 100%).

fix typo: but it reproduced 100% with qemu.
(In reply to Adrián Moreno from comment #7)
> There might be multiple issues going on here, so let's try to split them up:
>
> Looking at the code, qemu's implementation of vhost-user + multiqueue +
> iommu is likely to be utterly broken. It will create a slave channel per
> queue pair. When the second slave channel is created, the first one is
> closed by the vhost-user backend (which explains the "Failed to read from
> slave" errors). And when the first queue is started, SET_VRING_ADDR on
> queue pair 1 will generate an IOTLB_MISS on the slave channel bound to
> queue pair 2. That is most likely the cause of qemu's segfault. I'll work
> upstream to fix that.

Hi Adrián,

I've filed a bz to track the multiqueue issue with dpdk 19.11. That one is not related to packed=on. I've already cc'd you:

Bug 1793327 - "qemu-kvm: Failed to read from slave." shows when boot qemu vhost-user 2 queues over dpdk 19.11

Best regards,

Pei
QEMU has been recently split into sub-components and as a one-time operation to avoid breakage of tools, we are setting the QEMU sub-component of this BZ to "General". Please review and change the sub-component if necessary the next time you review this BZ. Thanks
Update:

With the ovs patch fix of Bug 1812620, both the ovs and qemu crash issues are gone. Both ovs and qemu keep working well and the throughput result looks good.

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu_packed

Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       20.950139   20.950139

Versions:
4.18.0-187.el8.x86_64
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
tuned-2.13.0-5.el8.noarch
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
openvswitch2.13-2.13.0-6.el8fdp.x86_64
dpdk-19.11-4.el8.x86_64
More info: The ovs patch fixes both the qemu and the ovs crash. With the latest fdp ovs2.11, ovs2.12 and ovs2.13, both qemu and ovs work well. Below are more tested version combinations:

ovs2.11:
openvswitch2.11-2.11.0-35.el8fdp.x86_64, qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64: both qemu and ovs crash.
openvswitch2.11-2.11.0-50.el8fdp.x86_64, qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64: both qemu and ovs work well.

ovs2.12:
openvswitch2.12-2.12.0-12.el8fdp.x86_64, qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64: both qemu and ovs crash.
openvswitch2.12-2.12.0-23.el8fdp.x86_64, qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64: both qemu and ovs work well.

ovs2.13:
openvswitch2.13-2.13.0-6.el8fdp.x86_64, qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64: both qemu and ovs work well.

Hi Cindy,

From the QE function and performance testing perspective, the qemu crash issue is gone with the ovs fix; this bug cannot be reproduced any more. However, I'm not sure whether there is still a defect in the qemu code that is merely being masked by the ovs fix.
After discussing with Pei, we plan to move this to AV8.4.
Hi Pei, I have checked the log and the fix. I think the fix in dpdk has already fixed this crash, so we don't need to fix it in qemu. Maybe we can close this bug?
(In reply to lulu from comment #23)
> Hi Pei, I have checked the log and the fix. I think the fix in dpdk has
> already fixed this crash, so we don't need to fix it in qemu. Maybe we can
> close this bug?

Hi Cindy,

Thanks for the explanation. I agree we can close it.

Best regards,

Pei