+++ This bug was initially created as a clone of Bug #1793064 +++

+++ This bug was initially created as a clone of Bug #1788415 +++

Description of problem:
Boot the guest with vhost-user and vIOMMU, and enable packed-ring on the vhost-user virtio-net-pci devices. Boot testpmd in the guest; qemu will crash.

Version-Release number of selected component (if applicable):
4.18.0-167.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot ovs with 2 dpdkvhostuserclient ports, refer to [1]

# ovs-vsctl show
3d492a93-d752-4203-8b4f-04fa0a0a5127
    Bridge "ovsbr1"
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2"}
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
    Bridge "ovsbr0"
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2"}
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal

2. Boot qemu with vhost-user and vIOMMU, enabling packed-ring on the vhost-user virtio-net-pci devices; for the full cmd refer to [2]

-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \

3. Start testpmd in the guest

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:06:00.0
# dpdk-devbind --bind=vfio-pci 0000:07:00.0
# /usr/bin/testpmd \
  -l 1,2,3,4,5 \
  -n 4 \
  -d /usr/lib64/librte_pmd_virtio.so \
  -w 0000:06:00.0 -w 0000:07:00.0 \
  --iova-mode pa \
  -- \
  --nb-cores=4 \
  -i \
  --disable-rss \
  --rxd=512 --txd=512 \
  --rxq=2 --txq=2

4. qemu crashes; ovs-vswitchd also crashes.

(qemu) qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 22.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to read msg header. Read 0 instead of 12. Original request 8.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 2 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 3 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.
Segmentation fault (core dumped)
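(A backtrace from the qemu-kvm core dump makes the segfault much easier to pin down. A minimal sketch, assuming systemd-coredump is collecting cores on the host and qemu-kvm debuginfo is installed; adjust names to your setup:)

# list recorded crashes for qemu-kvm, then open the latest core in gdb
coredumpctl list qemu-kvm
coredumpctl gdb qemu-kvm
# inside gdb:
#   (gdb) bt                   # backtrace of the faulting thread
#   (gdb) thread apply all bt  # backtraces of all threads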
# dmesg
...
[12378.241785] vhost-events[5802]: segfault at 2 ip 0000558d52492169 sp 00007f4812ffb6d0 error 6 in ovs-vswitchd[558d521d7000+61f000]
[12378.253525] Code: 0b 41 1c 66 89 02 ba 02 00 00 00 eb 87 0f 1f 40 00 83 c8 01 66 89 01 31 c0 48 83 c4 08 c3 0f 1f 00 48 8b 41 10 ba 01 00 00 00 <66> 89 50 02 31 c0 48 83 c4 08 c3 66 90 66 2e 0f 1f 84 00 00 00 00
[12378.624852] qemu-kvm[6052]: segfault at 198 ip 0000562ac564c6b0 sp 00007ffce7053f80 error 4 in qemu-kvm[562ac528d000+a46000]
[12378.636067] Code: 75 10 48 8b 05 29 94 b2 00 89 c0 64 48 89 03 0f ae f0 90 48 8b 45 00 31 c9 85 d2 48 89 e7 0f 95 c1 41 b8 01 00 00 00 4c 89 e2 <48> 8b b0 98 01 00 00 e8 64 c7 f4 ff 48 83 3c 24 00 0f 84 f9 00 00
...

Actual results:
qemu crash.

Expected results:
qemu should not crash.

Additional info:
1. Without vIOMMU, qemu works well.

Reference

[1]
# cat boot_ovs_client.sh
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl --if-exists del-br ovsbr1
ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1
ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x15554
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

echo "all done"

[2]
# cat qemu.sh
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/images_nfv-virt-rt-kvm/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-monitor stdio \
-vnc :2 \

--- Additional comment from Pei Zhang on 2020-01-07 15:12:46 HKT ---

Update OpenvSwitch version:
openvswitch2.11-2.11.0-35.el8fdp.x86_64

--- Additional comment from Pei Zhang on 2020-01-07 15:19:23 HKT ---

If we replace OpenvSwitch with dpdk's testpmd as the vhost-user client, qemu works well.

dpdk version: dpdk-19.11-1.el8.x86_64

Steps:
1. Replace Step 1 of the Description with booting dpdk's testpmd; qemu keeps working well and testpmd can receive packets well.
/usr/bin/testpmd \
    -l 2,4,6,8,10,12,14,16,18 \
    --socket-mem 1024,1024 \
    -n 4 \
    -d /usr/lib64/librte_pmd_vhost.so \
    --vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
    --vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
    --iova-mode pa \
    -- \
    --portmask=f \
    -i \
    --rxd=512 --txd=512 \
    --rxq=2 --txq=2 \
    --nb-cores=8 \
    --forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 88822640   RX-missed: 0          RX-bytes:  5329364952
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 75692878   TX-errors: 0          TX-bytes:  4541579232

  Throughput (since last show)
  Rx-pps:       147041          Rx-bps:     70580080
  Tx-pps:       125307          Tx-bps:     60147664
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 75693848   RX-missed: 0          RX-bytes:  4541637432
  RX-errors:  0
  RX-nombuf:  0
  TX-packets: 88821732   TX-errors: 0          TX-bytes:  5329310472

  Throughput (since last show)
  Rx-pps:       125307          Rx-bps:     60147664
  Tx-pps:       147041          Tx-bps:     70580080
  ############################################################################

--- Additional comment from Rick Barry on 2020-01-08 00:05:52 HKT ---

Assigning to Amnon's team for review. CC'ing Ariel.

Priority is High, so I set ITR to 8.2.0. Feel free to change that if you feel it's incorrect.

--- Additional comment from Ariel Adam on 2020-01-08 00:36:30 HKT ---

Adrian, can you take a look? This may be an FDP bug and not an AV bug. If so, please change the component from qemu-kvm to FDP.

--- Additional comment from Adrián Moreno on 2020-01-13 20:07:19 HKT ---

Hi Pei,

I haven't reproduced it yet but first, let me ask:

> --iova-mode pa \

Any reason why PA is used? Is it also reproducible without that option (or with iova-mode=va)?

Also, if it's not much to ask, can you try to reproduce it with the dpdk 18.11.5 stable release?

Thanks,
Adrian

--- Additional comment from Adrián Moreno on 2020-01-16 18:35:10 HKT ---

Also, Pei, according to the few logs that qemu writes, it seems related to the iotlb update messages in the vhost-user backend.
My guess is that it's really not related to the "packed" queue.
Can you please confirm this?

Thanks

--- Additional comment from Adrián Moreno on 2020-01-16 19:40:50 HKT ---

There might be multiple issues going on here, so let's try to split them up:

Looking at the code, qemu's implementation of vhost-user + multiqueue + iommu is likely to be utterly broken. It will create a slave channel per queue pair. When the second slave channel is created, the first one is closed by the vhost-user backend (which explains the "Failed to read from slave" errors). And when the first queue is started, SET_VRING_ADDR on queue pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair 2. That is most likely the cause of qemu's segfault. I'll work upstream to fix that.

If that's true:
- It should be as reproducible with testpmd as it is with OvS. Pei, can you double check "queues=2" is present in qemu's command line for the testpmd case?
- It should be as reproducible with or without "packed=on"

Pei, can you please confirm this?

Now, that does not explain OvS's crash. Can you please attach some logs to try to figure out what's going on there?

Thanks
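(For the OvS side, the pieces worth attaching are usually the vswitchd log and, if available, the core dump. A minimal sketch for collecting them, assuming the log path from boot_ovs_client.sh above and that systemd-coredump caught the ovs-vswitchd crash:)

# record the exact build and gather the log
ovs-vswitchd --version
tar czf ovs-logs.tar.gz /var/log/openvswitch/ovs-vswitchd.log
# export the ovs-vswitchd core, if systemd-coredump is in use
coredumpctl list ovs-vswitchd
coredumpctl dump ovs-vswitchd -o ovs-vswitchd.core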
--- Additional comment from Pei Zhang on 2020-01-17 17:39:57 HKT ---

Hi Adrián,

I've been trying to reproduce this issue many times (As I cannot reproduce with libvirt at first, but 100% reproduced). Finally I found the way to reproduce it, no matter whether with qemu or libvirt. Here is the update:

1. vIOMMU + packed=on + memory host-nodes=1 are the 3 key points to reproduce the issue 100%. Without any one of them, this issue can not be reproduced.

-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on,host-nodes=1,policy=bind \
-numa node,memdev=mem -mem-prealloc \

2. When I reported this bz, I didn't explicitly set memory host-nodes=1, but my setup defaults to memory host-nodes=1. (When we set memory host-nodes=0, it works very well.)

Sorry for the late response. I just wanted to provide some solid testing results to avoid possible confusion, and this took a bit of a long time.

--- Additional comment from Pei Zhang on 2020-01-17 17:41:01 HKT ---

(In reply to Pei Zhang from comment #8)
> Hi Adrián,
> 
> I've been trying to reproduce this issue many times (As I cannot reproduce
> with libvirt at first, but 100% reproduced).

fix typo: but 100% reproduced with qemu.

--- Additional comment from Pei Zhang on 2020-01-17 17:50:28 HKT ---

(In reply to Adrián Moreno from comment #5)
> Hi Pei,
> 
> I haven't reproduced it yet but first, let me ask:
> > --iova-mode pa \
> 
> Any reason why PA is used? Is it also reproducible without that option (or
> with iova-mode=va)?

Hi Adrián,

I tested with iova-mode=va as it's a valid usage according to https://bugzilla.redhat.com/show_bug.cgi?id=1738751. Without iova-mode=va, this issue can also be reproduced.

> 
> Also, if it's not much to ask, can you try to reproduce it with the dpdk 18.11.5
> stable release?

Yes, it can also be reproduced with dpdk 18.11.5.

(In reply to Adrián Moreno from comment #6)
> Also, Pei, according to the few logs that qemu writes, it seems related to
> the iotlb update messages in the vhost-user backend.
> My guess is that it's really not related to the "packed" queue.
> Can you please confirm this?

Without packed=on, this issue can not be reproduced any more.
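(A quick way to double-check on the guest side that packed=on was actually negotiated is to read the virtio feature bits from sysfs before binding the NIC to vfio-pci. A minimal sketch; the virtio1 index is only an example, and this assumes VIRTIO_F_RING_PACKED is feature bit 34, i.e. the 35th character of the bitmap:)

# in the guest, before dpdk-devbind: check the packed-ring feature bit on the
# vhost-user NIC (replace virtio1 with the right device index)
FEATURES=$(cat /sys/bus/virtio/devices/virtio1/features)
echo "VIRTIO_F_RING_PACKED negotiated: $(echo "$FEATURES" | cut -c35)"   # 1 = yes, 0 = no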
--- Additional comment from Pei Zhang on 2020-01-17 18:11:29 HKT ---

(In reply to Adrián Moreno from comment #7)
> There might be multiple issues going on here, so let's try to split them up:
> 
> Looking at the code, qemu's implementation of vhost-user + multiqueue +
> iommu is likely to be utterly broken. It will create a slave channel per
> queue pair. When the second slave channel is created, the first one is
> closed by the vhost-user backend (which explains the "Failed to read from
> slave" errors). And when the first queue is started, SET_VRING_ADDR on queue
> pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair 2.
> That is most likely the cause of qemu's segfault. I'll work upstream to fix
> that.
> 
> If that's true:
> - It should be as reproducible with testpmd as it is with OvS. Pei, can you
> double check "queues=2" is present in qemu's command line for the testpmd
> case?

With testpmd (replacing ovs) on the host, this issue is not reproducible. I can confirm this. It can only be reproduced with OvS.

Yes, queues=2 is present in qemu's command line. queues=2 is also present in OvS and in testpmd in the guest.

Besides, this issue is also reproducible with a single queue.

> - It should be as reproducible with or without "packed=on"

Without "packed=on", this issue can not be reproduced.

> 
> Pei, can you please confirm this?
> 
> Now, that does not explain OvS's crash. Can you please attach some logs to
> try to figure out what's going on there?

Please check this link for the ovs log, libvirt log, and full xml:

http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1788415/

Thanks.

Best regards,

Pei

--- Additional comment from Adrián Moreno on 2020-01-17 18:15:45 HKT ---

(In reply to Pei Zhang from comment #10)
> (In reply to Adrián Moreno from comment #6)
> > Also, Pei, according to the few logs that qemu writes, it seems related to
> > the iotlb update messages in the vhost-user backend.
> > My guess is that it's really not related to the "packed" queue.
> > Can you please confirm this?
> 
> Without packed=on, this issue can not be reproduced any more.

OK, so there are several issues here. I get a qemu crash 100% with iommu=on and multiqueue (no matter the value of "packed" or "host-nodes"). In fact, I can't even run testpmd in the guest; it crashes at boot time. I have a fix for that and it's working in my setup (still not posted upstream), so I can proceed with the next issue.

--- Additional comment from Adrián Moreno on 2020-01-17 18:17:52 HKT ---

(In reply to Pei Zhang from comment #11)
> (In reply to Adrián Moreno from comment #7)
> > There might be multiple issues going on here, so let's try to split them up:
> > 
> > Looking at the code, qemu's implementation of vhost-user + multiqueue +
> > iommu is likely to be utterly broken. It will create a slave channel per
> > queue pair. When the second slave channel is created, the first one is
> > closed by the vhost-user backend (which explains the "Failed to read from
> > slave" errors). And when the first queue is started, SET_VRING_ADDR on queue
> > pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair 2.
> > That is most likely the cause of qemu's segfault. I'll work upstream to fix
> > that.
> > 
> > If that's true:
> > - It should be as reproducible with testpmd as it is with OvS. Pei, can you
> > double check "queues=2" is present in qemu's command line for the testpmd
> > case?
> 
> With testpmd (replacing ovs) on the host, this issue is not reproducible. I can
> confirm this. It can only be reproduced with OvS.
> 
> Yes, queues=2 is present in qemu's command line. queues=2 is also present
> in OvS and in testpmd in the guest.
> 
> Besides, this issue is also reproducible with a single queue.
> 
> > - It should be as reproducible with or without "packed=on"
> 
> Without "packed=on", this issue can not be reproduced.
> 
> > 
> > Pei, can you please confirm this?
> > 
> > Now, that does not explain OvS's crash. Can you please attach some logs to
> > try to figure out what's going on there?
> 
> Please check this link for the ovs log, libvirt log, and full xml:
> 

Thank you. I don't have a way to reproduce this issue since I don't have a NUMA system.

> http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/
> pezhang/bug1788415/
> 

I don't have permission to download the files.
You may have to chmod them.

Thanks

--- Additional comment from Pei Zhang on 2020-01-17 18:44:34 HKT ---

Hi Adrian,

Do we need a bug to track the OvS crash issue? ovs2.11 and ovs2.12 both hit this problem.

Please let me know if one is needed and I can file it. Thanks.

Best regards,

Pei

--- Additional comment from Adrián Moreno on 2020-01-20 18:33:24 HKT ---

(In reply to Pei Zhang from comment #14)
> Hi Adrian,
> 
> Do we need a bug to track the OvS crash issue? ovs2.11 and ovs2.12 both hit
> this problem.
> 

Yes. Let's split this up. Thanks.

> Please let me know if one is needed and I can file it. Thanks.
> 
> Best regards,
> 
> Pei

--- Additional comment from Pei Zhang on 2020-01-20 23:38:39 HKT ---

Versions:
openvswitch2.12-2.12.0-12.el8fdp.x86_64

We filed this bz to track the ovs2.12 crash issue.

--- Additional comment from Adrián Moreno on 2020-01-25 17:16:41 HKT ---

Posted a fix to upstream DPDK: http://patches.dpdk.org/patch/65122/
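(Once a rebased build is available, the package changelog is a quick way to check whether it actually carries the vhost/IOTLB fix before re-running the scenario. A minimal sketch; the grep pattern is only a guess at how the changelog entry might be worded:)

# confirm the installed build and scan its changelog for the fix
rpm -q openvswitch2.12
rpm -q --changelog openvswitch2.12 | grep -iE 'vhost|iotlb' | head -n 5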
Verified with openvswitch2.12-2.12.0-23.el8fdp.x86_64 and openvswitch2.11-2.11.0-50.el8fdp.x86_64:

Following steps in Description. Both ovs and qemu keep working well. And the throughput performance looks good.

Other Versions:
4.18.0-187.el8.x86_64
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
tuned-2.13.0-5.el8.noarch
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
dpdk-19.11-4.el8.x86_64

With openvswitch2.12-2.12.0-23.el8fdp.x86_64:

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu_packed
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.614381   21.614381

With openvswitch2.11-2.11.0-50.el8fdp.x86_64:

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu_packed
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.307350   21.30735

So this bug has been fixed very well. Move to 'VERIFIED'
This bug did not meet the criteria for automatic migration and is being closed. If the issue remains, please open a new ticket in https://issues.redhat.com/browse/FDP