+++ This bug was initially created as a clone of Bug #1793068 +++
+++ This bug was initially created as a clone of Bug #1788415 +++

Description of problem:
Boot a guest with vhost-user and vIOMMU, with packed ring enabled on the vhost-user virtio-net-pci devices. When testpmd is started in the guest, qemu crashes.

Version-Release number of selected component (if applicable):
4.18.0-167.el8.x86_64
qemu-kvm-4.2.0-4.module+el8.2.0+5220+e82621dc.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot ovs with 2 dpdkvhostuserclient ports; refer to [1]

# ovs-vsctl show
3d492a93-d752-4203-8b4f-04fa0a0a5127
    Bridge "ovsbr1"
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="2"}
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
    Bridge "ovsbr0"
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="2"}
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal

2. Boot qemu with vhost-user and vIOMMU, with packed ring enabled on the vhost-user virtio-net-pci devices; full cmd in [2]

-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \

3. Start testpmd in guest

# modprobe vfio
# modprobe vfio-pci
# dpdk-devbind --bind=vfio-pci 0000:06:00.0
# dpdk-devbind --bind=vfio-pci 0000:07:00.0
# /usr/bin/testpmd \
    -l 1,2,3,4,5 \
    -n 4 \
    -d /usr/lib64/librte_pmd_virtio.so \
    -w 0000:06:00.0 -w 0000:07:00.0 \
    --iova-mode pa \
    -- \
    --nb-cores=4 \
    -i \
    --disable-rss \
    --rxd=512 --txd=512 \
    --rxq=2 --txq=2

4. qemu crashes; ovs-vswitchd also crashes.

(qemu) qemu-kvm: Failed to read msg header. Read -1 instead of 12. Original request 22.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to read msg header. Read 0 instead of 12. Original request 8.
qemu-kvm: Fail to update device iotlb
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 0 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 1 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 2 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to set msg fds.
qemu-kvm: vhost VQ 3 ring restore failed: -1: Resource temporarily unavailable (11)
qemu-kvm: Failed to read from slave.
qemu-kvm: Failed to read from slave.
Segmentation fault (core dumped)

# dmesg
...
[12378.241785] vhost-events[5802]: segfault at 2 ip 0000558d52492169 sp 00007f4812ffb6d0 error 6 in ovs-vswitchd[558d521d7000+61f000]
[12378.253525] Code: 0b 41 1c 66 89 02 ba 02 00 00 00 eb 87 0f 1f 40 00 83 c8 01 66 89 01 31 c0 48 83 c4 08 c3 0f 1f 00 48 8b 41 10 ba 01 00 00 00 <66> 89 50 02 31 c0 48 83 c4 08 c3 66 90 66 2e 0f 1f 84 00 00 00 00
[12378.624852] qemu-kvm[6052]: segfault at 198 ip 0000562ac564c6b0 sp 00007ffce7053f80 error 4 in qemu-kvm[562ac528d000+a46000]
[12378.636067] Code: 75 10 48 8b 05 29 94 b2 00 89 c0 64 48 89 03 0f ae f0 90 48 8b 45 00 31 c9 85 d2 48 89 e7 0f 95 c1 41 b8 01 00 00 00 4c 89 e2 <48> 8b b0 98 01 00 00 e8 64 c7 f4 ff 48 83 3c 24 00 0f 84 f9 00 00
...

Actual results:
qemu crashes.

Expected results:
qemu should not crash.

Additional info:
1. Without vIOMMU, qemu works well.
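2. Backtraces for the two segfaults can be pulled from the core dumps with something like the following (assuming systemd-coredump or abrt caught them; otherwise look for the cores under /var/crash):

# coredumpctl list                  # locate the qemu-kvm and ovs-vswitchd crashes
# coredumpctl gdb qemu-kvm          # open the most recent qemu-kvm core in gdb
(gdb) bt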
Reference
[1]
# cat boot_ovs_client.sh
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl --if-exists del-br ovsbr1
ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1
ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x15554
ovs-vsctl set Interface dpdk0 options:n_rxq=2
ovs-vsctl set Interface dpdk1 options:n_rxq=2

echo "all done"

[2]
# cat qemu.sh
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine q35,kernel_irqchip=split \
-cpu host \
-m 8192 \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on \
-numa node,memdev=mem -mem-prealloc \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-device pcie-root-port,port=0x13,chassis=4,id=pci.4,bus=pcie.0,addr=0x2.0x3 \
-device pcie-root-port,port=0x14,chassis=5,id=pci.5,bus=pcie.0,addr=0x2.0x4 \
-device pcie-root-port,port=0x15,chassis=6,id=pci.6,bus=pcie.0,addr=0x2.0x5 \
-device pcie-root-port,port=0x16,chassis=7,id=pci.7,bus=pcie.0,addr=0x2.0x6 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/home/images_nfv-virt-rt-kvm/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,id=hostnet0 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=88:66:da:5f:dd:01,bus=pci.3,addr=0x0 \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-monitor stdio \
-vnc :2 \

--- Additional comment from Pei Zhang on 2020-01-07 15:12:46 HKT ---

Update OpenvSwitch versions:
openvswitch2.11-2.11.0-35.el8fdp.x86_64

--- Additional comment from Pei Zhang on 2020-01-07 15:19:23 HKT ---

If OpenvSwitch is replaced with dpdk's testpmd as the vhost-user client, qemu works well.

dpdk version: dpdk-19.11-1.el8.x86_64

Steps:
1. Replace Step 1 of the Description with booting dpdk's testpmd; qemu keeps working well and testpmd receives packets well.
/usr/bin/testpmd \
-l 2,4,6,8,10,12,14,16,18 \
--socket-mem 1024,1024 \
-n 4 \
-d /usr/lib64/librte_pmd_vhost.so \
--vdev 'net_vhost0,iface=/tmp/vhostuser0.sock,queues=2,client=1,iommu-support=1' \
--vdev 'net_vhost1,iface=/tmp/vhostuser1.sock,queues=2,client=1,iommu-support=1' \
--iova-mode pa \
-- \
--portmask=f \
-i \
--rxd=512 --txd=512 \
--rxq=2 --txq=2 \
--nb-cores=8 \
--forward-mode=io

testpmd> set portlist 0,2,1,3
testpmd> start
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 88822640   RX-missed: 0          RX-bytes:  5329364952
  RX-errors: 0
  RX-nombuf: 0
  TX-packets: 75692878   TX-errors: 0          TX-bytes:  4541579232

  Throughput (since last show)
  Rx-pps:       147041          Rx-bps:     70580080
  Tx-pps:       125307          Tx-bps:     60147664
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 75693848   RX-missed: 0          RX-bytes:  4541637432
  RX-errors: 0
  RX-nombuf: 0
  TX-packets: 88821732   TX-errors: 0          TX-bytes:  5329310472

  Throughput (since last show)
  Rx-pps:       125307          Rx-bps:     60147664
  Tx-pps:       147041          Tx-bps:     70580080
  ############################################################################

--- Additional comment from Rick Barry on 2020-01-08 00:05:52 HKT ---

Assigning to Amnon's team for review. CC'ing Ariel.

Priority is High, so I set ITR to 8.2.0. Feel free to change that, if you feel that's incorrect.

--- Additional comment from Ariel Adam on 2020-01-08 00:36:30 HKT ---

Adrian, can you take a look? May be an FD bug and not an AV bug. If so, please change the component from qemu-kvm to FDP.

--- Additional comment from Adrián Moreno on 2020-01-13 20:07:19 HKT ---

Hi Pei,

I haven't reproduced it yet but first, let me ask:

> --iova-mode pa \

Any reason why PA is used? Is it also reproducible without that option (or with iova-mode=va)?

Also, if it's not too much to ask: can you try to reproduce it with the dpdk 18.11.5 stable release?

Thanks,
Adrian

--- Additional comment from Adrián Moreno on 2020-01-16 18:35:10 HKT ---

Also, Pei, according to the few logs that qemu writes, it seems related to the iotlb update messages in the vhost-user backend. My guess is that it's really not related to the "packed" queue. Can you please confirm this?

Thanks

--- Additional comment from Adrián Moreno on 2020-01-16 19:40:50 HKT ---

There might be multiple issues going on here, so let's try to split them up:

Looking at the code, qemu's implementation of vhost-user + multiqueue + iommu is likely to be utterly broken. It will create a slave channel per queue pair. When the second slave channel is created, the first one is closed by the vhost-user backend (which explains the "Failed to read from slave" errors). And when the first queue is started, SET_VRING_ADDR on queue pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair 2. That is most likely the cause of qemu's segfault. I'll work upstream to fix that.

If that's true:
- It should be as reproducible with testpmd as it is with OvS. Pei, can you double check "queues=2" is present in qemu's command line for the testpmd case?
- It should be as reproducible with or without "packed=on".

Pei, can you please confirm this?

Now, that does not explain OvS's crash. Can you please attach some logs to try to figure out what's going on there?
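For reference, the failure sequence I suspect on the slave channel is roughly the following (a hand-written sketch from reading the code, not a captured trace; message names are from the vhost-user protocol spec):

# qemu sets up one slave channel per queue pair, while the spec assumes
# a single slave channel per device:
qemu    -> backend : VHOST_USER_SET_SLAVE_REQ_FD (fd A, for queue pair 1)
qemu    -> backend : VHOST_USER_SET_SLAVE_REQ_FD (fd B, for queue pair 2)
                     # the backend keeps a single slave fd, so fd A gets
                     # closed ("Failed to read from slave." in qemu)
qemu    -> backend : VHOST_USER_SET_VRING_ADDR (queue pair 1; ring addresses are IOVAs)
backend -> qemu    : VHOST_USER_SLAVE_IOTLB_MSG (miss), arrives on fd B
                     # qemu services the miss against the channel bound to
                     # queue pair 2, whose state is stale -> likely segfault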
Thanks

--- Additional comment from Pei Zhang on 2020-01-17 17:39:57 HKT ---

Hi Adrián,

I've been trying to reproduce this issue many times (I could not reproduce it with libvirt at first, but it reproduced 100%). Finally I found the way to reproduce it, with both qemu and libvirt. Here is the update:

1. vIOMMU + packed=on + memory host-nodes=1 are the 3 key points needed to reproduce the issue 100%. Without any one of them, this issue cannot be reproduced.

-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,queues=2,id=hostnet1 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=88:66:da:5f:dd:02,bus=pci.6,addr=0x0,iommu_platform=on,ats=on,packed=on \
-chardev socket,id=charnet2,path=/tmp/vhostuser1.sock,server \
-netdev vhost-user,chardev=charnet2,queues=2,id=hostnet2 \
-device virtio-net-pci,mq=on,vectors=6,rx_queue_size=1024,netdev=hostnet2,id=net2,mac=88:66:da:5f:dd:03,bus=pci.7,addr=0x0,iommu_platform=on,ats=on,packed=on \
-object memory-backend-file,id=mem,size=8G,mem-path=/dev/hugepages,share=on,host-nodes=1,policy=bind \
-numa node,memdev=mem -mem-prealloc \

2. When I reported this bz, I didn't explicitly set memory host-nodes=1, but my setup defaults to host-nodes=1. (When we set memory host-nodes=0, it works very well.)

Sorry for the late response. I just wanted to provide solid testing results to avoid possible confusion, and that took a while.
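In case it helps, this is roughly how the host-node binding can be double-checked from the host (illustrative commands; the hugepage directory assumes 1G pages, adjust to the setup):

# numactl --hardware                # list host NUMA nodes and their memory
# grep . /sys/devices/system/node/node*/hugepages/hugepages-1048576kB/free_hugepages
# numastat -p qemu-kvm              # per-node memory usage of the qemu process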
--- Additional comment from Pei Zhang on 2020-01-17 17:41:01 HKT ---

(In reply to Pei Zhang from comment #8)
> Hi Adrián,
> 
> I've been trying to reproduce this issue many times (I could not reproduce
> it with libvirt at first, but it reproduced 100%).

fix typo: but it reproduced 100% with qemu.

--- Additional comment from Pei Zhang on 2020-01-17 17:50:28 HKT ---

(In reply to Adrián Moreno from comment #5)
> Hi Pei,
> 
> I haven't reproduced it yet but first, let me ask:
> > --iova-mode pa \
> Any reason why PA is used? Is it also reproducible without that option (or
> with iova-mode=va)?

Hi Adrián,

I tested with iova-mode=va as it's a valid usage according to https://bugzilla.redhat.com/show_bug.cgi?id=1738751. Without iova-mode=va, this issue can also be reproduced.

> Also, if it's not too much to ask: can you try to reproduce it with the dpdk
> 18.11.5 stable release?

Yes, it can also be reproduced with dpdk 18.11.5.

(In reply to Adrián Moreno from comment #6)
> Also, Pei, according to the few logs that qemu writes, it seems related to
> the iotlb update messages in the vhost-user backend.
> My guess is that it's really not related to the "packed" queue.
> Can you please confirm this?

Without packed=on, this issue cannot be reproduced any more.

--- Additional comment from Pei Zhang on 2020-01-17 18:11:29 HKT ---

(In reply to Adrián Moreno from comment #7)
> There might be multiple issues going on here, so let's try to split them up:
> 
> Looking at the code, qemu's implementation of vhost-user + multiqueue +
> iommu is likely to be utterly broken. It will create a slave channel per
> queue pair. When the second slave channel is created, the first one is
> closed by the vhost-user backend (which explains the "Failed to read from
> slave" errors). And when the first queue is started, SET_VRING_ADDR on queue
> pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair
> 2. That is most likely the cause of qemu's segfault. I'll work upstream to
> fix that.
> 
> If that's true:
> - It should be as reproducible with testpmd as it is with OvS. Pei, can you
> double check "queues=2" is present in qemu's command line for the testpmd
> case?

With testpmd (replacing ovs) on the host, this issue is not reproducible. I can confirm this; it is only reproducible with OvS.

Yes, queues=2 is present in qemu's command line. queues=2 is also present in OvS and in testpmd in the guest.

Besides, this issue is also reproducible with a single queue.

> - It should be as reproducible with or without "packed=on".

Without "packed=on", this issue cannot be reproduced.

> Pei, can you please confirm this?
> 
> Now, that does not explain OvS's crash. Can you please attach some logs to
> try to figure out what's going on there?

Please check this link for the ovs log, libvirt log, and full xml.
http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/pezhang/bug1788415/

Thanks.

Best regards,

Pei

--- Additional comment from Adrián Moreno on 2020-01-17 18:15:45 HKT ---

(In reply to Pei Zhang from comment #10)
> (In reply to Adrián Moreno from comment #6)
> > Also, Pei, according to the few logs that qemu writes, it seems related to
> > the iotlb update messages in the vhost-user backend.
> > My guess is that it's really not related to the "packed" queue.
> > Can you please confirm this?
> 
> Without packed=on, this issue cannot be reproduced any more.

OK, so there are several issues here. I get a qemu crash 100% with iommu=on and multiqueue (no matter the value of "packed" or "host-nodes"). In fact, I can't even run testpmd in the guest; it crashes at boot time. I have a fix for that and it's working in my setup (still not posted upstream), so I can proceed with the next issue.

--- Additional comment from Adrián Moreno on 2020-01-17 18:17:52 HKT ---

(In reply to Pei Zhang from comment #11)
> (In reply to Adrián Moreno from comment #7)
> > There might be multiple issues going on here, so let's try to split them up:
> > 
> > Looking at the code, qemu's implementation of vhost-user + multiqueue +
> > iommu is likely to be utterly broken. It will create a slave channel per
> > queue pair. When the second slave channel is created, the first one is
> > closed by the vhost-user backend (which explains the "Failed to read from
> > slave" errors). And when the first queue is started, SET_VRING_ADDR on queue
> > pair 1 will generate an IOTLB_MISS on the slave channel bound to queue pair
> > 2. That is most likely the cause of qemu's segfault. I'll work upstream to
> > fix that.
> > 
> > If that's true:
> > - It should be as reproducible with testpmd as it is with OvS. Pei, can you
> > double check "queues=2" is present in qemu's command line for the testpmd
> > case?
> 
> With testpmd (replacing ovs) on the host, this issue is not reproducible. I
> can confirm this; it is only reproducible with OvS.
> 
> Yes, queues=2 is present in qemu's command line. queues=2 is also present
> in OvS and in testpmd in the guest.
> 
> Besides, this issue is also reproducible with a single queue.
> 
> > - It should be as reproducible with or without "packed=on".
> 
> Without "packed=on", this issue cannot be reproduced.
> 
> > Pei, can you please confirm this?
> > 
> > Now, that does not explain OvS's crash. Can you please attach some logs to
> > try to figure out what's going on there?
> 
> Please check this link for the ovs log, libvirt log, and full xml.

Thank you. I don't have a way to reproduce this issue since I don't have a NUMA system.

> http://fileshare.englab.nay.redhat.com/pub/section2/coredump/var/crash/
> pezhang/bug1788415/

I don't have permission to download the files.
You may have to chmod them.

Thanks

--- Additional comment from Pei Zhang on 2020-01-17 18:44:34 HKT ---

Hi Adrian,

Do we need a bug to track the OvS crash issue? ovs2.11 and ovs2.12 both hit this problem.

Please let me know if one is needed and I can file it. Thanks.

Best regards,

Pei

--- Additional comment from Adrián Moreno on 2020-01-20 18:33:24 HKT ---

(In reply to Pei Zhang from comment #14)
> Hi Adrian,
> 
> Do we need a bug to track the OvS crash issue? ovs2.11 and ovs2.12 both hit
> this problem.

Yes. Let's split this up. Thanks

> Please let me know if one is needed and I can file it. Thanks.
> 
> Best regards,
> 
> Pei

--- Additional comment from Pei Zhang on 2020-01-20 23:42:31 HKT ---

Versions:
openvswitch2.11-2.11.0-35.el8fdp.x86_64

We filed this bz to track the ovs2.11 crash issue.

--- Additional comment from Adrián Moreno on 2020-01-25 17:15:39 HKT ---

Posted a fix to upstream DPDK: http://patches.dpdk.org/patch/65122/
Verified with openvswitch2.12-2.12.0-23.el8fdp.x86_64 and openvswitch2.11-2.11.0-50.el8fdp.x86_64:

Following the steps in the Description, both ovs and qemu keep working well, and the throughput performance looks good.

Other versions:
4.18.0-187.el8.x86_64
qemu-kvm-4.2.0-13.module+el8.2.0+5898+fb4bceae.x86_64
tuned-2.13.0-5.el8.noarch
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
dpdk-19.11-4.el8.x86_64

With openvswitch2.12-2.12.0-23.el8fdp.x86_64:
Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu_packed

Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.614381   21.614381

With openvswitch2.11-2.11.0-50.el8fdp.x86_64:
Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu_packed

Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.307350   21.30735

So this bug has been fixed well. Moving to 'VERIFIED'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:1459