+++ This bug was initially created as a clone of Bug #1799017 +++
+++ This bug was initially created as a clone of Bug #1798996 +++

Description of problem:
Boot a guest over OVS with vhost-user ports. In the guest, keep the vhost-user NIC on the virtio-pci driver, then migrate the guest from the src host to the dst host. Both qemu and the guest hang on the src host.

Version-Release number of selected component (if applicable):
4.18.0-176.el8.x86_64
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
openvswitch2.12-2.12.0-21.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot OVS with 1 vhost-user NIC on both the src and dst hosts. Refer to [1].
2. Boot qemu with the vhost-user NIC. Refer to [2].
3. Check the vhost-user NIC driver in the guest and keep its default, virtio-pci (a way to check or rebind the driver is sketched after the additional info below).
4. Migrate the guest from src to dst. Both the src qemu and the guest hang.
(qemu) migrate -d tcp:10.73.72.196:5555
(qemu)

Actual results:
Both qemu and the guest hang during migration.

Expected results:
Both qemu and the guest keep working and the migration completes successfully.

Additional info:
1. This is a regression: openvswitch2.12-2.12.0-12.el8fdp.x86_64 works well.
2. If the vhost-user NIC is rebound from virtio-pci to vfio-pci in the guest, the issue is gone.
3. openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64 works well.
4. This was tested with FDP 20.B. Since there is no "FDP 20.B" version in Bugzilla yet, I chose 20.A; note that the 20.A version itself works well.
5. Although qemu hangs, I don't think this is a qemu issue, since combinations (1) and (2) below work well:
(1) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64: works
(2) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.12-2.12.0-12.el8fdp.x86_64: works
(3) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.12-2.12.0-21.el8fdp.x86_64 (bug version): fails
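For step 3 (and the vfio-pci workaround in additional info 2), the guest-side driver binding can be checked and, if wanted, switched roughly as below. This is only a sketch: 0000:03:00.0 is a placeholder, not taken from this setup; use whatever PCI address lspci reports for the virtio-net device inside the guest.

# inside the guest: show which kernel driver the NIC is currently bound to
lspci -nnk | grep -A3 -i 'ethernet\|network'

# optional workaround (additional info 2): rebind the NIC to vfio-pci
modprobe vfio-pci
echo 0000:03:00.0 > /sys/bus/pci/devices/0000:03:00.0/driver/unbind
echo vfio-pci > /sys/bus/pci/devices/0000:03:00.0/driver_override
echo 0000:03:00.0 > /sys/bus/pci/drivers_probe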
Reference:

[1]
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
ovs-vsctl set Interface dpdk0 options:n_rxq=1

echo "all done"

[2]
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine pc-q35-rhel8.2.0,kernel_irqchip=split \
-cpu host \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel8.2,share=yes,size=8589934592,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=pci.3,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc 0:1

--- Additional comment from Maxime Coquelin on 2020-02-24 15:30:37 UTC ---

Fix posted upstream and merged in master:

commit 4f37df14c405b754b5e971c75f4f67f4bb5bfdde
Author: Adrian Moreno <amorenoz>
Date:   Thu Feb 13 11:04:58 2020 +0100

    vhost: protect log address translation in IOTLB update

    Currently, the log address translation only happens in the vhost-user's
    translate_ring_addresses(). However, the IOTLB update handler is not
    checking if it was mapped to re-trigger that translation.

    Since the log address mapping could fail, check it on IOTLB updates.
    Also, check it on vring_translate() so we do not dirty pages if the
    logging address is not yet ready.

    Additionally, properly protect the accesses to the IOTLB structures.

    Fixes: fbda9f145927 ("vhost: translate incoming log address to GPA")
    Cc: stable

    Signed-off-by: Adrian Moreno <amorenoz>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>
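Until an openvswitch2.12 build carrying the above fix is available downstream, a possible interim step on an affected host (a rough sketch based on additional info 1; whether the older build is still reachable from an enabled repo or as a local rpm is an assumption) is to roll back to the known-good build:

# check which openvswitch2.12 build the host is running
rpm -q openvswitch2.12

# roll back to the build reported as working in this bug
dnf downgrade -y openvswitch2.12-2.12.0-12.el8fdp

# then restart OVS, e.g. by re-running the setup script in [1]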
Verified with openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64:
All migration test cases get PASS. All OVS related cases from Virt get PASS.

Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        223       18466      0          513906
1       1Mpps        218       16841      0          502137
2       1Mpps        249       18042      0          558323
3       1Mpps        201       16663      0          467533
Max     1Mpps        249       18466      0          558323
Min     1Mpps        201       16663      0          467533
Mean    1Mpps        222       17503      0          510474
Median  1Mpps        220       17441      0          508021
Stdev   0            19.87     887.27     0.0        37482.18

Testcase: live_migration_nonrt_server_1Q_2M_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        177       14526      0          424732
1       1Mpps        158       14862      0          391369
2       1Mpps        195       14693      0          463311
3       1Mpps        157       14069      0          387091
Max     1Mpps        195       14862      0          463311
Min     1Mpps        157       14069      0          387091
Mean    1Mpps        171       14537      0          416625
Median  1Mpps        167       14609      0          408050
Stdev   0            18.03     341.13     0.0        35380.92

Testcase: live_migration_nonrt_server_1Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        161       16710      0          394751
1       1Mpps        162       17723      0          392494
2       1Mpps        176       16436      0          421343
3       1Mpps        159       16883      0          404126
Max     1Mpps        176       17723      0          421343
Min     1Mpps        159       16436      0          392494
Mean    1Mpps        164       16938      0          403178
Median  1Mpps        161       16796      0          399438
Stdev   0            7.75      554.75     0.0        13115.23

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       20.833854   20.833854

Testcase: vhostuser_hotplug_nonrt_server_iommu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.127253   21.127253
0             64          0       21.307320   21.30732

Testcase: vhostuser_reconnect_nonrt_iommu_qemu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.127277   21.127277
0             64          0       21.307322   21.307322
0             64          0       21.127260   21.12726

Versions:
4.18.0-184.el8.x86_64
tuned-2.13.0-5.el8.noarch
dpdk-19.11-4.el8.x86_64
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64

So this bug has been fixed very well. Move to 'VERIFIED'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0747