Description of problem:
Boot a guest over OVS with a vhost-user port. In the guest, keep the vhost-user NIC on the default virtio-pci driver. Then migrate the guest from the src host to the dst host. Both qemu and the guest hang on the src host.

Version-Release number of selected component (if applicable):
4.18.0-176.el8.x86_64
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
openvswitch2.11-2.11.0-47.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot OVS with 1 vhost-user NIC on both src and dst hosts. Refer to [1].
2. Boot qemu with the vhost-user NIC. Refer to [2].
3. Check the vhost-user NIC driver in the guest; keep the default virtio-pci (see the sketch after [2] below).
4. Migrate the guest from src to dst. Both the src qemu and the guest hang.
(qemu) migrate -d tcp:10.73.72.196:5555
(qemu)

Actual results:
Both qemu and the guest hang during migration.

Expected results:
Both qemu and the guest should keep working and the migration should succeed.

Additional info:
1. This is a regression: openvswitch2.11-2.11.0-35.el8fdp.x86_64 works well.
2. If the vhost-user NIC is rebound from virtio-pci to vfio-pci in the guest, the issue is gone.
3. openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64 works well.

Reference:
[1]
#!/bin/bash
set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"
ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"
ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
ovs-vsctl set Interface dpdk0 options:n_rxq=1
echo "all done"

[2]
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine pc-q35-rhel8.2.0,kernel_irqchip=split \
-cpu host \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel8.2,share=yes,size=8589934592,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=pci.3,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc 0:1 \
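For reference, a minimal sketch of steps 3 and 4 on the destination side and inside the guest (the -incoming port is taken from the migrate command above; the lspci match pattern is an assumption about how the virtio-net device is described):

# On the destination host, start the same qemu command line as in [2], plus:
#   -incoming tcp:0:5555

# Inside the guest, confirm the vhost-user NIC still uses virtio-pci (step 3):
lspci -nnk | grep -A3 -i 'virtio network'
# Expected to show: Kernel driver in use: virtio-pci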
This was tested with FDP 20.B. As there is no "FDP 20.B" version in bugzilla yet, I chose 20.A. To be clear, the 20.A version itself works well.
Though qemu hangs, I don't think it is a qemu issue, since combinations (1) and (2) below work well:
(1) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64: works well
(2) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.11-2.11.0-35.el8fdp.x86_64: works well
(3) qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.11-2.11.0-47.el8fdp.x86_64 (bug version): fails
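For reference, the exact combination installed on a host can be recorded with a plain rpm query (a minimal sketch; only the packages actually installed will resolve):

rpm -q kernel qemu-kvm openvswitch2.11 openvswitch2.13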
Thanks to Pei, I managed to reproduce this on her testbed. It seems there is a deadlock in the VHOST_USER_SET_VRING_ADDR handling:

(gdb) info threads
  Id   Target Id                                          Frame
* 1    Thread 0x7fca28a1fbc0 (LWP 9156) "ovs-vswitchd"    0x00007fca26acff21 in poll () from /lib64/libc.so.6
  2    Thread 0x7fca24f03700 (LWP 9157) "eal-intr-thread" 0x00007fca26adb1b7 in epoll_wait () from /lib64/libc.so.6
  3    Thread 0x7fca24702700 (LWP 9158) "rte_mp_handle"   0x00007fca27672a67 in recvmsg () from /lib64/libpthread.so.0
  4    Thread 0x7fca23f01700 (LWP 9159) "dpdk_watchdog1"  0x00007fca26aa7238 in nanosleep () from /lib64/libc.so.6
  5    Thread 0x7fca23700700 (LWP 9161) "urcu2"           0x00007fca26acff21 in poll () from /lib64/libc.so.6
  6    Thread 0x7fca22eff700 (LWP 9165) "ct_clean8"       0x00007fca26acff21 in poll () from /lib64/libc.so.6
  7    Thread 0x7fca226fe700 (LWP 9166) "ipf_clean5"      0x00007fca26acff21 in poll () from /lib64/libc.so.6
  8    Thread 0x7fca01e6b700 (LWP 9175) "vhost_reconn"    0x00007fca26aa7238 in nanosleep () from /lib64/libc.so.6
  9    Thread 0x7fca0166a700 (LWP 9176) "vhost-events"    rte_rwlock_read_lock (rwl=<optimized out>) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/generic/rte_rwlock.h:71
  10   Thread 0x7fca21634700 (LWP 9182) "handler12"       0x00007fca26acff21 in poll () from /lib64/libc.so.6
  11   Thread 0x7fca20e33700 (LWP 9183) "revalidator11"   0x00007fca26acff21 in poll () from /lib64/libc.so.6
  12   Thread 0x7fca0266c700 (LWP 9192) "pmd13"           rte_rdtsc () at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/build-static/../dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/rte_cycles.h:49
  13   Thread 0x7fca03736700 (LWP 9193) "pmd14"           0x00007ffc78f6b9c9 in ?? ()
  14   Thread 0x7fca03fff700 (LWP 9194) "pmd15"           0x000055cd83c3fc3e in histogram_add_sample (val=0, hist=0x55cd874dcf50) at ../lib/dpif-netdev-perf.h:326
  15   Thread 0x7fca00e69700 (LWP 9195) "pmd16"           rte_rdtsc () at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/build-static/../dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/rte_cycles.h:49

(gdb) t 9
[Switching to thread 9 (Thread 0x7fca0166a700 (LWP 9176))]
#0  rte_rwlock_read_lock (rwl=<optimized out>) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/generic/rte_rwlock.h:71
71      /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/generic/rte_rwlock.h: No such file or directory.
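For reference, a minimal sketch of how the thread dump above can be collected from the hung ovs-vswitchd (assuming gdb and the matching debuginfo packages are installed on the host):

gdb -p "$(pidof ovs-vswitchd)" -batch \
    -ex 'info threads' \
    -ex 'thread apply all bt'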
(gdb) bt
#0  rte_rwlock_read_lock (rwl=<optimized out>) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/x86_64-native-linuxapp-gcc/include/generic/rte_rwlock.h:71
#1  vhost_user_iotlb_rd_lock (vq=<optimized out>) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/iotlb.h:42
#2  __vhost_iova_to_vva (dev=0x15024de40, vq=vq@entry=0x15024db00, iova=10489253888, iova@entry=10489251904, size=size@entry=0x7fca01669630, perm=perm@entry=3 '\003') at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost.c:66
#3  0x000055cd83ba3033 in vhost_iova_to_vva (perm=3 '\003', len=0x7fca01669630, iova=10489251904, vq=0x15024db00, dev=0x15024de40) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost.h:557
#4  translate_log_addr (log_addr=10489251904, vq=0x15024db00, dev=0x15024de40) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost_user.c:643
#5  translate_ring_addresses (dev=0x15024de40, vq_index=0) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost_user.c:670
#6  0x000055cd83ba340e in vhost_user_set_vring_addr (pdev=pdev@entry=0x7fca016696c8, msg=msg@entry=0x7fca016696d0, main_fd=main_fd@entry=63) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost_user.c:827
#7  0x000055cd839bba00 in vhost_user_msg_handler (vid=<optimized out>, fd=fd@entry=63) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/vhost_user.c:2189
#8  0x000055cd83b9ed73 in vhost_user_read_cb (connfd=63, dat=0x7fc9f8001200, remove=0x7fca01669a30) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/socket.c:298
#9  0x000055cd83b9dab7 in fdset_event_dispatch (arg=0x55cd8413d5a0 <vhost_user+8192>) at /usr/src/debug/openvswitch2.12-2.12.0-21.el8fdp.x86_64/dpdk-stable-18.11.5/lib/librte_vhost/fd_man.c:286
#10 0x00007fca276682de in start_thread () from /lib64/libpthread.so.0
#11 0x00007fca26adae83 in clone () from /lib64/libc.so.6

However, looking at their backtraces, no other thread seems to be holding the lock. It is likely that a patch introduced a regression by not releasing the lock under some condition. The next step is to check the vhost patches that were introduced between this version and the known working one.
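A minimal sketch of one way to do that comparison: extract the failing and the last known working source RPMs and diff the vhost-related patches they carry (the file paths, and the assumption that the vhost backports are shipped as separate patch files in the SRPM, are assumptions):

mkdir -p /tmp/ovs-35 /tmp/ovs-47
( cd /tmp/ovs-35 && rpm2cpio /path/to/openvswitch2.11-2.11.0-35.el8fdp.src.rpm | cpio -idm )
( cd /tmp/ovs-47 && rpm2cpio /path/to/openvswitch2.11-2.11.0-47.el8fdp.src.rpm | cpio -idm )
diff <(ls /tmp/ovs-35 | grep -i vhost | sort) <(ls /tmp/ovs-47 | grep -i vhost | sort)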
Fix posted upstream and merged in master:

commit 4f37df14c405b754b5e971c75f4f67f4bb5bfdde
Author: Adrian Moreno <amorenoz>
Date:   Thu Feb 13 11:04:58 2020 +0100

    vhost: protect log address translation in IOTLB update

    Currently, the log address translation only happens in the vhost-user's
    translate_ring_addresses(). However, the IOTLB update handler is not
    checking if it was mapped to re-trigger that translation.

    Since the log address mapping could fail, check it on iotlb updates.
    Also, check it on vring_translate() so we do not dirty pages if the
    logging address is not yet ready.

    Additionally, properly protect the accesses to the iotlb structures.

    Fixes: fbda9f145927 ("vhost: translate incoming log address to GPA")
    Cc: stable

    Signed-off-by: Adrian Moreno <amorenoz>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>
Backported two patches:
- vhost: fix vring memory partially mapped
- vhost: protect log address translation in IOTLB update
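Whether a given openvswitch2.11 build already carries these backports can usually be checked from the package changelog (a minimal sketch; the exact changelog wording is an assumption):

rpm -q --changelog openvswitch2.11 | grep -i -E 'vring memory partially mapped|protect log address translation'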
Verified with openvswitch2.11-2.11.0-48.el7fdp.x86_64: all migration test cases PASS, and all OVS-related cases from Virt PASS.

==Results==

Testcase: live_migration_nonrt_server_2Q_1G_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        126       17495      0          401551
1       1Mpps        202       17019      0          1142800
2       1Mpps        225       17025      0          586946
3       1Mpps        131       16277      0          411604
Max     1Mpps        225       17495      0          1142800
Min     1Mpps        126       16277      0          401551
Mean    1Mpps        171       16954      0          635725
Median  1Mpps        166       17022      0          499275
Stdev   0            50.0      503.41     0.0        348602.99

Testcase: live_migration_nonrt_server_1Q_2M_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        149       13887      0          491829
1       1Mpps        133       13443      0          493933
2       1Mpps        154       13461      0          529340
3       1Mpps        204       13098      0          590595
Max     1Mpps        204       13887      0          590595
Min     1Mpps        133       13098      0          491829
Mean    1Mpps        160       13472      0          526424
Median  1Mpps        151       13452      0          511636
Stdev   0            30.66     323.04     0.0        46111.82

Testcase: live_migration_nonrt_server_1Q_1G_ovs
=======================Stream Rate: 1Mpps=========================
No      Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
0       1Mpps        90        16178      0          326702
1       1Mpps        78        16050      0          302509
2       1Mpps        85        15957      0          312703
3       1Mpps        76        16094      0          300406
Max     1Mpps        90        16178      0          326702
Min     1Mpps        76        15957      0          300406
Mean    1Mpps        82        16069      0          310580
Median  1Mpps        81        16072      0          307606
Stdev   0            6.4       92.03      0.0        12014.95

Testcase: nfv_acceptance_nonrt_server_2Q_1G
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.307379   21.307379

Testcase: vhostuser_reconnect_nonrt_ovs
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.614404   21.614404
0             64          0       21.739714   21.739714
0             64          0       21.614388   21.614388

Testcase: vhostuser_hotplug_nonrt_server
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.614407   21.614407

Testcase: vhostuser_reconnect_nonrt_qemu
Packets_loss  Frame_Size  Run_No  Throughput  Avg_Throughput
0             64          0       21.307390   21.30739
0             64          0       21.307424   21.307424
0             64          0       21.307412   21.307412

Versions:
3.10.0-1127.el7.x86_64
dpdk-18.11.2-1.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
tuned-2.11.0-8.el7.noarch
openvswitch2.11-2.11.0-48.el7fdp.x86_64
libvirt-4.5.0-33.el7.x86_64

So this bug has been fixed. Moving to 'VERIFIED'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:0743