Description of problem:
vhost-user stops receiving packets after migration.

Version-Release number of selected component (if applicable):
openvswitch-2.9.0-124.el7fdp.x86_64
openvswitch-selinux-extra-policy-1.0-15.el7fdp.noarch
3.10.0-1122.el7.x86_64
qemu-kvm-rhev-2.12.0-42.el7.x86_64
libvirt-4.5.0-31.el7.x86_64
dpdk-18.11.2-1.el7_6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot ovs with dpdkvhostuserclient ports on both src and des hosts, refer to [1]

# ovs-vsctl show
83dd9e54-2fb4-4eed-be58-cab15e4d4c3e
    Bridge "ovsbr1"
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser1.sock"}
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.1", n_rxq="1"}
    Bridge "ovsbr0"
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
                options: {dpdk-devargs="0000:5e:00.0", n_rxq="1"}
        Port "vhost-user0"
            Interface "vhost-user0"
                type: dpdkvhostuserclient
                options: {vhost-server-path="/tmp/vhostuser0.sock"}

2. Boot VM on src host

3. Start testpmd in guest

# /usr/bin/testpmd \
    -l 1,2,3,4,5 \
    -n 4 \
    -d /usr/lib64/librte_pmd_virtio.so \
    -w 0000:06:00.0 -w 0000:07:00.0 \
    --iova-mode pa \
    -- \
    --nb-cores=4 \
    -i \
    --disable-rss \
    --rxd=512 --txd=512 \
    --rxq=1 --txq=1

4. Start MoonGen in another host

# ./build/MoonGen /home/nfv-virt-rt-kvm/tests/utils/rfc1242.lua 0 1 64 3000000 4.1

5. Check testpmd in guest: it receives packets as expected.
testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 10094535   RX-missed: 0          RX-bytes:  605672100
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 10093278   TX-errors: 0          TX-bytes:  605596680

  Throughput (since last show)
  Rx-pps:       995333
  Tx-pps:       995336
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 10094564   RX-missed: 0          RX-bytes:  605673840
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 10093305   TX-errors: 0          TX-bytes:  605598300

  Throughput (since last show)
  Rx-pps:       995324
  Tx-pps:       995321
  ############################################################################

6. Migrate guest from src host to des host

# /bin/virsh migrate --verbose --persistent --live rhel7.8 qemu+ssh://10.73.72.196/system

7. Check in guest: testpmd has stopped receiving packets; Rx-pps and Tx-pps are both 0.

testpmd> show port stats all

  ######################## NIC statistics for port 0  ########################
  RX-packets: 124726123  RX-missed: 0          RX-bytes:  7483569564
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 124724797  TX-errors: 0          TX-bytes:  7483490008

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

  ######################## NIC statistics for port 1  ########################
  RX-packets: 124726111  RX-missed: 0          RX-bytes:  7483568844
  RX-errors: 0
  RX-nombuf:  0
  TX-packets: 124724482  TX-errors: 0          TX-bytes:  7483471108

  Throughput (since last show)
  Rx-pps:            0
  Tx-pps:            0
  ############################################################################

Actual results:
vhost-user stops receiving packets after migration.

Expected results:
vhost-user should keep receiving packets after migration.

Additional info:
1. This is a regression bug.
openvswitch-2.9.0-122.el7fdp.x86_64 works well.

Reference:
[1]
# cat boot_ovs_client.sh
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl --if-exists del-br ovsbr1
ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk options:dpdk-devargs=0000:5e:00.1
ovs-vsctl add-port ovsbr1 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser1.sock
ovs-ofctl del-flows ovsbr1
ovs-ofctl add-flow ovsbr1 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr1 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1554
ovs-vsctl set Interface dpdk0 options:n_rxq=1
ovs-vsctl set Interface dpdk1 options:n_rxq=1

echo "all done"
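
The check in steps 5 and 7 can also be scripted. The following is only a minimal sketch, assuming the "show port stats all" output has been saved to a file with one statistic per line, as testpmd prints it. The helper name stalled_ports is hypothetical and is not part of testpmd or DPDK.

```shell
#!/bin/bash
# Hypothetical helper (not part of testpmd/DPDK): reads saved
# "show port stats all" output on stdin and prints the number of
# every port whose Rx-pps value is 0.
stalled_ports() {
    awk '
        # Remember the port number from the "NIC statistics for port N" banner.
        /NIC statistics for port/ {
            for (i = 1; i < NF; i++) if ($i == "port") port = $(i + 1)
        }
        # Report the remembered port when its Rx-pps field is exactly 0.
        /Rx-pps:/ {
            for (i = 1; i < NF; i++)
                if ($i == "Rx-pps:" && $(i + 1) == "0") print port
        }
    '
}
```

Fed the output from step 7, this would print 0 and 1 (both ports stalled); fed the output from step 5, it would print nothing.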
The only related change that got in is the following:

+Patch1270: 0001-vhost-add-number-of-fds-to-vhost-user-messages.patch
+Patch1271: 0002-vhost-fix-possible-denial-of-service-on-SET_VRING_NU.patch
+Patch1272: 0003-vhost-fix-possible-denial-of-service-by-leaking-FDs.patch

As I'm not too familiar with the vhost code, it makes more sense for Maxime to take a look.

Maybe Maxime already has an idea of what could cause this. Maxime?
Hi Pei, Could you please share the ovs-vswitchd logs?
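
For gathering the requested logs, a minimal sketch that pulls only the vhost-related lines out of the ovs-vswitchd log. It assumes the log is at the --log-file path used in boot_ovs_client.sh and that the relevant lines contain the string "vhost" in some letter case; the helper name vhost_log_lines is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the lines of an ovs-vswitchd log that mention
# vhost (case-insensitive). The default path is the --log-file value from
# boot_ovs_client.sh; pass another path as the first argument to override it.
vhost_log_lines() {
    local log="${1:-/var/log/openvswitch/ovs-vswitchd.log}"
    grep -i 'vhost' "$log"
}
```

Running it on both the src and des hosts right after the migration would give the lines covering the vhost-user reconnect on each side.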
Verified with openvswitch-2.9.0-126.el7fdn.x86_64:

DPDK testpmd in the guest keeps receiving packets after migration.

Other testing versions:
3.10.0-1127.el7.x86_64
qemu-kvm-rhev-2.12.0-44.el7.x86_64
tuned-2.11.0-8.el7.noarch
libvirt-4.5.0-33.el7.x86_64
dpdk-18.11.2-1.el7.x86_64
openvswitch-2.9.0-126.el7fdn.x86_64

So this bug has been fixed. Move to 'VERIFIED'.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:1456