Bug 1806599 - Both qemu and guest hang after migrating guest in which vhost-user NIC is using virtio-pci [ovs2.13]
Summary: Both qemu and guest hang after migrating guest in which vhost-user NIC is using virtio-pci [ovs2.13]
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Fast Datapath
Classification: Red Hat
Component: openvswitch2.13
Version: FDP 20.A
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Maxime Coquelin
QA Contact: Pei Zhang
URL:
Whiteboard:
Depends On: 1798996 1799017
Blocks:
 
Reported: 2020-02-24 15:34 UTC by Maxime Coquelin
Modified: 2020-03-10 09:36 UTC
CC: 11 users

Fixed In Version: openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1799017
Environment:
Last Closed: 2020-03-10 09:36:12 UTC
Target Upstream Version:
Embargoed:


Links:
Red Hat Product Errata RHBA-2020:0747 (last updated 2020-03-10 09:36:19 UTC)

Description Maxime Coquelin 2020-02-24 15:34:17 UTC
+++ This bug was initially created as a clone of Bug #1799017 +++

+++ This bug was initially created as a clone of Bug #1798996 +++

Description of problem:
Boot a guest over OVS with vhost-user ports. In the guest, keep the vhost-user NIC using the virtio-pci driver. Then migrate the guest from the source to the destination host. Both qemu and the guest hang on the source host.

Version-Release number of selected component (if applicable):
4.18.0-176.el8.x86_64
qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64
openvswitch2.12-2.12.0-21.el8fdp.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Boot OVS with 1 vhost-user NIC on both the source and destination hosts. Refer to [1].

2. Boot qemu with a vhost-user netdev. Refer to [2].

3. Check the vhost-user NIC's driver in the guest. Keep its default, virtio-pci (see the sketch after these steps).

4. Migrate the guest from the source to the destination host. Both the source qemu and the guest hang.

(qemu) migrate -d tcp:10.73.72.196:5555
(qemu) 
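
A quick way to confirm which driver the guest NIC is bound to (a minimal sketch; the PCI address 0000:03:00.0 is only an example and will differ per guest):

# Inside the guest: show the NIC and the kernel driver bound to it
lspci -nnk | grep -A3 "Virtio network device"
# Or query sysfs for a known PCI address (example address)
readlink /sys/bus/pci/devices/0000:03:00.0/driver
# With the default binding, the driver reported is virtio-pci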

Actual results:
Both qemu and the guest hang when doing migration.

Expected results:
Both qemu and the guest should keep working and the migration should complete successfully.

Additional info:
1. This is a regression bug.
openvswitch2.12-2.12.0-12.el8fdp.x86_64   works well.

2. If the vhost-user NIC is rebound from virtio-pci to vfio-pci in the guest, this issue is gone (see the sketch after this list).

3. openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64 works well.

4. This was tested with FDP 20.B. As there is no "FDP 20.B" version in Bugzilla yet, I chose 20.A. To be clear, the 20.A version itself works well.

5. Though qemu hangs, I don't think it's a qemu issue, as versions (1) and (2) below work well.

(1)qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.13-2.13.0-0.20200121git2a4f006.el8fdp.x86_64 works well

(2)qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.12-2.12.0-12.el8fdp.x86_64                   works well

(3)qemu-kvm-4.2.0-8.module+el8.2.0+5607+dc756904.x86_64 & openvswitch2.12-2.12.0-21.el8fdp.x86_64(bug version)      fail
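
For reference, one way to rebind the guest NIC from virtio-pci to vfio-pci as mentioned in item 2 (a minimal sketch; the PCI address is an example, and driverctl or the DPDK tools must be installed in the guest):

# Inside the guest: load the vfio-pci module
modprobe vfio-pci
# Rebind the device with driverctl (example PCI address)
driverctl set-override 0000:03:00.0 vfio-pci
# Or use DPDK's helper script instead
dpdk-devbind.py --bind=vfio-pci 0000:03:00.0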

Reference:
[1]
#!/bin/bash

set -e

echo "killing old ovs process"
pkill -f ovs-vswitchd || true
sleep 5
pkill -f ovsdb-server || true

echo "probing ovs kernel module"
modprobe -r openvswitch || true
modprobe openvswitch

echo "clean env"
DB_FILE=/etc/openvswitch/conf.db
rm -rf /var/run/openvswitch
mkdir /var/run/openvswitch
rm -f $DB_FILE

echo "init ovs db and boot db server"
export DB_SOCK=/var/run/openvswitch/db.sock
ovsdb-tool create /etc/openvswitch/conf.db /usr/share/openvswitch/vswitch.ovsschema
ovsdb-server --remote=punix:$DB_SOCK --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach --log-file
ovs-vsctl --no-wait init

echo "start ovs vswitch daemon"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,1024"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask="0x1"
ovs-vsctl --no-wait set Open_vSwitch . other_config:vhost-iommu-support=true
ovs-vswitchd unix:$DB_SOCK --pidfile --detach --log-file=/var/log/openvswitch/ovs-vswitchd.log

echo "creating bridge and ports"

ovs-vsctl --if-exists del-br ovsbr0
ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk options:dpdk-devargs=0000:5e:00.0 
ovs-vsctl add-port ovsbr0 vhost-user0 -- set Interface vhost-user0 type=dpdkvhostuserclient options:vhost-server-path=/tmp/vhostuser0.sock
ovs-ofctl del-flows ovsbr0
ovs-ofctl add-flow ovsbr0 "in_port=1,idle_timeout=0 actions=output:2"
ovs-ofctl add-flow ovsbr0 "in_port=2,idle_timeout=0 actions=output:1"

ovs-vsctl set Open_vSwitch . other_config={}
ovs-vsctl set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x14
ovs-vsctl set Interface dpdk0 options:n_rxq=1


echo "all done"

[2]
/usr/libexec/qemu-kvm \
-name guest=rhel8.2 \
-machine pc-q35-rhel8.2.0,kernel_irqchip=split \
-cpu host \
-m 8192 \
-overcommit mem-lock=on \
-smp 6,sockets=6,cores=1,threads=1 \
-object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages/libvirt/qemu/1-rhel8.2,share=yes,size=8589934592,host-nodes=0,policy=bind \
-numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
-device intel-iommu,intremap=on,caching-mode=on,device-iotlb=on \
-device pcie-root-port,port=0x10,chassis=1,id=pci.1,bus=pcie.0,multifunction=on,addr=0x2 \
-device pcie-root-port,port=0x11,chassis=2,id=pci.2,bus=pcie.0,addr=0x2.0x1 \
-device pcie-root-port,port=0x12,chassis=3,id=pci.3,bus=pcie.0,addr=0x2.0x2 \
-blockdev driver=file,cache.direct=off,cache.no-flush=on,filename=/mnt/nfv/rhel8.2.qcow2,node-name=my_file \
-blockdev driver=qcow2,node-name=my,file=my_file \
-device virtio-blk-pci,scsi=off,iommu_platform=on,ats=on,bus=pci.2,addr=0x0,drive=my,id=virtio-disk0,bootindex=1,write-cache=on \
-chardev socket,id=charnet1,path=/tmp/vhostuser0.sock,server \
-netdev vhost-user,chardev=charnet1,id=hostnet1 \
-device virtio-net-pci,rx_queue_size=1024,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,bus=pci.3,addr=0x0,iommu_platform=on,ats=on \
-monitor stdio \
-vnc 0:1
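
The destination-side invocation is not shown above; a minimal sketch, assuming the destination host runs the same command line as [2] with one extra option so qemu waits for the migration stream used in step 4 (port 5555):

# On the destination host (10.73.72.196), start qemu with the same
# options as [2], plus an incoming-migration listener:
/usr/libexec/qemu-kvm \
    ... same options as in [2] ... \
    -incoming tcp:0:5555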

--- Additional comment from Maxime Coquelin on 2020-02-24 15:30:37 UTC ---

Fix posted upstream and merged in master:

commit 4f37df14c405b754b5e971c75f4f67f4bb5bfdde
Author: Adrian Moreno <amorenoz>
Date:   Thu Feb 13 11:04:58 2020 +0100

    vhost: protect log address translation in IOTLB update

    Currently, the log address translation only happens in the vhost-user's
    translate_ring_addresses(). However, the IOTLB update handler is not
    checking if it was mapped to re-trigger that translation.

    Since the log address mapping could fail, check it on iotlb updates.
    Also, check it on vring_translate() so we do not dirty pages if the
    logging address is not yet ready.

    Additionally, properly protect the accesses to the iotlb structures.

    Fixes: fbda9f145927 ("vhost: translate incoming log address to GPA")
    Cc: stable

    Signed-off-by: Adrian Moreno <amorenoz>
    Reviewed-by: Maxime Coquelin <maxime.coquelin>

Comment 3 Pei Zhang 2020-02-28 05:59:33 UTC
Verified with openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64:

All migration test cases pass. All OVS-related test cases from Virt pass.

Testcase: live_migration_nonrt_server_2Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss moongen_Loss
0 1Mpps 223 18466 0 513906
1 1Mpps 218 16841 0 502137
2 1Mpps 249 18042 0 558323
3 1Mpps 201 16663 0 467533
Max 1Mpps 249 18466 0 558323
Min 1Mpps 201 16663 0 467533
Mean 1Mpps 222 17503 0 510474
Median 1Mpps 220 17441 0 508021
Stdev 0 19.87 887.27 0.0 37482.18

Testcase: live_migration_nonrt_server_1Q_2M_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss moongen_Loss
0 1Mpps 177 14526 0 424732
1 1Mpps 158 14862 0 391369
2 1Mpps 195 14693 0 463311
3 1Mpps 157 14069 0 387091
Max 1Mpps 195 14862 0 463311
Min 1Mpps 157 14069 0 387091
Mean 1Mpps 171 14537 0 416625
Median 1Mpps 167 14609 0 408050
Stdev 0 18.03 341.13 0.0 35380.92

Testcase: live_migration_nonrt_server_1Q_1G_iommu_ovs
=======================Stream Rate: 1Mpps=========================
No Stream_Rate Downtime Totaltime Ping_Loss moongen_Loss
0 1Mpps 161 16710 0 394751
1 1Mpps 162 17723 0 392494
2 1Mpps 176 16436 0 421343
3 1Mpps 159 16883 0 404126
Max 1Mpps 176 17723 0 421343
Min 1Mpps 159 16436 0 392494
Mean 1Mpps 164 16938 0 403178
Median 1Mpps 161 16796 0 399438
Stdev 0 7.75 554.75 0.0 13115.23

Testcase: nfv_acceptance_nonrt_server_2Q_1G_iommu
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 20.833854 20.833854

Testcase: vhostuser_hotplug_nonrt_server_iommu
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 21.127253 21.127253
0 64 0 21.307320 21.30732

Testcase: vhostuser_reconnect_nonrt_iommu_qemu
Packets_loss Frame_Size Run_No Throughput Avg_Throughput
0 64 0 21.127277 21.127277
0 64 0 21.307322 21.307322
0 64 0 21.127260 21.12726


Versions:
4.18.0-184.el8.x86_64
tuned-2.13.0-5.el8.noarch
dpdk-19.11-4.el8.x86_64
openvswitch-selinux-extra-policy-1.0-19.el8fdp.noarch
openvswitch2.13-2.13.0-0.20200117git8ae6a5f.el8fdp.1.x86_64
python3-libvirt-6.0.0-1.module+el8.2.0+5453+31b2b136.x86_64
qemu-kvm-4.2.0-12.module+el8.2.0+5858+afd073bc.x86_64

So this bug has been fixed. Moving to 'VERIFIED'.

Comment 6 errata-xmlrpc 2020-03-10 09:36:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:0747

