Bug 1450680
Summary: | Migrating a guest with vhost-user 2 queues while packets flow over dpdk+openvswitch fails: the guest hangs, and qemu hangs or crashes | |
---|---|---|---
Product: | Red Hat Enterprise Linux 7 | Reporter: | Pei Zhang <pezhang>
Component: | openvswitch | Assignee: | Open vSwitch development team <ovs-team>
Status: | CLOSED CURRENTRELEASE | QA Contact: | Pei Zhang <pezhang>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | 7.4 | CC: | aconole, aglotov, aguetta, ailan, atragler, chayang, dgilbert, drjones, ealcaniz, eglynn, fbaudin, fherrman, fleitner, jmaxwell, jraju, jsuchane, juzhang, knoel, ktraynor, marcandre.lureau, marjones, maxime.coquelin, mleitner, mschuppe, pablo.iranzo, pezhang, rlondhe, sdubroca, skramaja, smykhail, sputhenp, tredaelli, victork, virt-maint, zshi
Target Milestone: | rc | |
Target Release: | --- | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | openvswitch-2.6.1-28.git20180130.el7ost | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-15 17:27:09 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1553812 | |
Bug Blocks: | 1473046 | |
Attachments: | | |
Description
Pei Zhang, 2017-05-14 15:19:33 UTC
When testing with PVP, we hit the same issue. Steps:

1. On the source and destination hosts, boot testpmd:

        testpmd -l 0,2,4,6,8 \
            --socket-mem=1024 -n 4 \
            --vdev 'net_vhost0,iface=/tmp/vhost-user0' \
            --vdev 'net_vhost1,iface=/tmp/vhost-user1' -- \
            --portmask=3F --disable-hw-vlan -i --rxq=1 --txq=1 \
            --nb-cores=4 --forward-mode=io

        testpmd> set portlist 0,2,1,3
        testpmd> start

2. On the source host, boot the VM:

        /usr/libexec/qemu-kvm \
            -name guest=rhel7.4_nonrt \
            -cpu host \
            -m 8G \
            -smp 6,sockets=1,cores=6,threads=1 \
            -object memory-backend-file,id=ram-node0,prealloc=yes,mem-path=/dev/hugepages,share=yes,size=8G,host-nodes=0,policy=bind \
            -numa node,nodeid=0,cpus=0-5,memdev=ram-node0 \
            -drive file=/mnt/nfv/rhel7.4_nonrt.qcow2,format=qcow2,if=none,id=drive-virtio-disk0,cache=none,aio=threads \
            -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x7,drive=drive-virtio-disk0,id=virtio-disk0 \
            -netdev tap,id=hostnet0,vhost=on \
            -device virtio-net-pci,netdev=hostnet0,id=net0,mac=18:66:da:5f:dd:01 \
            -chardev socket,id=charnet1,path=/tmp/vhost-user0 \
            -netdev vhost-user,chardev=charnet1,id=hostnet1,queues=2 \
            -device virtio-net-pci,netdev=hostnet1,id=net1,mac=18:66:da:5f:dd:02,mq=on \
            -chardev socket,id=charnet2,path=/tmp/vhost-user1 \
            -netdev vhost-user,chardev=charnet2,id=hostnet2,queues=2 \
            -device virtio-net-pci,netdev=hostnet2,id=net2,mac=18:66:da:5f:dd:03,mq=on \
            -msg timestamp=on \
            -monitor stdio \
            -vnc :2

Steps 3~6 are the same as in the Description.

Best Regards,
Pei

Hi Pei, I'll take a look into this.

Sent the fix to DPDK upstream: http://dpdk.org/ml/archives/dev/2017-December/083900.html
No changes in QEMU are required.

*** Bug 1527532 has been marked as a duplicate of this bug. ***

This looks to be a duplicate of Bug 1528229. Can you confirm?

Just checked upstream: this patch was accepted onto the DPDK 17.11 stable branch, but it is *not* on the DPDK 16.11 stable branch. Can that be rectified? I would prefer it to be upstreamed in the DPDK 16.11 stable branch before we backport it to our OVS 2.6 (DPDK 16.11).
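A side note on the multiqueue options in the qemu command line above: `queues=2` on the netdev plus `mq=on` on the virtio-net device exposes two queue pairs, but if the guest uses the kernel virtio-net driver (rather than DPDK), the extra queue pairs may still need to be enabled from inside the guest. A hedged sketch (the interface name `eth1` is an assumption, not from this report):

```shell
# Inside the guest (sketch; interface name eth1 is an assumption):
ethtool -l eth1             # show supported and currently enabled channel counts
ethtool -L eth1 combined 2  # enable both queue pairs to match queues=2 on the host side
```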
*** Bug 1543740 has been marked as a duplicate of this bug. ***

Created attachment 1408673 [details]: XML of VM

==Update==

Summary: Live migration with vhost-user 2 queues now works well with both ovs2.6 and ovs2.9, so this bug has been fixed.

(1) With openvswitch-2.6.1-22.git20180130.el7ost.x86_64

    ===========Stream Rate: 1Mpps===========
    No  Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
    0   1Mpps        158       19090      18         11778215
    1   1Mpps        141       20118      15         9109284
    2   1Mpps        185       20272      18         11893682
    3   1Mpps        155       15177      18         554391
    4   1Mpps        155       20650      15         14811809
    5   1Mpps        142       23900      18         20734173
    6   1Mpps        157       21933      15         20217899
    7   1Mpps        153       20842      15         11342950
    8   1Mpps        150       22799      15         13016815
    9   1Mpps        155       22830      15         20582389

(2) With openvswitch-2.9.0-1.el7fdb.x86_64

    ===========Stream Rate: 1Mpps===========
    No  Stream_Rate  Downtime  Totaltime  Ping_Loss  moongen_Loss
    0   1Mpps        148       24328      17         27680612
    1   1Mpps        155       19165      15         13763045
    2   1Mpps        140       17258      17         4130227
    3   1Mpps        157       19193      16         12237228
    4   1Mpps        155       19613      16         10173339
    5   1Mpps        143       20357      15         17095979
    6   1Mpps        159       20391      18         10285979
    7   1Mpps        152       20559      18         9290060
    8   1Mpps        146       29477      17         16083082
    9   1Mpps        154       19353      16         8995992

More details:

1. Versions besides the above 2 openvswitch builds:
kernel-3.10.0-855.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.1.x86_64
libvirt-3.9.0-14.el7.x86_64
tuned-2.9.0-1.el7.noarch

2. During the above 2 tests, we hit a high packet loss issue, which bug [1] is tracking now.
[1] Bug 1552465 - High TRex packets loss during live migration over ovs+dpdk+vhost-user

3. In the above 2 tests, ovs acts in vhost-user client mode (testing with dpdkvhostuserclient ports as below).
    # ovs-vsctl show
    2dfd9e80-233e-44d3-9e39-e9288b1d63f5
        Bridge "ovsbr1"
            Port "ovsbr1"
                Interface "ovsbr1"
                    type: internal
            Port "dpdk2"
                Interface "dpdk2"
                    type: dpdk
                    options: {dpdk-devargs="0000:06:00.0", n_rxq="2", n_txq="2"}
            Port "vhost-user2"
                Interface "vhost-user2"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/tmp/vhostuser2.sock"}
        Bridge "ovsbr0"
            Port "vhost-user1"
                Interface "vhost-user1"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/tmp/vhostuser1.sock"}
            Port "vhost-user0"
                Interface "vhost-user0"
                    type: dpdkvhostuserclient
                    options: {vhost-server-path="/tmp/vhostuser0.sock"}
            Port "ovsbr0"
                Interface "ovsbr0"
                    type: internal
            Port "dpdk0"
                Interface "dpdk0"
                    type: dpdk
                    options: {dpdk-devargs="0000:04:00.0", n_rxq="2", n_txq="2"}
            Port "dpdk1"
                Interface "dpdk1"
                    type: dpdk
                    options: {dpdk-devargs="0000:04:00.1", n_rxq="2", n_txq="2"}

4. We tested without vIOMMU in these runs. The VM xml is attached to this comment.

==Update==

Versions:
3.10.0-862.el7.x86_64
qemu-kvm-rhev-2.10.0-21.el7_5.2.x86_64
libvirt-3.9.0-14.el7.x86_64
tuned-2.9.0-1.el7.noarch
openvswitch-2.6.1-28.git20180130.el7ost.x86_64

vhost-user 2 queues live migration works well, as below:

    =======================Stream Rate: 1Mpps=========================
    No  Stream_Rate  Downtime  Totaltime  Ping_Loss  trex_Loss
    0   1Mpps        113       64270      18         40950642.0
    1   1Mpps        149       20399      17         8149062.0
    2   1Mpps        154       20149      15         10759365.0
    3   1Mpps        145       23425      15         13075563.0
    4   1Mpps        146       15720      15         831627.0
    5   1Mpps        155       20596      15         18343823.0
    6   1Mpps        135       64234      14         54869708.0
    7   1Mpps        155       20359      16         6423787.0
    8   1Mpps        159       17722      15         3210414.0
    9   1Mpps        151       16915      15         599545.0

    <------------------------Summary------------------------>
    Max     1Mpps  159   64270     18    54869708
    Min     1Mpps  113   15720     14    599545
    Mean    1Mpps  146   28378     15    15721353
    Median  1Mpps  150   20379     15    9454213
    Stdev   0      13.5  19031.67  1.18  18130056.5

(In reply to Pei Zhang from comment #71)
> (In reply to Aaron Conole from comment #70)
> > Given comment #68 can we close this?
Hi Aaron, QE has closed this bug as 'VERIFIED'.

Closed for good now. (This was showing up in RHEL queries for "zstream?" flags.)
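As a sanity check, the Downtime column of the Summary rows above can be reproduced with a short awk sketch (the values are copied from the Downtime column of the final results table; Stdev matches the sample standard deviation, i.e. dividing by n-1):

```shell
# Recompute max/min/mean/median/stdev for the Downtime (ms) measurements above.
echo "113 149 154 145 146 155 135 155 159 151" | awk '{
  n = NF
  for (i = 1; i <= n; i++) v[i] = $i
  # mean, min, max
  sum = 0; min = v[1]; max = v[1]
  for (i = 1; i <= n; i++) { sum += v[i]; if (v[i] < min) min = v[i]; if (v[i] > max) max = v[i] }
  mean = sum / n
  # sample standard deviation (divide by n - 1, matching the Stdev row above)
  ss = 0
  for (i = 1; i <= n; i++) ss += (v[i] - mean) ^ 2
  stdev = sqrt(ss / (n - 1))
  # median: insertion sort, then average the middle pair (n is even here)
  for (i = 2; i <= n; i++) { x = v[i]; j = i - 1; while (j >= 1 && v[j] > x) { v[j+1] = v[j]; j-- } v[j+1] = x }
  median = (v[n/2] + v[n/2 + 1]) / 2
  printf "max=%d min=%d mean=%.0f median=%.0f stdev=%.1f\n", max, min, mean, median, stdev
}'
# → max=159 min=113 mean=146 median=150 stdev=13.5
```

This agrees with the Summary row (Max 159, Min 113, Mean 146, Median 150, Stdev 13.5).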