Description of problem:
When doing live migration with ovs+dpdk, "info migrate" shows wrong 'total time' and 'downtime' values. For example, the migration finished in 20 seconds, but "total time: 11005066 milliseconds downtime: 0 milliseconds" is shown.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
3.10.0-514.el7.x86_64
(dpdk and ovs are upstream)
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz

How reproducible:
Sometimes, about 50%

Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2
# ovs-vsctl show
ad29bee3-1cc7-4cc8-893e-206cd3add181
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
    Bridge "ovsbr0"
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal

2. Boot guest with vhostuser on host1
<interface type='vhostuser'>
  <mac address='18:66:da:e6:02:02'/>
  <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
  <model type='virtio'/>
  <driver name='vhost'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
</interface>
<interface type='vhostuser'>
  <mac address='18:66:da:e6:02:03'/>
  <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
  <model type='virtio'/>
  <driver name='vhost'/>
  <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
</interface>

3. Migrate guest from host1 to host2
# virsh migrate --live $guest_name qemu+ssh://$des_ip/system

4. Check migration info; occasionally wrong migration info is shown.
(In this example, the migration finished in 1 minute, but it shows 102 minutes.)
# virsh qemu-monitor-command $guest_name --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off
Migration status: completed
total time: 6160317 milliseconds
downtime: 0 milliseconds
setup: 36 milliseconds
transferred ram: 578347 kbytes
throughput: 938.65 mbps
remaining ram: 0 kbytes
total ram: 8389896 kbytes
duplicate: 1957490 pages
skipped: 0 pages
normal: 140012 pages
normal bytes: 560048 kbytes
dirty sync count: 3

Actual results:
"info migrate" shows wrong info.

Expected results:
"info migrate" should show correct time info.

Additional info:
1. I need to highlight this: ovs and dpdk are upstream versions, so QE is not sure which component this bug belongs to. Please change it if it's not a qemu bug.
2. Both rhel7.3 and rhel7.3 kvm-rt hit this issue.
Additional info (continued):
3. Before the migration completed, "total time" and "expected downtime" showed correct values. Putting this here in case it's useful.

(1) Before "completed", migration info:
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off
Migration status: active
total time: 8027 milliseconds
expected downtime: 300 milliseconds
setup: 61 milliseconds
transferred ram: 748896 kbytes
throughput: 930.34 mbps
remaining ram: 33072 kbytes
total ram: 16778504 kbytes
duplicate: 4010879 pages
skipped: 0 pages
normal: 178063 pages
normal bytes: 712252 kbytes
dirty sync count: 2
dirty pages rate: 326 pages

(2) "completed", migration info:
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off
Migration status: completed
total time: 22758195 milliseconds
downtime: 0 milliseconds
setup: 61 milliseconds
transferred ram: 778901 kbytes
throughput: 940.61 mbps
remaining ram: 0 kbytes
total ram: 16778504 kbytes
duplicate: 4011672 pages
skipped: 0 pages
normal: 185548 pages
normal bytes: 742192 kbytes
dirty sync count: 4
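To spot the symptom automatically across many runs, a minimal sketch like the following could compare the reported timing fields against wall-clock time measured around "virsh migrate". It assumes the usual multi-line HMP layout (one "field: value" per line) and a completed migration. The function names get_ms and check_times, the 2x tolerance and the "0 ms downtime is suspicious" rule are hypothetical choices, not part of QEMU or libvirt.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of QEMU or libvirt) for spotting wrong
# "total time"/"downtime" values in HMP "info migrate" output.
# Assumes the usual multi-line layout: one "field: value" per line.

# Print the number from a "<field>: N milliseconds" line read from stdin.
# The ^ anchor keeps "downtime" from matching "expected downtime".
get_ms() {
    sed -n "s/^$1: \([0-9][0-9]*\) milliseconds.*/\1/p" | head -n 1
}

# Usage: check_times WALL_CLOCK_MS < info_migrate_output
# Returns 0 if plausible, 1 if suspicious, 2 if the fields are missing
# (for example, while the migration is still active).
check_times() {
    local elapsed_ms=$1 out total down
    out=$(cat)
    total=$(printf '%s\n' "$out" | get_ms "total time")
    down=$(printf '%s\n' "$out" | get_ms "downtime")
    if [ -z "$total" ] || [ -z "$down" ]; then
        echo "no completed-migration timing fields found" >&2
        return 2
    fi
    # Heuristics (assumptions): more than 2x the measured wall-clock time,
    # or exactly 0 ms downtime, is treated as suspicious.
    if [ "$total" -gt $((elapsed_ms * 2)) ] || [ "$down" -eq 0 ]; then
        echo "SUSPICIOUS: total=${total}ms downtime=${down}ms wall-clock=${elapsed_ms}ms"
        return 1
    fi
    echo "OK: total=${total}ms downtime=${down}ms wall-clock=${elapsed_ms}ms"
}

# Example (GNU date for millisecond timestamps):
# start=$(date +%s%3N)
# virsh migrate --live $guest_name qemu+ssh://$des_ip/system
# end=$(date +%s%3N)
# # Assumption: query the host that now runs the guest.
# virsh -c qemu+ssh://$des_ip/system qemu-monitor-command $guest_name \
#     --hmp "info migrate" | check_times $((end - start))
```

The thresholds are deliberately loose; a migration that really took 20 seconds should never report millions of milliseconds, so even a generous tolerance catches the reported cases.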
Could you retest with qemu-kvm-rhev-2.10.0?
(In reply to Laurent Vivier from comment #3)
> Could you retest with qemu-kvm-rhev-2.10.0?

The issue is gone with the latest qemu.

Versions:
qemu-kvm-rhev-2.10.0-6.el7.x86_64
3.10.0-789.el7.x86_64
libvirt-3.9.0-2.el7.x86_64
openvswitch-2.8.0-4.el7fdb.x86_64

After 120 ping-pong migrations (240 migration runs), all "downtime" and "total time" values look good.

Closing this bug as "CURRENTRELEASE".

Thanks,
Pei
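A ping-pong run like the one above can be sketched as a loop that live-migrates the guest back and forth between two libvirt URIs and prints the timing fields after each migration. This is a hypothetical script, not the one QE used: the function name pingpong, its arguments, and querying "info migrate" through the destination connection (where the guest runs after migration) are assumptions.

```bash
#!/usr/bin/env bash
# Hypothetical ping-pong migration loop; not the QE test script.
# Arguments: domain name, URI of host A, URI of host B, number of round trips.
# One round trip = A->B then B->A, so N round trips = 2*N migration runs.
pingpong() {
    local guest=$1 uri_a=$2 uri_b=$3 rounds=$4
    local i dir from to start end
    i=1
    while [ "$i" -le "$rounds" ]; do
        for dir in forward back; do
            if [ "$dir" = forward ]; then
                from=$uri_a; to=$uri_b
            else
                from=$uri_b; to=$uri_a
            fi
            start=$(date +%s)
            # Live-migrate the guest from the "from" host to the "to" host.
            virsh -c "$from" migrate --live "$guest" "$to" || return 1
            end=$(date +%s)
            echo "round $i $dir: wall-clock $((end - start)) s"
            # Assumption: query the monitor on the host now running the guest.
            virsh -c "$to" qemu-monitor-command "$guest" --hmp "info migrate" \
                | grep -E '^(total time|downtime):'
        done
        i=$((i + 1))
    done
}

# Example (placeholders): 120 round trips = 240 migration runs
# pingpong "$guest_name" qemu+ssh://host1/system qemu+ssh://host2/system 120
```

Printing the wall-clock duration next to the reported "total time" makes an implausible value, such as millions of milliseconds for a run that took seconds, stand out immediately.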