Bug 1390483

Summary: "info migrate" shows wrong info when do live migration with ovs-dpdk (ovs 2.6)
Product: Red Hat Enterprise Linux 7 Reporter: Pei Zhang <pezhang>
Component: qemu-kvm-rhev Assignee: Laurent Vivier <lvivier>
Status: CLOSED CURRENTRELEASE QA Contact: Pei Zhang <pezhang>
Severity: low Docs Contact:
Priority: low    
Version: 7.3 CC: atelang, chayang, dgilbert, fbaudin, hhuang, juzhang, knoel, michen, pagupta, peterx, pezhang, quintela, virt-maint, xfu
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-11-23 04:43:15 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1473046    

Description Pei Zhang 2016-11-01 08:22:34 UTC
Description of problem:
When doing live migration with ovs+dpdk, "info migrate" shows wrong 'total time' and 'downtime' info.

For example, the migration finished in about 20 seconds, yet "info migrate" reported "total time: 11005066 milliseconds  downtime: 0 milliseconds".
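
For perspective, here is a quick conversion of the reported value (plain shell arithmetic, nothing environment-specific):

# awk 'BEGIN { printf "%.2f hours\n", 11005066 / 1000 / 3600 }'
3.06 hours

So a migration that actually finished in roughly 20 seconds is reported as having taken about 3 hours.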


Version-Release number of selected component (if applicable):
qemu-kvm-rhev-2.6.0-28.el7.x86_64
libvirt-2.0.0-10.el7.x86_64
3.10.0-514.el7.x86_64
(dpdk and ovs are upstream versions)
dpdk-16.07.zip
openvswitch-2.6.0.tar.gz


How reproducible:
Sometimes, about 50%


Steps to Reproduce:
1. Start ovs-dpdk on host1 and host2 (a sketch of possible setup commands follows these steps)
# ovs-vsctl show
ad29bee3-1cc7-4cc8-893e-206cd3add181
    Bridge "ovsbr1"
        Port "vhost-user2"
            Interface "vhost-user2"
                type: dpdkvhostuser
        Port "dpdk1"
            Interface "dpdk1"
                type: dpdk
        Port "ovsbr1"
            Interface "ovsbr1"
                type: internal
    Bridge "ovsbr0"
        Port "vhost-user1"
            Interface "vhost-user1"
                type: dpdkvhostuser
        Port "dpdk0"
            Interface "dpdk0"
                type: dpdk
        Port "ovsbr0"
            Interface "ovsbr0"
                type: internal


2. Boot the guest with vhostuser interfaces on host1
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:02'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user1' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x02' function='0x0'/>
    </interface>
    <interface type='vhostuser'>
      <mac address='18:66:da:e6:02:03'/>
      <source type='unix' path='/usr/local/var/run/openvswitch/vhost-user2' mode='client'/>
      <model type='virtio'/>
      <driver name='vhost'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </interface>

3. Migrate guest from host1 to host2
# virsh migrate --live $guest_name qemu+ssh://$des_ip/system

4. Check the migration info; occasionally wrong values are shown. (In this example, the migration finished in about 1 minute, but "total time" reports roughly 102 minutes.)
# virsh qemu-monitor-command $guest_name --hmp "info migrate"
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off 
Migration status: completed
total time: 6160317 milliseconds
downtime: 0 milliseconds
setup: 36 milliseconds
transferred ram: 578347 kbytes
throughput: 938.65 mbps
remaining ram: 0 kbytes
total ram: 8389896 kbytes
duplicate: 1957490 pages
skipped: 0 pages
normal: 140012 pages
normal bytes: 560048 kbytes
dirty sync count: 3
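
For reference, a minimal sketch of ovs-vsctl commands that could produce the bridge/port layout shown in step 1. This is an assumption about how the bridges were created, not taken from the report; datapath_type=netdev selects the userspace (DPDK) datapath, and the exact DPDK options may differ in this environment:

# ovs-vsctl add-br ovsbr0 -- set bridge ovsbr0 datapath_type=netdev
# ovs-vsctl add-port ovsbr0 dpdk0 -- set Interface dpdk0 type=dpdk
# ovs-vsctl add-port ovsbr0 vhost-user1 -- set Interface vhost-user1 type=dpdkvhostuser
# ovs-vsctl add-br ovsbr1 -- set bridge ovsbr1 datapath_type=netdev
# ovs-vsctl add-port ovsbr1 dpdk1 -- set Interface dpdk1 type=dpdk
# ovs-vsctl add-port ovsbr1 vhost-user2 -- set Interface vhost-user2 type=dpdkvhostuser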


Actual results:
"info migrate" shows wrong info.


Expected results:
"info migrate" should show correct time info.


Additional info:
1. Note that ovs and dpdk are upstream versions, so QE is not sure which component this bug should be filed against; please change it if it is not a qemu bug.

2. Both RHEL 7.3 and RHEL 7.3 kvm-rt hit this issue.

Comment 2 Pei Zhang 2016-11-08 07:47:29 UTC
Additional info(continued):
3. Before the migration completes, "total time" and "expected downtime" show correct values. Putting them here in case they are useful.

(1) Before "completed", migration info:
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off 
Migration status: active
total time: 8027 milliseconds
expected downtime: 300 milliseconds
setup: 61 milliseconds
transferred ram: 748896 kbytes
throughput: 930.34 mbps
remaining ram: 33072 kbytes
total ram: 16778504 kbytes
duplicate: 4010879 pages
skipped: 0 pages
normal: 178063 pages
normal bytes: 712252 kbytes
dirty sync count: 2
dirty pages rate: 326 pages

(2)"completed", migration info
capabilities: xbzrle: off rdma-pin-all: off auto-converge: off zero-blocks: off compress: off events: on postcopy-ram: off 
Migration status: completed
total time: 22758195 milliseconds
downtime: 0 milliseconds
setup: 61 milliseconds
transferred ram: 778901 kbytes
throughput: 940.61 mbps
remaining ram: 0 kbytes
total ram: 16778504 kbytes
duplicate: 4011672 pages
skipped: 0 pages
normal: 185548 pages
normal bytes: 742192 kbytes
dirty sync count: 4

Comment 3 Laurent Vivier 2017-11-22 14:24:18 UTC
Could you retest with qemu-kvm-rhev-2.10.0?

Comment 4 Pei Zhang 2017-11-23 04:43:15 UTC
(In reply to Laurent Vivier from comment #3)
> Could you retest with qemu-kvm-rhev-2.10.0?

The issue is gone with the latest qemu.

Versions:
qemu-kvm-rhev-2.10.0-6.el7.x86_64
3.10.0-789.el7.x86_64
libvirt-3.9.0-2.el7.x86_64
openvswitch-2.8.0-4.el7fdb.x86_64


After 120 ping-pong migrations (240 migration runs), all "downtime" and "total time" values look good.
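
For reference, a minimal sketch of how one ping-pong pass could be scripted; the polling approach and the 1-second interval are assumptions, not taken from this report, and $guest_name / $des_ip are the same placeholders used in the steps above:

#!/bin/bash
# Sketch of one ping-pong pass; repeat for the desired number of runs.
# $guest_name and $des_ip are the placeholders from steps 3 and 4 above.

virsh migrate --live $guest_name qemu+ssh://$des_ip/system &
# Poll "info migrate" on the source until the source QEMU goes away; the
# last snapshot printed is the "completed" one with total time and downtime.
while out=$(virsh qemu-monitor-command $guest_name --hmp "info migrate" 2>/dev/null); do
    echo "$out" | grep -E "Migration status|total time|downtime"
    sleep 1
done
wait
# Migrate back from $des_ip to this host in the same way for the return leg.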


Closing this bug as "CURRENTRELEASE".



Thanks,
Pei