Bug 1970635

Summary: Live VM migration from B->A takes much longer time than migration from A->B
Product: Red Hat Enterprise Linux Fast Datapath
Component: ovn2.13
Version: FDP 21.E
Reporter: Ehsan Elahi <eelahi>
Assignee: OVN Team <ovnteam>
QA Contact: Ehsan Elahi <eelahi>
Status: NEW
Severity: unspecified
Priority: medium
CC: ctrautma, dceara, eelahi, jiji, mdean, mmichels, ralongi
Flags: eelahi: needinfo-
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Type: Bug

Description Ehsan Elahi 2021-06-10 21:14:02 UTC
VM live migration usually takes less than 100 ms when migrating from physical machine A to B. However, the reverse migration (B->A) of the same VM takes considerably longer (>4000 ms).
As a test case, I migrated a VM 10 times from A->B and B->A. Here are the results (in ms):

Time taken in migration A->B 1: 97.833
Time taken in migration B->A 1: 4476.161
Time taken in migration A->B 2: 98.730
Time taken in migration B->A 2: 4395.235
Time taken in migration A->B 3: 94.262
Time taken in migration B->A 3: 4407.216
Time taken in migration A->B 4: 90.297
Time taken in migration B->A 4: 4387.979
Time taken in migration A->B 5: 128.000
Time taken in migration B->A 5: 4397.930
Time taken in migration A->B 6: 111.580
Time taken in migration B->A 6: 4400.057
Time taken in migration A->B 7: 90.636
Time taken in migration B->A 7: 4367.282
Time taken in migration A->B 8: 79.717
Time taken in migration B->A 8: 4457.997
Time taken in migration A->B 9: 81.274
Time taken in migration B->A 9: 4391.982
Time taken in migration A->B 10: 94.684
Time taken in migration B->A 10: 4403.516
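
For reference, each of the times above is the maximum RTT reported by a flood ping that runs across the migration (see the reproducer below), which is effectively the network downtime seen by a peer VM. A minimal standalone sketch of that measurement, with the guest address and migration command taken from this setup:

# Flood-ping the migrating VM's OVN address (172.16.1.111) from the other guest
# while the migration runs; the max RTT in the ping summary approximates the
# network downtime caused by the migration.
ssh root@192.168.122.110 "timeout -s SIGINT 10s ping 172.16.1.111 -i 0 -f" | tee downtime.log &
virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$CLIENTS/system
wait

# ping summary: "rtt min/avg/max/mdev = a/b/c/d ms" -> take the max value
rtt_max=$(grep rtt downtime.log | awk '{print $4}' | awk -F '/' '{print $3}')
echo "network downtime (max RTT): ${rtt_max} ms"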

Here is the reproducer.

####################################
#     Some necessary functions:    #
####################################

ovn_create_vm()
{
        local hv=$1
        local vm=$2

        local guest=hv${hv}_vm${vm}
        local mac="00:de:ad:0${hv}:0${vm}:00"
        local ip="192.168.122.$((vm+10+hv*100))"

        echo
        echo "$FUNCNAME($guest)"
        echo mac=$mac
        echo ip=$ip

        # guest image
        rm -rf /home/test/images/$guest.qcow2
        cp -a /var/lib/libvirt/images/rhel7.8.qcow2 /home/test/images/$guest.qcow2

        # define default vnet for internet connection
        virsh net-list | grep -qe "^ default" || { virsh net-define /usr/share/libvirt/networks/default.xml; virsh net-start default; }

        # set IP for eth0
        virsh net-update default add ip-dhcp-host "<host mac='$mac' ip='$ip' />" --live --config

        virt-install \
                --name $guest \
                --vcpus=2 \
                --ram=2048 \
                --disk path=/home/test/images/${guest}.qcow2,device=disk,bus=virtio,format=qcow2,cache=none \
                --network bridge=virbr0,model=virtio,mac=$mac \
                --boot hd \
                --accelerate \
                --force \
                --graphics vnc,listen=0.0.0.0 \
                --os-type=linux \
                --cpu=SandyBridge-IBRS \
                --noautoconsole
		# when migrating, the CPU model must be the same on both hosts and supported by them
		# SandyBridge-IBRS is supported by most recent Intel CPUs

        sleep 5
}

# create a virtual network interface for a vm
ovn_create_vnet()
{
	local hv=$1
	local vm=$2
	local vn=$3
	local iface_id=$4

	local guest=hv${hv}_vm${vm}
	local mac="00:de:ad:0${hv}:0${vm}:0${vn}"
	local vnet=hv${hv}_vm${vm}_vn${vn}

	# stop NetworkManager and the firewall in the guest before attaching the interface
	vmsh run_cmd $guest "systemctl stop NetworkManager;iptables -F;ip6tables -F;setenforce 0;systemctl stop firewalld"

	if [ -n "$iface_id" ]
	then
		cat <<- EOF > /home/test/vnets/${vnet}.xml
			<interface type='bridge'>
				<target dev='${vnet}'/>
				<mac address='${mac}'/>
				<source bridge='br-int'/>
				<virtualport type='openvswitch'>
					<parameters interfaceid='${iface_id}'/>
				</virtualport>
				<model type='virtio'/>
			</interface>
		EOF
	else
		cat <<- EOF > /home/test/vnets/${vnet}.xml
			<interface type='bridge'>
				<target dev='${vnet}'/>
				<mac address='${mac}'/>
				<source bridge='br-int'/>
				<virtualport type='openvswitch'/>
				<model type='virtio'/>
			</interface>
		EOF
	fi

	echo "virsh attach-device ${guest} /home/test/vnets/${vnet}.xml"
	virsh attach-device ${guest} /home/test/vnets/${vnet}.xml
	[ -z "$iface_id" ] && ovs-vsctl set interface ${vnet} external-ids:iface-id=${vnet}
}
# start dhclient 4/6 on all vnets(eth1..eth99) in guest
ovn_dhclient_ethx()
{
	local hv=$1
	local vm=$2
	local num_of_vnets=$3

	set -o nounset
	set +o nounset

	local result=0

	local guest=hv${hv}_vm${vm}
	local cmd=''

	cmd="pkill dhclient;sleep 2;dhclient -v \$(ip link | grep -B1 \"00:de:ad:0${hv}:0${vm}:01\" | sed -n 's/[[:digit:]]*: \(.*\):.*/\1/p' | xargs)"
	echo $cmd
	vmsh run_cmd $guest "$cmd" || ((result += 1))
	vmsh run_cmd $guest "netserver"

	sleep 10

	vmsh run_cmd $guest "ifconfig"

	return $result
}
####################################
#        On physical machine A     #
####################################
grep "/home/test/images" /etc/exports || echo "/home/test/images *.redhat.com(rw,no_root_squash,sync)" > /etc/exports
systemctl restart nfs-server

sync_set client $FUNCNAME	#  1: Send signal to machine B to go ahead
sync_wait client $FUNCNAME # 2: Block here and Wait for signal from machine B

local uuid_vnet_1_0_1=$(uuidgen)
local uuid_vnet_1_1_1=$(uuidgen)

local dhcp_options=$(ovn-nbctl create DHCP_Options cidr=172.16.1.0/24 \
  options="\"server_id\"=\"172.16.1.254\" \"server_mac\"=\"00:00:00:00:01:00\" \
  \"lease_time\"=\"$((36000 + RANDOM % 3600))\" \"router\"=\"172.16.1.254\"")

ovn-nbctl ls-add ls1

ovn-nbctl lsp-add ls1 ${uuid_vnet_1_0_1} -- lsp-set-dhcpv4-options ${uuid_vnet_1_0_1} $dhcp_options
ovn-nbctl lsp-set-addresses ${uuid_vnet_1_0_1} "00:de:ad:01:00:01 172.16.1.101"

ovn-nbctl lsp-add ls1 ${uuid_vnet_1_1_1} -- lsp-set-dhcpv4-options ${uuid_vnet_1_1_1} $dhcp_options
ovn-nbctl lsp-set-addresses ${uuid_vnet_1_1_1} "00:de:ad:01:01:01 172.16.1.111"
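
# Optional sanity checks (not in the original reproducer): confirm the logical
# switch, the DHCPv4 options and the port addresses before creating the VMs.
ovn-nbctl show ls1                                    # logical switch and its two ports
ovn-nbctl list DHCP_Options                           # the 172.16.1.0/24 options row created above
ovn-nbctl lsp-get-addresses ${uuid_vnet_1_0_1}        # expect: 00:de:ad:01:00:01 172.16.1.101
ovn-nbctl lsp-get-dhcpv4-options ${uuid_vnet_1_0_1}   # expect: the DHCP_Options UUID from above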

ovn_create_vm 1 0    # a function to create vm. (defined above)
ovn_create_vm 1 1

sleep 90

ovn_create_vnet 1 0 1 ${uuid_vnet_1_0_1} # a function (defined above) to create a vnet
ovn_create_vnet 1 1 1 ${uuid_vnet_1_1_1}

sleep 10

ovn-nbctl show
ovn-sbctl show

ovn_dhclient_ethx 1 0 1  # start dhclient on all vnets(eth1..eth99) in guest
ovn_dhclient_ethx 1 1 1

sync_wait client $FUNCNAME  # 3: Block here and Wait for signal from machine B

# Workaround in case the guest did not get a DHCP IP address: restart dhclient
vmsh run_cmd hv1_vm0 "pkill dhclient; sleep 5; dhclient"

vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))

# use --unsafe because "error: Unsafe migration: Migration without shared storage is unsafe"
virsh migrate --unsafe --live --undefinesource --persistent hv1_vm1 qemu+ssh://$CLIENTS/system || ((result += 1))
echo "virsh migrate result=$result"

vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))

sync_set client $FUNCNAME  # 4: Send signal to machine B to go ahead

sync_wait client $FUNCNAME  # 5: Block here and Wait for signal from machine B

vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))

##### 10 rounds of A->B and B->A migrations
for (( i=1; i<=10; i++ ))
do
  echo "Migration count $i"
  # Migration A->B
  sshpass -p redhat ssh -o UserKnownHostsFile=/dev/null -o 'StrictHostKeyChecking=no' root@192.168.122.110 "timeout -s SIGINT 10s ping 172.16.1.111 -i 0 -f" | tee downtime.log &
  virsh migrate --unsafe --live --undefinesource --persistent hv1_vm1 qemu+ssh://$CLIENTS/system || ((result += 1))
  echo "virsh migrate result=$result"

  wait
  sleep 5

  ovn-sbctl show
  virsh list --all

  local tx=$(grep "packets transmitted" downtime.log | awk '{print $1}')
  local rx=$(grep "received" downtime.log | awk '{print $4}')
  local rtt_max=$(grep rtt downtime.log | awk '{print $4}' | awk -F '/' '{print $3}')
  local unit=$(grep rtt downtime.log | awk '{print $5}')
  echo "-----------------------------------------"
  (($(bc <<< "$rtt_max > 500"))) && { echo "FAIL: too long migration time: ${rtt_max}${unit} for A to B"; ((result += 1)); } || echo "migration time(A->B): ${rtt_max}${unit}"
  echo "-----------------------------------------"

  echo "Time taken in migration A->B $i: $rtt_max" >>migrationTime.log

  # The B->A migration is started on physical machine B. The command below is synchronized with [1] on machine B so the migration time can be measured here.
  sshpass -p redhat ssh -o UserKnownHostsFile=/dev/null -o 'StrictHostKeyChecking=no' root@192.168.122.110 "timeout -s SIGINT 10s ping 172.16.1.111 -i 0 -f" | tee downtime.log &
  sync_set client $FUNCNAME   # 6: and 8: Send signal to machine B to go ahead (the same signal serves sync points 6 and 8)
  sync_wait client $FUNCNAME  # 7: Block here and Wait for signal from machine B

  wait
  sleep 5

  ovn-sbctl show

  local tx=$(grep "packets transmitted" downtime.log | awk '{print $1}')
  local rx=$(grep "received" downtime.log | awk '{print $4}')
  local rtt_max=$(grep rtt downtime.log | awk '{print $4}' | awk -F '/' '{print $3}')
  local unit=$(grep rtt downtime.log | awk '{print $5}')
  echo "-----------------------------------------"
  (($(bc <<< "$rtt_max > 500"))) && { echo "FAIL: too long migration time: ${rtt_max}${unit} for B to A"; ((result += 1)); } || echo "migration time(B->A): ${rtt_max}${unit}"
  echo "-----------------------------------------"
  echo "Time taken in migration B->A $i: $rtt_max" >>migrationTime.log
done

sync_set client $FUNCNAME   # 8: Send signal to machine B to go ahead
sync_wait client $FUNCNAME  # 9: Block here and Wait for signal from machine B

cat migrationTime.log


####################################
#        On physical machine B     #
####################################
sync_wait server $FUNCNAME #  1: Block here and wait for signal from machine A
mount | grep '/home/test/images' >/dev/null && umount /home/test/images
rm -rf /home/test/images/*
mount $SERVERS:/home/test/images /home/test/images
sync_set server $FUNCNAME #  2: Send signal to machine A to go ahead

virsh net-list | grep -qe "^ default" || { virsh net-define /usr/share/libvirt/networks/default.xml; virsh net-start default; }

sync_set server $FUNCNAME   # 3: Send signal to machine A to go ahead
sync_wait server $FUNCNAME  # 4: Block here and wait for signal from machine A
# [1]
virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$SERVERS/system || ((result += 1))
echo "virsh migrate result=$result"
sleep 5
virsh list --all

sync_set server $FUNCNAME    # 5: Send signal to machine A to go ahead
sync_wait server $FUNCNAME   # 6: Block here and wait for signal from machine A

##### 10 rounds of B->A migrations (paired with the loop on machine A)
echo "-----------------------------------------EELAHI"
for (( i=1; i<=10; i++ ))
do
  echo "Migration count $i"
  # B->A 2
  virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$SERVERS/system || ((result += 1))
  echo "virsh migrate result=$result"
  sleep 5
  virsh list --all

  sync_set server $FUNCNAME    # 7: Send signal to machine A to go ahead
  sync_wait server $FUNCNAME   # 8: Block here and wait for signal from machine A
done

echo "-----------------------------------------/EELAHI"

umount /home/test/images

sync_set server $FUNCNAME   # 9: Send signal to machine A to go ahead

Comment 1 Dr. David Alan Gilbert 2021-06-14 09:32:19 UTC
It feels like there's a test problem somewhere, because a live migration under 100 ms seems very unlikely.
Can we have the /var/log/libvirt/qemu/hv1_vm1.log from both machines, please?

Also, why are you using --unsafe?

My guess is that the VM isn't really running on A

Comment 2 Ehsan Elahi 2021-06-15 09:06:38 UTC
virsh list --all shows the presence of the VM before migration and its absence after migration on machine A:
virsh list before migration
 Id   Name       State
--------------------------
 1    hv1_vm00   running
 2    hv1_vm01   running

virsh list after migration
 Id   Name       State
--------------------------
 1    hv1_vm00   running

The --unsafe option is used to avoid the error: "error: Unsafe migration: Migration without shared storage is unsafe".
The dump of /var/log/libvirt/qemu/hv1_vm1.log from both machines A and B is pasted below. It shows the multiple migrations on both sides.

################# Trimmed /var/log/libvirt/qemu/hv1_vm01.log on machine A #########################
2021-06-14 13:14:46.312+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=38,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:1 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2021-06-14 13:17:57.804+0000: initiating migration
2021-06-14 13:18:01.913+0000: shutting down, reason=migrated
2021-06-14T13:18:01.915943Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd)
2021-06-14 13:18:17.481+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=38,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-netdev tap,fd=42,id=hostnet1,vhost=on,vhostfd=43 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:1 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/1 (label charserial0)
2021-06-14 13:18:42.691+0000: initiating migration
2021-06-14 13:18:46.832+0000: shutting down, reason=migrated
2021-06-14T13:18:46.834450Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd)
2021-06-14 13:19:00.075+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=38,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-netdev tap,fd=42,id=hostnet1,vhost=on,vhostfd=43 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:1 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/2 (label charserial0)
2021-06-14 13:19:17.760+0000: initiating migration
2021-06-14 13:19:21.897+0000: shutting down, reason=migrated
2021-06-14T13:19:21.900105Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd)



####################### Trimmed /var/log/libvirt/qemu/hv1_vm01.log on machine B ########################
2021-06-14 13:17:57.463+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:0 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2021-06-14 13:18:17.855+0000: initiating migration
2021-06-14 13:18:21.954+0000: shutting down, reason=migrated
2021-06-14T13:18:21.956030Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd)
2021-06-14 13:18:42.387+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:0 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2021-06-14 13:19:00.411+0000: initiating migration
2021-06-14 13:19:04.594+0000: shutting down, reason=migrated
2021-06-14T13:19:04.595754Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd)
2021-06-14 13:19:17.424+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com
LC_ALL=C \
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \
HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01 \
XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.local/share \
XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.cache \
XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.config \
QEMU_AUDIO_DRV=none \
/usr/libexec/qemu-kvm \
-name guest=hv1_vm01,debug-threads=on \
-S \
-object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-hv1_vm01/master-key.aes \
-machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \
-cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \
-m 2048 \
-overcommit mem-lock=off \
-smp 2,sockets=2,cores=1,threads=1 \
-uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \
-no-user-config \
-nodefaults \
-chardev socket,id=charmonitor,fd=37,server,nowait \
-mon chardev=charmonitor,id=monitor,mode=control \
-rtc base=utc,driftfix=slew \
-global kvm-pit.lost_tick_policy=delay \
-no-hpet \
-no-shutdown \
-global PIIX4_PM.disable_s3=1 \
-global PIIX4_PM.disable_s4=1 \
-boot strict=on \
-device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \
-device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \
-device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \
-device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \
-blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \
-blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \
-device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \
-netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \
-device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \
-netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \
-device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \
-chardev pty,id=charserial0 \
-device isa-serial,chardev=charserial0,id=serial0 \
-device usb-tablet,id=input0,bus=usb.0,port=1 \
-vnc 0.0.0.0:0 \
-device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \
-incoming defer \
-device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
-sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
-msg timestamp=on
char device redirected to /dev/pts/0 (label charserial0)
2021-06-14 13:19:35.422+0000: initiating migration
2021-06-14 13:19:39.528+0000: shutting down, reason=migrated
2021-06-14T13:19:39.529302Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd)

Comment 3 Dr. David Alan Gilbert 2021-06-15 09:44:53 UTC
OK, so you really shouldn't need '--unsafe' - if your storage is actually shared you shouldn't be getting that error
when it's set up properly.
If it's not shared, you should still be able to do a storage migration.

Anyway, if we grep that log for 'migrat' we get:

2021-06-14 13:17:57.804+0000: initiating migration
2021-06-14 13:18:01.913+0000: shutting down, reason=migrated
2021-06-14 13:18:42.691+0000: initiating migration
2021-06-14 13:18:46.832+0000: shutting down, reason=migrated
2021-06-14 13:19:17.760+0000: initiating migration
2021-06-14 13:19:21.897+0000: shutting down, reason=migrated

2021-06-14 13:18:17.855+0000: initiating migration
2021-06-14 13:18:21.954+0000: shutting down, reason=migrated
2021-06-14 13:19:00.411+0000: initiating migration
2021-06-14 13:19:04.594+0000: shutting down, reason=migrated
2021-06-14 13:19:35.422+0000: initiating migration
2021-06-14 13:19:39.528+0000: shutting down, reason=migrated

so the actual QEMU-level migration is taking about 4 s in both directions, which is fair for a 2 GB guest on a 10G interface.

Looking at your script, your actual 'time' is based on the ping results; so for some reason your networking is disappearing for the whole of the migration in one
direction, and that shouldn't happen.

This needs debugging at the network level to find out where the pings are going - you should still see the guest responding to pings during the bulk of the migration
except for a short 'downtime' near the end.
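
As a rough sketch, that network-level debugging could start with something like the following on both hypervisors while a B->A migration is in progress (standard OVS/OVN tooling assumed; the iface-id placeholder is the UUID assigned to hv1_vm1's port in the reproducer):

# Where is the logical port bound? It should move to the destination chassis near the end of the migration.
ovn-sbctl --columns=logical_port,chassis find Port_Binding logical_port=<iface-id-of-hv1_vm1>

# Are the ICMP packets reaching the destination hypervisor's integration bridge?
tcpdump -nei br-int icmp

# What do the installed datapath flows look like while the pings are failing?
ovs-appctl dpctl/dump-flows | grep -i icmp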

Comment 4 Dumitru Ceara 2021-06-18 07:41:29 UTC
This looks like a rather simple OVN configuration, just one logical
switch with two logical ports.  I wouldn't expect networking to
disappear for long unless removing the old OVS interface
from hv1 or adding the new OVS interface on hv2 (when migrating from
hv1 to hv2) takes a long time.  It would be great to have the following info:

- ovn-controller logs from both hypervisors
- ovs-vswitchd logs from both hypervisors
- OVS conf.db (should be in /etc/openvswitch/conf.db) from both
  hypervisors

It would also help to have the timestamps when the migration is
triggered in order to compare with the OVSDB logs in conf.db.

Thanks,
Dumitru
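
For reference, a minimal sketch of one way to record the migration trigger timestamps around each virsh migrate call (epoch microseconds, assuming GNU date), so they can be lined up with the OVSDB and ovn-controller log timestamps:

start_us=$(($(date +%s%N) / 1000))    # epoch time in microseconds
virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$SERVERS/system
end_us=$(($(date +%s%N) / 1000))
echo "Migration started: $start_us"
echo "Migration completed: $end_us"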

Comment 5 Ehsan Elahi 2021-06-22 08:10:57 UTC
I edited the reproducer to do just two migrations instead of 10 in the loop. Prefixing "virsh migrate" with the "time" command shows that, on average, a migration completes in about 5 seconds. This is also evident from the start and end timestamps below. However, the network downtime is less than 100 ms while migrating from A->B and around 5000 ms while migrating the same VM from B->A.
Here is the required info: 

############# Machine A ##################
Migration started: 1624338464422711 (epoch timestamp in microseconds)
Migration completed: 1624338469480986

# cat /var/log/ovn/ovn-controller.log
https://drive.google.com/file/d/1tpqHuEY-fdfOccdu1KRgBIRYNlE6pz6C/view?usp=sharing

# cat /var/log/openvswitch/ovs-vswitchd.log
https://drive.google.com/file/d/1JHWOvyoc39lRVarZv-_fag3ssAGZBqJG/view?usp=sharing

# cat /etc/openvswitch/conf.db
https://drive.google.com/file/d/1rNJ5UHKpu4v2k1Ox0nkSeXbVH6i7ugGt/view?usp=sharing


############# Machine B ##################
Migration started: 1624338481984488
Migration completed: 1624338487137728

# cat /var/log/ovn/ovn-controller.log
https://drive.google.com/file/d/1S8fD60jz5n_ry_Ps7QVN2w-Zyg354vh_/view?usp=sharing

# cat /var/log/openvswitch/ovs-vswitchd.log
https://drive.google.com/file/d/1XrbzuWTDz5dHpsnLYFRE3fNbTDMHtpPW/view?usp=sharing

# cat /etc/openvswitch/conf.db
https://drive.google.com/file/d/1iBu0_7gA_Xdr02ysyRyHVNE9teZ6YBqJ/view?usp=sharing

Comment 6 Dumitru Ceara 2021-06-22 15:33:30 UTC
As discussed offline, reassigning to Fabrizio for further analysis.