In our setup, live migration of a VM from physical machine A to machine B usually takes less than 100 ms. However, the reverse migration (B->A) of the same VM takes considerably longer (>4000 ms). As a test case, I migrated a VM 10 times A->B and B->A. Here are the results:

Time taken in migration A->B 1: 97.833
Time taken in migration B->A 1: 4476.161
Time taken in migration A->B 2: 98.730
Time taken in migration B->A 2: 4395.235
Time taken in migration A->B 3: 94.262
Time taken in migration B->A 3: 4407.216
Time taken in migration A->B 4: 90.297
Time taken in migration B->A 4: 4387.979
Time taken in migration A->B 5: 128.000
Time taken in migration B->A 5: 4397.930
Time taken in migration A->B 6: 111.580
Time taken in migration B->A 6: 4400.057
Time taken in migration A->B 7: 90.636
Time taken in migration B->A 7: 4367.282
Time taken in migration A->B 8: 79.717
Time taken in migration B->A 8: 4457.997
Time taken in migration A->B 9: 81.274
Time taken in migration B->A 9: 4391.982
Time taken in migration A->B 10: 94.684
Time taken in migration B->A 10: 4403.516

Here is the reproducer.
####################################
# Some necessary functions:
####################################

ovn_create_vm()
{
    local hv=$1
    local vm=$2
    local guest=hv${hv}_vm${vm}
    local mac="00:de:ad:0${hv}:0${vm}:00"
    local ip="192.168.122.$((vm+10+hv*100))"   # fixed: was $("..."), which tries to execute the string
    echo
    echo "$FUNCNAME($guest)"
    echo mac=$mac
    echo ip=$ip

    # guest image
    rm -rf /home/test/images/$guest.qcow2
    cp -a /var/lib/libvirt/images/rhel7.8.qcow2 /home/test/images/$guest.qcow2

    # define the default vnet for internet connection
    virsh net-list | grep -qe "^ default" || {
        virsh net-define /usr/share/libvirt/networks/default.xml
        virsh net-start default
    }

    # set IP for eth0
    virsh net-update default add ip-dhcp-host "<host mac='$mac' ip='$ip' />" --live --config

    # for migration, the cpu model must be identical and supported on both hosts;
    # use SandyBridge-IBRS, which is supported by most recent Intel CPUs
    virt-install \
        --name $guest \
        --vcpus=2 \
        --ram=2048 \
        --disk path=/home/test/images/${guest}.qcow2,device=disk,bus=virtio,format=qcow2,cache=none \
        --network bridge=virbr0,model=virtio,mac=$mac \
        --boot hd \
        --accelerate \
        --force \
        --graphics vnc,listen=0.0.0.0 \
        --os-type=linux \
        --cpu=SandyBridge-IBRS \
        --noautoconsole
    sleep 5
}

# create a virtual network interface for a vm
ovn_create_vnet()
{
    local hv=$1
    local vm=$2
    local vn=$3
    local iface_id=$4
    local guest=hv${hv}_vm${vm}
    local mac="00:de:ad:0${hv}:0${vm}:0${vn}"
    local vnet=hv${hv}_vm${vm}_vn${vn}

    # stop NetworkManager before attaching the device
    vmsh run_cmd $guest "systemctl stop NetworkManager;iptables -F;ip6tables -F;setenforce 0;systemctl stop firewalld"

    if [ -n "$iface_id" ]
    then
        cat <<- EOF > /home/test/vnets/${vnet}.xml
<interface type='bridge'>
  <target dev='${vnet}'/>
  <mac address='${mac}'/>
  <source bridge='br-int'/>
  <virtualport type='openvswitch'>
    <parameters interfaceid='${iface_id}'/>
  </virtualport>
  <model type='virtio'/>
</interface>
EOF
    else
        cat <<- EOF > /home/test/vnets/${vnet}.xml
<interface type='bridge'>
  <target dev='${vnet}'/>
  <mac address='${mac}'/>
  <source bridge='br-int'/>
  <virtualport type='openvswitch'/>
  <model type='virtio'/>
</interface>
EOF
    fi

    echo "virsh attach-device ${guest} /home/test/vnets/${vnet}.xml"
    virsh attach-device ${guest} /home/test/vnets/${vnet}.xml
    [ -z "$iface_id" ] && ovs-vsctl set interface ${vnet} external-ids:iface-id=${vnet}
}

# start dhclient 4/6 on all vnets (eth1..eth99) in the guest
ovn_dhclient_ethx()
{
    local hv=$1
    local vm=$2
    local num_of_vnets=$3
    local result=0
    local guest=hv${hv}_vm${vm}
    local cmd=''

    cmd="pkill dhclient;sleep 2;dhclient -v \$(ip link | grep -B1 \"00:de:ad:0${hv}:0${vm}:01\" | sed -n 's/[[:digit:]]*: \(.*\):.*/\1/p' | xargs)"
    echo $cmd
    vmsh run_cmd $guest "$cmd" || ((result += 1))
    vmsh run_cmd $guest "netserver"
    sleep 10
    vmsh run_cmd $guest "ifconfig"
    return $result
}

####################################
# On physical machine A
####################################

result=0
grep "/home/test/images" /etc/exports || echo "/home/test/images *.redhat.com(rw,no_root_squash,sync)" > /etc/exports
systemctl restart nfs-server

sync_set client $FUNCNAME   # 1: Send signal to machine B to go ahead
sync_wait client $FUNCNAME  # 2: Block here and wait for signal from machine B

uuid_vnet_1_0_1=$(uuidgen)
uuid_vnet_1_1_1=$(uuidgen)
dhcp_options=$(ovn-nbctl create DHCP_Options cidr=172.16.1.0/24 \
    options="\"server_id\"=\"172.16.1.254\" \"server_mac\"=\"00:00:00:00:01:00\" \
    \"lease_time\"=\"$((36000 + RANDOM % 3600))\" \"router\"=\"172.16.1.254\"")

ovn-nbctl ls-add ls1
ovn-nbctl lsp-add ls1 ${uuid_vnet_1_0_1} -- lsp-set-dhcpv4-options ${uuid_vnet_1_0_1} $dhcp_options
ovn-nbctl lsp-set-addresses ${uuid_vnet_1_0_1} "00:de:ad:01:00:01 172.16.1.101"
ovn-nbctl lsp-add ls1 ${uuid_vnet_1_1_1} -- lsp-set-dhcpv4-options ${uuid_vnet_1_1_1} $dhcp_options
ovn-nbctl lsp-set-addresses ${uuid_vnet_1_1_1} "00:de:ad:01:01:01 172.16.1.111"

ovn_create_vm 1 0   # a function (defined above) to create a vm
ovn_create_vm 1 1
sleep 90
ovn_create_vnet 1 0 1 ${uuid_vnet_1_0_1}   # a function (defined above) to create a vnet
ovn_create_vnet 1 1 1 ${uuid_vnet_1_1_1}
sleep 10
ovn-nbctl show
ovn-sbctl show
ovn_dhclient_ethx 1 0 1   # start dhclient on all vnets (eth1..eth99) in the guest
ovn_dhclient_ethx 1 1 1

sync_wait client $FUNCNAME  # 3: Block here and wait for signal from machine B

# workaround to avoid the guest not getting a dhcp ip address
vmsh run_cmd hv1_vm0 "pkill dhclient; sleep 5; dhclient"

vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))
# use --unsafe because of "error: Unsafe migration: Migration without shared storage is unsafe"
virsh migrate --unsafe --live --undefinesource --persistent hv1_vm1 qemu+ssh://$CLIENTS/system || ((result += 1))
echo "virsh migrate result=$result"
vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))

sync_set client $FUNCNAME   # 4: Send signal to machine B to go ahead
sync_wait client $FUNCNAME  # 5: Block here and wait for signal from machine B
vmsh run_cmd hv1_vm0 "ping 172.16.1.111 -c10" || ((result += 1))

##### 10 round-trip migrations (A->B and B->A)
for (( i=1; i<=10; i++ ))
do
    echo "Migration count $i"

    # Migration A->B
    sshpass -p redhat ssh -o UserKnownHostsFile=/dev/null -o 'StrictHostKeyChecking=no' root.122.110 \
        "timeout -s SIGINT 10s ping 172.16.1.111 -i 0 -f" | tee downtime.log &
    virsh migrate --unsafe --live --undefinesource --persistent hv1_vm1 qemu+ssh://$CLIENTS/system || ((result += 1))
    echo "virsh migrate result=$result"
    wait
    sleep 5
    ovn-sbctl show
    virsh list --all
    tx=$(grep "packets transmitted" downtime.log | awk '{print $1}')
    rx=$(grep "received" downtime.log | awk '{print $4}')
    rtt_max=$(grep rtt downtime.log | awk '{print $4}' | awk -F '/' '{print $3}')
    unit=$(grep rtt downtime.log | awk '{print $5}')
    echo "-----------------------------------------"
    (($(bc <<< "$rtt_max > 500"))) \
        && { echo "FAIL: too long migration time: ${rtt_max}${unit} for A to B"; ((result += 1)); } \
        || echo "migration time(A->B): ${rtt_max}${unit}"
    echo "-----------------------------------------"
    echo "Time taken in migration A->B $i: $rtt_max" >> migrationTime.log

    # Migration B->A is started on physical machine B. The command below is synced with [1]
    # on physical machine B to measure the migration time.
    sshpass -p redhat ssh -o UserKnownHostsFile=/dev/null -o 'StrictHostKeyChecking=no' root.122.110 \
        "timeout -s SIGINT 10s ping 172.16.1.111 -i 0 -f" | tee downtime.log &
    sync_set client $FUNCNAME   # 6 and 8: Send signal to machine B to go ahead (same signal is used for sync points 6 and 8)
    sync_wait client $FUNCNAME  # 7: Block here and wait for signal from machine B
    wait
    sleep 5
    ovn-sbctl show
    tx=$(grep "packets transmitted" downtime.log | awk '{print $1}')
    rx=$(grep "received" downtime.log | awk '{print $4}')
    rtt_max=$(grep rtt downtime.log | awk '{print $4}' | awk -F '/' '{print $3}')
    unit=$(grep rtt downtime.log | awk '{print $5}')
    echo "-----------------------------------------"
    (($(bc <<< "$rtt_max > 500"))) \
        && { echo "FAIL: too long migration time: ${rtt_max}${unit} for B to A"; ((result += 1)); } \
        || echo "migration time(B->A): ${rtt_max}${unit}"
    echo "-----------------------------------------"
    echo "Time taken in migration B->A $i: $rtt_max" >> migrationTime.log
done

sync_set client $FUNCNAME   # 8: Send signal to machine B to go ahead
sync_wait client $FUNCNAME  # 9: Block here and wait for signal from machine B
cat migrationTime.log

####################################
# On physical machine B
####################################

result=0
sync_wait server $FUNCNAME  # 1: Block here and wait for signal from machine A
mount | grep '/home/test/images' >/dev/null && umount /home/test/images
rm -rf /home/test/images/*
mount $SERVERS:/home/test/images /home/test/images
sync_set server $FUNCNAME   # 2: Send signal to machine A to go ahead

virsh net-list | grep -qe "^ default" || {
    virsh net-define /usr/share/libvirt/networks/default.xml
    virsh net-start default
}
sync_set server $FUNCNAME   # 3: Send signal to machine A to go ahead
sync_wait server $FUNCNAME  # 4: Block here and wait for signal from machine A

# [1]
virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$SERVERS/system || ((result += 1))
echo "virsh migrate result=$result"
sleep 5
virsh list --all
sync_set server $FUNCNAME   # 5: Send signal to machine A to go ahead
sync_wait server $FUNCNAME  # 6: Block here and wait for signal from machine A

##### 10 round-trip migrations (B->A and A->B)
echo "-----------------------------------------EELAHI"
for (( i=1; i<=10; i++ ))
do
    echo "Migration count $i"
    # B->A
    virsh migrate --live --persistent --undefinesource hv1_vm1 qemu+ssh://$SERVERS/system || ((result += 1))
    echo "virsh migrate result=$result"
    sleep 5
    virsh list --all
    sync_set server $FUNCNAME   # 7: Send signal to machine A to go ahead
    sync_wait server $FUNCNAME  # 8: Block here and wait for signal from machine A
done
echo "-----------------------------------------/EELAHI"
umount /home/test/images
sync_set server $FUNCNAME   # 9: Send signal to machine A to go ahead
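The ping-based numbers above measure the network blackout, not the time the guest is actually paused. As a cross-check, libvirt itself reports the guest pause time of the last completed migration via "virsh domjobinfo <domain> --completed" on the source host. A small helper for that, as a sketch: it assumes the domjobinfo output contains a "Total downtime" field (present in recent libvirt versions; the field name may differ on older releases):

```shell
#!/bin/bash
# Extract the "Total downtime" value (in ms) from `virsh domjobinfo --completed`
# output. This is the time the guest CPUs were actually paused, which is
# independent of any network blackout measured by ping.
extract_downtime_ms()
{
    # reads domjobinfo text on stdin, prints the numeric downtime value
    awk -F: '/Total downtime/ { gsub(/[^0-9]/, "", $2); print $2 }'
}

# Intended usage right after a migration completes (hv1_vm1 as in the reproducer):
#   virsh domjobinfo hv1_vm1 --completed | extract_downtime_ms
```

Comparing this value against the ping-derived downtime would show directly whether the extra ~4 s in the B->A direction is spent with the guest paused or with the guest running but unreachable.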
It feels like there's a test problem somewhere, because a live migration under 100 ms seems very unlikely. Can we have the /var/log/libvirt/qemu/hv1_vm1.log from both machines, please? Also, why are you using --unsafe? My guess is that the VM isn't really running on A.
virsh list --all shows the VM present before migration and absent after migration on machine A:

virsh list before migration
 Id   Name       State
--------------------------
 1    hv1_vm00   running
 2    hv1_vm01   running

virsh list after migration
 Id   Name       State
--------------------------
 1    hv1_vm00   running

The --unsafe option is used to avoid the error "error: Unsafe migration: Migration without shared storage is unsafe". The dump of /var/log/libvirt/qemu/hv1_vm01.log from both machines A and B is pasted below; it shows the multiple migrations on both sides.

################# Trimmed /var/log/libvirt/qemu/hv1_vm01.log on machine A #########################

2021-06-14 13:14:46.312+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=38,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global
PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:1 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/1 (label charserial0) 2021-06-14 13:17:57.804+0000: initiating migration 2021-06-14 13:18:01.913+0000: shutting down, reason=migrated 2021-06-14T13:18:01.915943Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd) 2021-06-14 13:18:17.481+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=38,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -netdev tap,fd=42,id=hostnet1,vhost=on,vhostfd=43 \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:1 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -incoming defer \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/1 (label charserial0) 2021-06-14 13:18:42.691+0000: initiating migration 2021-06-14 13:18:46.832+0000: shutting down, reason=migrated 2021-06-14T13:18:46.834450Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd) 2021-06-14 13:19:00.075+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-30.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-4-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-4-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=38,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=40,id=hostnet0,vhost=on,vhostfd=41 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -netdev tap,fd=42,id=hostnet1,vhost=on,vhostfd=43 \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:1 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -incoming defer \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/2 (label charserial0) 2021-06-14 13:19:17.760+0000: initiating migration 2021-06-14 13:19:21.897+0000: shutting down, reason=migrated 2021-06-14T13:19:21.900105Z qemu-kvm: terminating on signal 15 from pid 24757 (/usr/sbin/libvirtd) ####################### Trimmed /var/log/libvirt/qemu/hv1_vm01.log on machine B ######################## 2021-06-14 13:17:57.463+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-1-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-1-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=37,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:0 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -incoming defer \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/0 (label charserial0) 2021-06-14 13:18:17.855+0000: initiating migration 2021-06-14 13:18:21.954+0000: shutting down, reason=migrated 2021-06-14T13:18:21.956030Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd) 2021-06-14 13:18:42.387+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-2-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=37,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:0 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -incoming defer \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/0 (label charserial0) 2021-06-14 13:19:00.411+0000: initiating migration 2021-06-14 13:19:04.594+0000: shutting down, reason=migrated 2021-06-14T13:19:04.595754Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd) 2021-06-14 13:19:17.424+0000: starting up libvirt version: 6.0.0, package: 28.module+el8.3.0+7827+5e65edd7 (Red Hat, Inc. 
<http://bugzilla.redhat.com/bugzilla>, 2020-08-26-13:53:35, ), qemu version: 4.2.0qemu-kvm-4.2.0-34.module+el8.3.0+7976+077be4ec, kernel: 4.18.0-240.el8.x86_64, hostname: dell-per740-81.rhts.eng.pek2.redhat.com LC_ALL=C \ PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin \ HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01 \ XDG_DATA_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.local/share \ XDG_CACHE_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.cache \ XDG_CONFIG_HOME=/var/lib/libvirt/qemu/domain-3-hv1_vm01/.config \ QEMU_AUDIO_DRV=none \ /usr/libexec/qemu-kvm \ -name guest=hv1_vm01,debug-threads=on \ -S \ -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-3-hv1_vm01/master-key.aes \ -machine pc-i440fx-rhel7.6.0,accel=kvm,usb=off,dump-guest-core=off \ -cpu SandyBridge-IBRS,ibpb=on,md-clear=on,spec-ctrl=on,ssbd=on,vme=on,hypervisor=on,arat=on,xsaveopt=on \ -m 2048 \ -overcommit mem-lock=off \ -smp 2,sockets=2,cores=1,threads=1 \ -uuid ffc5e670-9a51-4308-b5f0-cc68f9f20c42 \ -no-user-config \ -nodefaults \ -chardev socket,id=charmonitor,fd=37,server,nowait \ -mon chardev=charmonitor,id=monitor,mode=control \ -rtc base=utc,driftfix=slew \ -global kvm-pit.lost_tick_policy=delay \ -no-hpet \ -no-shutdown \ -global PIIX4_PM.disable_s3=1 \ -global PIIX4_PM.disable_s4=1 \ -boot strict=on \ -device ich9-usb-ehci1,id=usb,bus=pci.0,addr=0x4.0x7 \ -device ich9-usb-uhci1,masterbus=usb.0,firstport=0,bus=pci.0,multifunction=on,addr=0x4 \ -device ich9-usb-uhci2,masterbus=usb.0,firstport=2,bus=pci.0,addr=0x4.0x1 \ -device ich9-usb-uhci3,masterbus=usb.0,firstport=4,bus=pci.0,addr=0x4.0x2 \ -blockdev '{"driver":"file","filename":"/home/test/images/hv1_vm01.qcow2","node-name":"libvirt-1-storage","cache":{"direct":true,"no-flush":false},"auto-read-only":true,"discard":"unmap"}' \ -blockdev '{"node-name":"libvirt-1-format","read-only":false,"cache":{"direct":true,"no-flush":false},"driver":"qcow2","file":"libvirt-1-storage","backing":null}' \ -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=libvirt-1-format,id=virtio-disk0,bootindex=1,write-cache=on \ -netdev tap,fd=39,id=hostnet0,vhost=on,vhostfd=40 \ -device virtio-net-pci,netdev=hostnet0,id=net0,mac=00:de:ad:01:01:00,bus=pci.0,addr=0x3 \ -netdev tap,fd=41,id=hostnet1,vhost=on,vhostfd=42 \ -device virtio-net-pci,netdev=hostnet1,id=net1,mac=00:de:ad:01:01:01,bus=pci.0,addr=0x7 \ -chardev pty,id=charserial0 \ -device isa-serial,chardev=charserial0,id=serial0 \ -device usb-tablet,id=input0,bus=usb.0,port=1 \ -vnc 0.0.0.0:0 \ -device qxl-vga,id=video0,ram_size=67108864,vram_size=67108864,vram64_size_mb=0,vgamem_mb=16,max_outputs=1,bus=pci.0,addr=0x2 \ -incoming defer \ -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \ -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \ -msg timestamp=on char device redirected to /dev/pts/0 (label charserial0) 2021-06-14 13:19:35.422+0000: initiating migration 2021-06-14 13:19:39.528+0000: shutting down, reason=migrated 2021-06-14T13:19:39.529302Z qemu-kvm: terminating on signal 15 from pid 71507 (/usr/sbin/libvirtd)
OK, so you really shouldn't need 'unsafe': if your storage is actually shared, you shouldn't be getting that error once it's set up properly, and if it's not shared, you should be able to do a storage migration instead. Anyway, if we grep those logs for 'migrat' we get:

2021-06-14 13:17:57.804+0000: initiating migration
2021-06-14 13:18:01.913+0000: shutting down, reason=migrated
2021-06-14 13:18:42.691+0000: initiating migration
2021-06-14 13:18:46.832+0000: shutting down, reason=migrated
2021-06-14 13:19:17.760+0000: initiating migration
2021-06-14 13:19:21.897+0000: shutting down, reason=migrated
2021-06-14 13:18:17.855+0000: initiating migration
2021-06-14 13:18:21.954+0000: shutting down, reason=migrated
2021-06-14 13:19:00.411+0000: initiating migration
2021-06-14 13:19:04.594+0000: shutting down, reason=migrated
2021-06-14 13:19:35.422+0000: initiating migration
2021-06-14 13:19:39.528+0000: shutting down, reason=migrated

So the actual qemu-level migration takes about 4 s in both directions, which would be fair for a 2G guest on a 10G interface. Looking at your script, your actual 'time' is based on the ping timeout; so for some reason your networking disappears for the whole of the migration in one direction, and that shouldn't happen. This needs debugging at the network level to find out where the pings are going: you should still see the guest responding to pings during the bulk of the migration, except for a short 'downtime' near the end.
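To pin down exactly when the pings disappear, one option is to capture per-reply kernel timestamps instead of just the final rtt summary. A sketch, assuming Linux iputils ping (whose -D option prefixes each reply with "[<unix-time>]"):

```shell
#!/bin/bash
# Find the longest gap between consecutive ping replies, i.e. the observed
# network blackout, from `ping -D` output. -D (Linux iputils) prefixes each
# reply line with "[<unix-time>]".
longest_gap()
{
    awk -F'[][]' '/bytes from/ {
        if (prev != "" && $2 - prev > max) max = $2 - prev   # gap since previous reply
        prev = $2
    } END { printf "%.3f\n", max }'   # largest gap, in seconds
}

# Intended usage during a migration, from a third host:
#   ping -D -i 0.01 172.16.1.111 | tee ping.log
#   longest_gap < ping.log
```

The absolute timestamps in ping.log can then be lined up against the "initiating migration" / "shutting down, reason=migrated" lines in the qemu log to see whether the blackout covers the whole migration or only the switchover at the end.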
This looks like a rather simple OVN configuration: just one logical switch with two logical ports. I wouldn't expect networking to disappear for long, unless removing the old OVS interface from hv1 or adding the new OVS interface on hv2 (when migrating from hv1 to hv2) takes a long time. It would be great to have the following info:
- ovn-controller logs from both hypervisors
- ovs-vswitchd logs from both hypervisors
- the OVS conf.db (should be in /etc/openvswitch/conf.db) from both hypervisors

It would also help to have the timestamps at which the migration is triggered, in order to compare them with the OVSDB logs in conf.db.

Thanks, Dumitru
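The requested trigger timestamps can be captured by wrapping the migrate command. A minimal sketch, assuming GNU date (for the %N format) and the variable names from the reproducer:

```shell
#!/bin/bash
# Record microsecond-resolution wall-clock timestamps around a command, so the
# migration window can be lined up against the ovn-controller / ovs-vswitchd
# logs and the OVSDB transaction log in conf.db.
run_timed()
{
    local start end
    start=$(date +%s%6N)   # GNU date: epoch seconds + 6 digits of %N = microseconds
    "$@"
    end=$(date +%s%6N)
    echo "started:   $start" >&2
    echo "completed: $end" >&2
    echo "duration:  $(( (end - start) / 1000 )) ms" >&2
}

# Intended usage (migrate command and $CLIENTS as in the reproducer):
#   run_timed virsh migrate --unsafe --live --undefinesource --persistent \
#       hv1_vm1 qemu+ssh://$CLIENTS/system
```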
I edited the reproducer to run just two migrations in the loop instead of 10. Prefixing each "virsh migrate" with the "time" command shows that, on average, the migration itself completes in about 5 seconds in both directions; this is also evident from the start and end timestamps below. However, the network downtime is less than 100 ms when migrating A->B and around 5000 ms when migrating the same VM B->A. Here is the requested info:

############# Machine A ##################
Migration started:   1624338464422711 (epoch timestamp in microseconds)
Migration completed: 1624338469480986
# cat /var/log/ovn/ovn-controller.log
https://drive.google.com/file/d/1tpqHuEY-fdfOccdu1KRgBIRYNlE6pz6C/view?usp=sharing
# cat /var/log/openvswitch/ovs-vswitchd.log
https://drive.google.com/file/d/1JHWOvyoc39lRVarZv-_fag3ssAGZBqJG/view?usp=sharing
# cat /etc/openvswitch/conf.db
https://drive.google.com/file/d/1rNJ5UHKpu4v2k1Ox0nkSeXbVH6i7ugGt/view?usp=sharing

############# Machine B ##################
Migration started:   1624338481984488
Migration completed: 1624338487137728
# cat /var/log/ovn/ovn-controller.log
https://drive.google.com/file/d/1S8fD60jz5n_ry_Ps7QVN2w-Zyg354vh_/view?usp=sharing
# cat /var/log/openvswitch/ovs-vswitchd.log
https://drive.google.com/file/d/1XrbzuWTDz5dHpsnLYFRE3fNbTDMHtpPW/view?usp=sharing
# cat /etc/openvswitch/conf.db
https://drive.google.com/file/d/1iBu0_7gA_Xdr02ysyRyHVNE9teZ6YBqJ/view?usp=sharing
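For anyone correlating these numbers with the log files: the timestamps are epoch microseconds, and they can be converted with GNU date. A quick sketch using the Machine A values above:

```shell
#!/bin/bash
# Convert the Machine A epoch-microsecond timestamps into log-comparable form
# and compute the migration duration.
start_us=1624338464422711   # Migration started
end_us=1624338469480986     # Migration completed

echo "duration: $(( end_us - start_us )) us"              # -> 5058275 us (~5.06 s)
date -u -d "@$(( start_us / 1000000 ))" '+start: %F %T UTC'  # -> start: 2021-06-22 05:07:44 UTC
```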
As discussed offline, reassigning to Fabrizio for further analysis.