Bug 1983694

Summary: Migration hangs if vm is shutdown during live migration [rhel-8.4.0.z]
Product: Red Hat Enterprise Linux Advanced Virtualization Reporter: RHEL Program Management Team <pgm-rhel-tools>
Component: libvirt Assignee: Jiri Denemark <jdenemar>
Status: CLOSED ERRATA QA Contact: Fangge Jin <fjin>
Severity: high Docs Contact:
Priority: high    
Version: 8.4 CC: fjin, jdenemar, lmen, mzamazal, virt-maint, xuzhang, ymankad
Target Milestone: rc Keywords: Regression, Triaged, ZStream
Target Release: 8.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-7.0.0-14.3.el8 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1949869 Environment:
Last Closed: 2021-08-31 08:07:47 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1949869    
Bug Blocks: 1966121    

Comment 5 Fangge Jin 2021-07-21 09:52:20 UTC
Test with libvirt-7.0.0-14.3

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Scenario 1: poweroff inside vm during live migration.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

I got 3 different kinds of results with virsh:
1) It returned "Input/output error"
[root@rhel8-4 ~]# virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  1 %]error: End of file while reading data: : Input/output error

2) It returned nothing
[root@rhel8-4 ~]# virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  1 %]
[root@rhel8-4 ~]# 

3) It returned "domain is not running" which is expected
[root@rhel8-4 ~]# virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  3 %]error: operation failed: domain is not running


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Scenario 2: destroy src vm during live migration.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
[root@rhel8-4 ~]#  virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  0 %]error: operation failed: domain is not running


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Scenario 3: kill src qemu-kvm process during live migration.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#  virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  0 %]error: operation failed: domain is not running


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Scenario 4: destroy dest vm during live migration.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#  virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  0 %]error: End of file while reading data: : Input/output error

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~Scenario 5: kill dest qemu-kvm process during live migration.~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
#  virsh migrate avocado-vt-vm1 qemu+ssh://10.0.151.215/system --live --verbose --p2p --persistent --migrateuri tcp://10.0.151.215 --bandwidth 2
Migration: [  0 %]error: operation failed: domain is no longer running
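
When repeating these scenarios many times, it can help to bucket the captured virsh output automatically. The following is only a minimal sketch: the error strings are copied from the transcripts above, while the function name and the outcome labels are hypothetical and not part of virsh or libvirt.

```python
import re

# Error strings copied verbatim from the transcripts above.
DOMAIN_NOT_RUNNING = (
    "operation failed: domain is not running",
    "operation failed: domain is no longer running",
)
EOF_ERROR = "End of file while reading data"


def classify_migrate_output(output: str) -> str:
    """Map captured `virsh migrate --verbose` output to a coarse label.

    The labels are illustrative only; they are not libvirt terminology.
    """
    if any(msg in output for msg in DOMAIN_NOT_RUNNING):
        return "domain-not-running"
    if EOF_ERROR in output:
        return "connection-eof"
    if "error:" in output:
        return "other-error"
    # A final 100 % progress line with no error is treated as success.
    if re.search(r"Migration: \[\s*100 %\]", output):
        return "completed"
    # Progress was printed, then virsh exited without any message
    # (scenario 1, case 2 above).
    return "silent-exit"
```

For example, the output of scenario 1, case 2 (`Migration: [  1 %]` followed by the prompt) would be labelled `silent-exit`, which is the case worth flagging.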

Comment 6 Jiri Denemark 2021-07-21 12:57:02 UTC
The different behavior can be partially caused by --verbose because it results
in virDomainGetJobInfo calls (each of which issues the query-migrate QMP
command), changing the interactions and timing between the EOF callback and
the thread controlling the migration. Another reason (when the progress says
0%) might be that QEMU was killed too early, before the actual migration
started.

Anyway, scenario 1, case 2 is strange and deserves some investigation (as a
separate bug, because migration did not get stuck and thus this BZ is fixed)
in case you are able to reproduce it and provide debug logs from both the
source libvirtd and virsh.
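
For the virsh side of such debug logs, one possible approach is to set libvirt's client logging environment variables (LIBVIRT_DEBUG and LIBVIRT_LOG_OUTPUTS) when launching virsh. The sketch below only builds the command line and environment; the helper name and the log path are hypothetical, and the source libvirtd log has to be enabled separately in the daemon's own configuration.

```python
import os


def virsh_debug_invocation(virsh_args, log_path, base_env=None):
    """Return (argv, env) for running virsh with client debug logging.

    Hypothetical helper: it only prepares the invocation, it does not
    run virsh itself.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["LIBVIRT_DEBUG"] = "1"  # 1 = debug level
    env["LIBVIRT_LOG_OUTPUTS"] = f"1:file:{log_path}"  # write log to a file
    return ["virsh", *virsh_args], env


# Arguments taken from the reproducer in comment 5; log path is an assumption.
argv, env = virsh_debug_invocation(
    ["migrate", "avocado-vt-vm1", "qemu+ssh://10.0.151.215/system",
     "--live", "--verbose", "--p2p", "--persistent",
     "--migrateuri", "tcp://10.0.151.215", "--bandwidth", "2"],
    "/tmp/virsh-debug.log",
)
```

The pair can then be passed to `subprocess.run(argv, env=env)` on the source host while the reproducer is running.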

Comment 7 Fangge Jin 2021-07-22 05:49:04 UTC
I can't reproduce scenario 1, case 2 now. I will file a bug if I can reproduce it in the future.

Comment 9 errata-xmlrpc 2021-08-31 08:07:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:3340