Bug 1273720

Summary: Paused guest can not be migrated to the target machine
Product: Red Hat Enterprise Linux 7
Component: qemu-kvm-rhev
Version: 7.2
Hardware: All
OS: Linux
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: unspecified
Target Milestone: rc
Type: Bug
Reporter: Dan Zheng <dzheng>
Assignee: Juan Quintela <quintela>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, fjin, gsun, hannsj_uhl, huding, ipinto, jdenemar, juzhang, mzhan, pezhang, qizhu, quintela, qzhang, rbalakri, virt-maint, xfu, xianwang, zpeng
Last Closed: 2017-05-03 11:51:34 UTC
Bug Blocks: 1359843

Description Dan Zheng 2015-10-21 06:16:36 UTC
Description of problem:
A guest in the paused state cannot be migrated to the target machine. This happens on both x86_64 and PPC.

Version-Release number of selected component (if applicable):
libvirt-1.2.17-13.el7.ppc64le
qemu-kvm-rhev-2.3.0-31.el7.ppc64le
kernel-3.10.0-324.el7.ppc64le


How reproducible:
100%

Steps to Reproduce:
1. Start a guest using the XML below, with the --paused option. '--paused' is the key step for this issue.
    <disk type='network' device='disk'>
      <driver name='qemu' type='raw' cache='none'/>
      <source protocol='iscsi' name='iqn.2015-08.com.virttest:emulated-iscsi.target/0'>
        <host name='2620:52:0:1370:42f2:e9ff:fe5c:2b38' port='3260'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>

# virsh start dzheng-rhel72-20151015-le --paused
# virsh list --all
 Id    Name                           State
----------------------------------------------------
 12    dzheng-rhel72-20151015-le      paused

2. Migrate
# virsh migrate dzheng-rhel72-20151015-le --live --verbose --unsafe qemu+ssh://10.19.112.10:22/system
Migration: [ 89 %]error: operation failed: domain is no longer running

3. Check the target machine: no guest has been created there, and the guest on the source machine is still in the paused state.
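
For anyone who wants to script steps 1 to 3, a minimal sketch follows. It only builds the virsh command lines quoted in the steps above and does not execute anything; the function name build_repro_commands is hypothetical, and the domain name and destination URI are simply the ones used in this report.

```python
import shlex


def build_repro_commands(domain, dest_uri):
    """Return the virsh invocations from steps 1-3 as argv lists.

    Nothing is executed here; the caller decides how to run them.
    """
    return [
        # Step 1: start the guest paused (the key step for this bug).
        ["virsh", "start", domain, "--paused"],
        ["virsh", "list", "--all"],
        # Step 2: live-migrate the still-paused guest.
        ["virsh", "migrate", domain, "--live", "--verbose", "--unsafe", dest_uri],
        # Step 3: check the guest state again on the source.
        ["virsh", "list", "--all"],
    ]


if __name__ == "__main__":
    # Values copied from this report; adjust for another setup.
    commands = build_repro_commands(
        "dzheng-rhel72-20151015-le",
        "qemu+ssh://10.19.112.10:22/system",
    )
    for argv in commands:
        print(shlex.join(argv))
```

Each argv list could be handed to subprocess.run on a host where virsh is installed; that part is left out because it depends on the test environment.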

Actual results:
Migration failed. The qemu log on the target shows:
2015-10-21 02:46:19.282+0000: 65018: debug : virFileClose:102 : Closed fd 23
2015-10-21 02:46:19.282+0000: 65018: debug : virFileClose:102 : Closed fd 30
2015-10-21 02:46:19.282+0000: 65019: debug : virExec:691 : Run hook 0x3fff8632c390 0x3fffa38dd2b8
2015-10-21 02:46:19.282+0000: 65019: debug : qemuProcessHook:3219 : Obtaining domain lock
2015-10-21 02:46:19.283+0000: 65019: debug : virSecuritySELinuxSetSecuritySocketLabel:2215 : Setting VM dzheng-rhel72-20151015-le socket context system_u:system_r:svirt_t:s0:c375,c441
2015-10-21 02:46:19.283+0000: 65018: debug : virFileClose:102 : Closed fd 3
2015-10-21 02:46:19.283+0000: 65019: debug : virDomainLockProcessStart:179 : plugin=0x3fff58164100 dom=0x3fff94002a50 paused=1 fd=0x3fffa38dccd0
2015-10-21 02:46:19.283+0000: 65019: debug : virDomainLockManagerNew:134 : plugin=0x3fff58164100 dom=0x3fff94002a50 withResources=1
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerPluginGetDriver:281 : plugin=0x3fff58164100
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerNew:305 : driver=0x3fffac438430 type=0 nparams=5 params=0x3fffa38dcaa8 flags=1
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerLogParams:98 :   key=uuid type=uuid value=7bf23827-269d-4497-a561-27fb0152ad06
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerLogParams:91 :   key=name type=string value=dzheng-rhel72-20151015-le
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerLogParams:79 :   key=id type=uint value=4
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerLogParams:79 :   key=pid type=uint value=65019
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerLogParams:94 :   key=uri type=cstring value=qemu:///system
2015-10-21 02:46:19.283+0000: 65019: debug : virDomainLockManagerNew:146 : Adding leases
2015-10-21 02:46:19.283+0000: 65019: debug : virDomainLockManagerNew:151 : Adding disks
2015-10-21 02:46:19.283+0000: 65019: debug : virDomainLockManagerAddImage:90 : Add disk /var/lib/libvirt/images/dzheng-rhel72-20151015-le.qcow2
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerAddResource:332 : lock=0x3fff940052b0 type=0 name=/var/lib/libvirt/images/dzheng-rhel72-20151015-le.qcow2 nparams=0 params=(nil) flags=0
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerAcquire:350 : lock=0x3fff940052b0 state='<null>' flags=3 action=0 fd=0x3fffa38dccd0
2015-10-21 02:46:19.283+0000: 65019: debug : virLockManagerFree:387 : lock=0x3fff940052b0
2015-10-21 02:46:19.283+0000: 65019: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x3fff5810a290
2015-10-21 02:46:19.283+0000: 65019: debug : qemuProcessHook:3260 : Hook complete ret=0
2015-10-21 02:46:19.283+0000: 65019: debug : virExec:693 : Done hook 0
2015-10-21 02:46:19.283+0000: 65019: debug : virExec:700 : Setting child security label to system_u:system_r:svirt_t:s0:c375,c441
2015-10-21 02:46:19.283+0000: 65019: debug : virExec:730 : Setting child uid:gid to 107:107 with caps 0
2015-10-21 02:46:19.283+0000: 65019: debug : virCommandHandshakeChild:432 : Notifying parent for handshake start on 27
2015-10-21 02:46:19.283+0000: 65019: debug : virCommandHandshakeChild:440 : Waiting on parent for handshake complete on 28
2015-10-21 02:46:19.293+0000: 65019: debug : virFileClose:102 : Closed fd 27
2015-10-21 02:46:19.293+0000: 65019: debug : virFileClose:102 : Closed fd 28
2015-10-21 02:46:19.293+0000: 65019: debug : virCommandHandshakeChild:460 : Handshake with parent is done
char device redirected to /dev/pts/2 (label charserial0)
ERROR: invalid runstate transition: 'inmigrate' -> 'postmigrate'
2015-10-21 02:46:20.479+0000: shutting down

4. Retry with the XML below and --copy-storage-all for the migration; the issue happens again.
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2'/>
      <source file='/var/lib/libvirt/images/dzheng-rhel72-20151015-le.qcow2'/>
      <backingStore/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x04' function='0x0'/>
    </disk>

# virsh migrate dzheng-rhel72-20151015-le --verbose --unsafe --copy-storage-all qemu+ssh://10.19.112.10:22/system
Migration: [100 %]error: internal error: early end of file from monitor: possible problem:
ERROR: invalid runstate transition: 'inmigrate' -> 'postmigrate'



Expected results:
Migration succeeds and the guest is in the paused state on the target machine.


Additional info:
1. On x86, the migration appears to succeed but actually fails: the guest cannot be found on the target host, and the guest on the source is shut off. The qemu log on the target machine contains the same error messages.

# virsh migrate --live --verbose --unsafe rhel72 qemu+ssh://10.66.70.120/system
Migration: [100 %]

# virsh list --all
 Id    Name                           State
----------------------------------------------------
 -     rhel72                         shut off

2. There is no problem when a guest is paused with 'virsh suspend' and then migrated.

Comment 1 Dan Zheng 2015-10-21 06:19:22 UTC
The issue can also be reproduced with managedsave.
# virsh start $guest --paused
# virsh managedsave $guest
# virsh start $guest
error: Failed to start domain test
error: internal error: early end of file from monitor: possible problem:
ERROR: invalid runstate transition: 'inmigrate' -> 'prelaunch'
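
The failures reported so far all end with the same qemu error line and differ only in the state pair. When triaging many logs, a small helper like the sketch below could pull that pair out. The pattern is based solely on the error lines quoted in this report, and the names RUNSTATE_ERROR and find_invalid_transitions are hypothetical.

```python
import re

# Matches lines such as:
#   ERROR: invalid runstate transition: 'inmigrate' -> 'postmigrate'
# The format is assumed from the messages quoted in this report.
RUNSTATE_ERROR = re.compile(
    r"invalid runstate transition: '([^']+)' -> '([^']+)'"
)


def find_invalid_transitions(log_text):
    """Return a list of (from_state, to_state) tuples found in log_text."""
    return RUNSTATE_ERROR.findall(log_text)


if __name__ == "__main__":
    sample = (
        "char device redirected to /dev/pts/2 (label charserial0)\n"
        "ERROR: invalid runstate transition: 'inmigrate' -> 'postmigrate'\n"
        "2015-10-21 02:46:20.479+0000: shutting down\n"
    )
    for src, dst in find_invalid_transitions(sample):
        print(src, "->", dst)
```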

Comment 3 Jiri Denemark 2015-10-21 07:10:20 UTC
Starting a domain with --paused means we run QEMU with -S and never call the "cont" QMP command. If we run "cont" followed by "suspend", the bug does not happen.

Comment 4 Juan Quintela 2015-10-21 08:40:24 UTC
This is, at least philosophically, an "invalid" use case.  You could just as well launch the guest on the target instead of doing the migration (the guest has never run).

But after discussing this with Jiri, I agree that qemu doesn't export enough information for libvirt to know that the guest has never run, so I am allowing that transition.
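
As a rough illustration of what allowing the transition means, here is a toy model of a runstate check. The state names come from the error messages quoted in this report; the allow-list itself is hypothetical and is not QEMU's actual transition table, and the names Machine, ALLOWED and InvalidTransition are made up for this sketch.

```python
class InvalidTransition(Exception):
    """Raised when a state change is not in the allow-list."""


# Hypothetical allow-list for illustration only; NOT QEMU's real table.
ALLOWED = frozenset({
    ("prelaunch", "running"),
    ("inmigrate", "running"),
    ("inmigrate", "paused"),
    ("running", "paused"),
    ("paused", "running"),
})


class Machine:
    def __init__(self, state, allowed):
        self.state = state
        self.allowed = set(allowed)

    def set_state(self, new_state):
        # Refuse any (current, new) pair that is not explicitly allowed.
        if (self.state, new_state) not in self.allowed:
            raise InvalidTransition(
                "invalid runstate transition: '%s' -> '%s'"
                % (self.state, new_state)
            )
        self.state = new_state


if __name__ == "__main__":
    # Destination waiting for an incoming guest that has never run.
    dest = Machine("inmigrate", ALLOWED)
    try:
        dest.set_state("prelaunch")
    except InvalidTransition as err:
        print("ERROR:", err)

    # Same request once the pair has been added to the allow-list.
    fixed = Machine("inmigrate", ALLOWED | {("inmigrate", "prelaunch")})
    fixed.set_state("prelaunch")
    print("state is now", fixed.state)
```

In this model a guest that was resumed and then suspended arrives as 'paused', which the toy table already accepts; a guest that was never resumed arrives in a state the table rejects until the pair is added.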

Comment 5 Juan Quintela 2016-08-24 15:20:08 UTC
This means that the guest has never been run.  Why would we want to migrate the guest instead of launching it on the destination again?

Comment 6 Yaniv Kaul 2016-09-01 08:19:24 UTC
*** Bug 1371957 has been marked as a duplicate of this bug. ***

Comment 7 Juan Quintela 2017-05-03 11:45:49 UTC
Fixed upstream and in the current release with commit:

commit 98799b0d4be4fb5e3962005448119133a6bf74b2
Author: Paolo Bonzini <pbonzini>
Date:   Mon Feb 15 19:40:04 2016 +0100

    vl: fix migration from prelaunch state
    
    Reproducer is simply to migrate a virtual machine that was started with -S,
    or that was already migrated.
    
    Signed-off-by: Paolo Bonzini <pbonzini>