Red Hat Bugzilla – Bug 876829
Creating an external checkpoint snapshot changes the guest's pmsuspended state, and the guest hangs forever
Last modified: 2016-04-26 10:45:10 EDT
Description of problem:
Creating an external snapshot changes the guest's pmsuspended state, and the guest hangs forever.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.331.el6.x86_64
libvirt-0.10.2-8.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64

How reproducible:

Steps to Reproduce:
1. Add the following to the guest XML, then restart the guest:
  ...
  <pm>
    <suspend-to-mem enabled='yes'/>
    <suspend-to-disk enabled='yes'/>
  </pm>
  ...
  <channel type='unix'>
    <source mode='bind' path='/var/lib/libvirt/qemu/demo2.agent'/>
    <target type='virtio' name='org.qemu.guest_agent.0'/>
    <address type='virtio-serial' controller='0' bus='0' port='1'/>
  </channel>
  ...
2. Install qemu-guest-agent in the guest, then start it:
  # service qemu-ga start
3. On the host, put the guest into S3:
  [root@intel-q9400-4-2 ~]# virsh dompmsuspend test --target mem
  Domain test successfully suspended
  Check the guest state:
  [root@intel-q9400-4-2 ~]# virsh list
   Id    Name    State
  ----------------------------------------------------
   8     test    pmsuspended
  Take an external checkpoint snapshot:
  [root@intel-q9400-4-2 ~]# virsh snapshot-create-as test ex-s4 --diskspec vda --memspec /tmp/ex-s4
  Domain snapshot ex-s4 created
  The guest state has changed to running:
  [root@intel-q9400-4-2 ~]# virsh list
   Id    Name    State
  ----------------------------------------------------
   8     test    running
  The guest then hangs forever while reported as running.

Actual results:
As described in the steps above.

Expected results:
The guest state should not change, and the guest should not hang.

Additional info:
What's hanging, the VM? It sounds to me like it's in a running state but its processors are stopped or something like that?
I reproduced the problem. 'virsh list' says the guest is running, but qemu thinks otherwise:

# virsh qemu-monitor-command dom '{"execute":"query-status"}'
{"return":{"status":"post-migrate","singlestep":false,"running":false},"id":"libvirt-79"}

This matches the state of a paused guest. It is fairly easy to work around: run 'virsh suspend dom' to get libvirt to match the qemu state, then 'virsh resume dom' to resume the guest. But yes, libvirt should get the state transition right on its own; I'll look into patching that.

There's no way to do migration-to-file while the guest is in the pmsuspended (S3) state without help from qemu, and that help won't be happening in 6.4. Right now, the mere act of migration wakes a guest out of S3, but we can choose whether to wake it into the paused state or into the running state. I inspected the external memory file, and it defaults to the paused state.
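As a rough illustration of the mismatch described above, the following Python sketch compares the state libvirt reports with qemu's reply to the QMP "query-status" command. The helpers states_disagree and workaround_commands are hypothetical names, not part of libvirt or qemu; the JSON reply is the one quoted in the comment.

```python
import json
from typing import List


def states_disagree(libvirt_state: str, qmp_reply: str) -> bool:
    """Hypothetical check: True when libvirt and qemu disagree on whether vCPUs run."""
    status = json.loads(qmp_reply)["return"]
    qemu_running = status["running"]              # qemu's own view
    libvirt_running = libvirt_state == "running"  # what 'virsh list' shows
    return libvirt_running != qemu_running


def workaround_commands(domain: str) -> List[str]:
    """The manual workaround from the comment: resync libvirt, then resume."""
    return [f"virsh suspend {domain}", f"virsh resume {domain}"]


# Reply observed after the external snapshot of an S3 guest.
reply = ('{"return":{"status":"post-migrate","singlestep":false,'
         '"running":false},"id":"libvirt-79"}')

if states_disagree("running", reply):
    for cmd in workaround_commands("dom"):
        print(cmd)
```

With the quoted reply, libvirt says "running" while qemu reports "running":false, so the sketch prints the suspend/resume pair.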
Upstream patch proposed: https://www.redhat.com/archives/libvir-list/2013-January/msg01744.html
In POST, since rebasing will pick up this upstream patch:

commit 339bdd99a17eb1420cc5cadf27c36a9637d86f10
Author: Eric Blake <eblake@redhat.com>
Date:   Tue Jan 8 21:54:45 2013 -0700

    snapshot: fix state after external snapshot of S3 domain

    https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
    if a guest is put into S3 state (such as via virsh dompmsuspend)
    and then an external snapshot is taken, qemu forcefully transitions
    the domain to paused, but libvirt doesn't reflect that change
    internally. Thus, a user has to use 'virsh suspend' to get libvirt
    back in sync with qemu state, and if the user doesn't know this
    trick, then the guest appears hung.

    * src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
    Track fact that qemu wakes up a suspended domain on migration.
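A minimal Python model of the state bookkeeping the commit message describes, assuming a simplified three-state domain. DomainState and state_after_external_snapshot are hypothetical names for illustration only; this is not libvirt's actual code in qemuDomainSnapshotCreateActiveExternal.

```python
from enum import Enum


class DomainState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    PMSUSPENDED = "pmsuspended"


def state_after_external_snapshot(before: DomainState, fixed: bool) -> DomainState:
    """Hypothetical model of the state libvirt records after a memory snapshot."""
    if before is DomainState.PMSUSPENDED:
        if fixed:
            # qemu woke the guest out of S3 into paused; record that fact.
            return DomainState.PAUSED
        # Pre-fix behavior: libvirt reports running while qemu is paused.
        return DomainState.RUNNING
    # Simplification: other states are assumed to be restored unchanged.
    return before


print(state_after_external_snapshot(DomainState.PMSUSPENDED, fixed=True).value)
```

The point of the model is that the fix changes only the bookkeeping: qemu still leaves the guest paused, and libvirt now says so, so 'virsh resume' works without the extra 'virsh suspend'.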
http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-June/msg00027.html if backporting rather than rebasing
We decided not to rebase libvirt in RHEL 6.5 to avoid the stability issues we faced in 6.4. This bug has already been fixed upstream, but it is considered unsuitable for backporting to RHEL 6.5 because at least one of the following conditions is met:

- this bug requires new API(s), which we cannot introduce without rebasing libvirt
- the patches required to address this bug are complex or invasive, making the backport too risky
- this bug is not important enough to justify backporting non-trivial patches for it

Thus I'm pushing this bug to RHEL 6.6 (and setting the Upstream keyword to indicate we have patches upstream) for now. If you don't agree with this resolution, please give us reasons that you think are strong enough for us to reevaluate the decision not to backport patches for this bug.
I'm trying to determine if this bug is related to bug 928762, in which case we may want to pull it back into 6.5.
(In reply to Eric Blake from comment #11)
> I'm trying to determine if this bug is related to bug 928762, in which case
> we may want to pull it back into 6.5.

Typo: meant bug 928672
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.
On RHEL7 with the latest libvirt-1.2.7-2.el7.x86_64, creating a snapshot of a guest in the pmsuspended state is currently not supported:

# virsh list
 Id    Name           State
----------------------------------------------------
 50    rhel7-qcow2    pmsuspended

# virsh snapshot-create-as rhel7-qcow2 ss --diskspec vda --memspec /tmp/ss
error: Operation not supported: qemu doesn't support taking snapshots of PMSUSPENDED guests

Is it reasonable to also forbid creating an external disk-only snapshot of a guest in the pmsuspended state?

# virsh snapshot-create-as rhel7-qcow2 ss2 --disk-only
error: Operation not supported: qemu doesn't support taking snapshots of PMSUSPENDED guests
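A sketch of the guard that RHEL7 libvirt appears to apply, modeled only on the error message shown above. check_snapshot_allowed and OperationNotSupported are hypothetical names, not libvirt's API; as the output shows, the check does not distinguish memory snapshots from --disk-only ones.

```python
class OperationNotSupported(Exception):
    """Hypothetical stand-in for libvirt's 'Operation not supported' error."""


def check_snapshot_allowed(state: str) -> None:
    # Refuse any snapshot (memory or disk-only) while the guest is in S3.
    if state == "pmsuspended":
        raise OperationNotSupported(
            "qemu doesn't support taking snapshots of PMSUSPENDED guests")


for state in ("running", "pmsuspended"):
    try:
        check_snapshot_allowed(state)
        print(f"{state}: snapshot allowed")
    except OperationNotSupported as err:
        print(f"{state}: error: Operation not supported: {err}")
```

Rejecting the request up front avoids the state mismatch entirely, at the cost of requiring the user to wake the guest before taking a snapshot.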
Since creating a snapshot of a guest in the pmsuspended state is now forbidden for consistency, I am moving this bug to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2015-0323.html