Bug 876829

Summary: Creating an external checkpoint snapshot changes the guest's pmsuspended state and the guest hangs forever
Product: Red Hat Enterprise Linux 7
Component: libvirt
Version: 7.0
Hardware: x86_64
OS: Linux
Status: CLOSED ERRATA
Severity: medium
Priority: medium
Target Milestone: rc
Target Release: ---
Reporter: Huang Wenlong <whuang>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: cwei, dyuan, eblake, mzhan, rbalakri, shyu
Keywords: TestOnly, Upstream
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-03-05 07:19:48 UTC

Description Huang Wenlong 2012-11-15 05:11:22 UTC
Description of problem:
Creating an external snapshot takes the guest out of the pmsuspended state,
and the guest then hangs forever.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-0.12.1.2-2.331.el6.x86_64
libvirt-0.10.2-8.el6.x86_64
seabios-0.6.1.2-25.el6.x86_64


How reproducible:


Steps to Reproduce:
1. Add the following to the guest XML, then restart the guest:
...
<pm>
<suspend-to-mem enabled='yes'/>
<suspend-to-disk enabled='yes'/>
</pm>
...

<channel type='unix'>
<source mode='bind' path='/var/lib/libvirt/qemu/demo2.agent'/>
<target type='virtio' name='org.qemu.guest_agent.0'/>
<address type='virtio-serial' controller='0' bus='0' port='1'/>
</channel>
...

2. Install qemu-guest-agent in the guest, then start it:
# service qemu-ga start


3. On the host:
# Suspend the guest to memory (S3)
[root@intel-q9400-4-2 ~]# virsh dompmsuspend test --target mem
Domain test successfully suspended

# Check the guest state
[root@intel-q9400-4-2 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 8     test                           pmsuspended

# Create an external checkpoint snapshot
[root@intel-q9400-4-2 ~]# virsh snapshot-create-as test ex-s4 --diskspec vda --memspec /tmp/ex-s4
Domain snapshot ex-s4 created

# The guest state has changed to running
[root@intel-q9400-4-2 ~]# virsh list
 Id    Name                           State
----------------------------------------------------
 8     test                           running

The guest then hangs forever while reported as running.



Actual results:
As described in step 3.

Expected results:
The guest state should not change, and the guest should not hang.

Additional info:

Comment 2 Dave Allan 2012-12-07 03:11:54 UTC
What's hanging, the VM?  It sounds to me like it's in a running state but its processors are stopped or something like that?

Comment 3 Eric Blake 2012-12-22 03:37:10 UTC
I reproduced the problem - 'virsh list' says the guest is running, but qemu thinks otherwise:
# virsh qemu-monitor-command dom '{"execute":"query-status"}'
{"return":{"status":"post-migrate","singlestep":false,"running":false},"id":"libvirt-79"}
which matches a state of a paused guest.
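
The disagreement between libvirt and qemu shown above can be checked mechanically. The following is a minimal Python sketch, assuming the QMP reply has exactly the shape pasted in this comment; `qemu_is_running` and `states_disagree` are hypothetical helper names, not libvirt or qemu APIs.

```python
import json


def qemu_is_running(reply_text):
    """Return True if a QMP query-status reply says the guest is running."""
    reply = json.loads(reply_text)
    return bool(reply["return"]["running"])


def states_disagree(libvirt_state, reply_text):
    """Return True when exactly one of libvirt and qemu reports 'running'."""
    return (libvirt_state == "running") != qemu_is_running(reply_text)


# Reply copied from this comment; 'virsh list' reported the domain as running.
REPLY = ('{"return":{"status":"post-migrate","singlestep":false,'
         '"running":false},"id":"libvirt-79"}')
print(states_disagree("running", REPLY))  # prints True
```

Feeding it the state "paused" instead returns False, which is the situation after the workaround's first step.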

It is fairly easy to work around - merely run: 'virsh suspend dom' to get libvirt to match qemu state, then 'virsh resume dom' to resume the guest again.  But yes, libvirt should be getting the state transition correct on its own; I'll look into patching that.
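
The workaround can be scripted. This is only a sketch in Python: it shells out to the two virsh commands named above, and the helper names `resync_commands` and `resync` as well as the injectable `run` parameter are assumptions made for illustration.

```python
import subprocess


def resync_commands(domain):
    """Build the two virsh invocations of the workaround, in order."""
    return [
        ["virsh", "suspend", domain],  # get libvirt to match qemu's paused state
        ["virsh", "resume", domain],   # then resume the guest again
    ]


def resync(domain, run=subprocess.check_call):
    """Run the workaround; `run` is injectable so the sketch can be dry-run."""
    for argv in resync_commands(domain):
        run(argv)


# Dry run: print the commands instead of executing them.
resync("dom", run=lambda argv: print(" ".join(argv)))
```

Calling `resync("dom")` without the `run` argument would execute the commands for real and raise if either virsh call fails.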

There's no way to do migration-to-file with a guest in pmsuspended (S3) state without help from qemu, and that help won't be happening in 6.4.  Right now, the mere act of migration will wake up a guest out of S3, but we have a choice of whether to wake it up into paused state or into running state.  I inspected the external memory file, and it is defaulting to the paused state.

Comment 5 Eric Blake 2013-01-23 23:29:03 UTC
Upstream patch proposed:
https://www.redhat.com/archives/libvir-list/2013-January/msg01744.html

Comment 6 Eric Blake 2013-01-25 00:14:46 UTC
In POST, since rebasing will pick up this upstream patch:

commit 339bdd99a17eb1420cc5cadf27c36a9637d86f10
Author: Eric Blake <eblake>
Date:   Tue Jan 8 21:54:45 2013 -0700

    snapshot: fix state after external snapshot of S3 domain
    
    https://bugzilla.redhat.com/show_bug.cgi?id=876829 complains that
    if a guest is put into S3 state (such as via virsh dompmsuspend)
    and then an external snapshot is taken, qemu forcefully transitions
    the domain to paused, but libvirt doesn't reflect that change
    internally.  Thus, a user has to use 'virsh suspend' to get libvirt
    back in sync with qemu state, and if the user doesn't know this
    trick, then the guest appears hung.
    
    * src/qemu/qemu_driver.c (qemuDomainSnapshotCreateActiveExternal):
    Track fact that qemu wakes up a suspended domain on migration.
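
The state bookkeeping described in the commit message can be illustrated with a toy model. This Python sketch is not the libvirt code; the function name and the `track_wakeup` flag are hypothetical, and it assumes that a domain in any state other than pmsuspended keeps its state across the snapshot.

```python
def tracked_state_after_snapshot(state_before, track_wakeup):
    """Toy model of the state libvirt records after an external snapshot.

    track_wakeup=False models the behavior reported in this bug;
    track_wakeup=True models the behavior the commit message describes.
    """
    if state_before == "pmsuspended":
        # Per the commit message, qemu forces the domain to paused here.
        return "paused" if track_wakeup else "running"
    # Assumption: other states are unchanged by the snapshot.
    return state_before


print(tracked_state_after_snapshot("pmsuspended", track_wakeup=False))
print(tracked_state_after_snapshot("pmsuspended", track_wakeup=True))
```

The first call yields the stale "running" state from the report, the second the "paused" state that matches qemu.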

Comment 9 Eric Blake 2013-06-05 22:38:54 UTC
http://post-office.corp.redhat.com/archives/rhvirt-patches/2013-June/msg00027.html if backporting rather than rebasing

Comment 10 Jiri Denemark 2013-06-11 09:56:04 UTC
We decided not to rebase libvirt in RHEL 6.5 to avoid stability issues
we faced in 6.4. This bug has already been fixed upstream but it is
considered unsuitable for backporting to RHEL 6.5 because at least one
of the following conditions is met:

- this bug requires new API(s), which we cannot introduce without
  rebasing libvirt
- the patches required to address this bug are complex or invasive
  causing the backport to be too risky
- this bug is not important enough to justify backporting non-trivial
  patches for it

Thus I'm pushing this bug to RHEL 6.6 (and setting Upstream keyword to
indicate we have patches upstream) for now. If you don't agree with
this resolution, please, give us reasons which you think are strong
enough for us to reevaluate the decision not to backport patches for
this bug.

Comment 11 Eric Blake 2013-07-09 14:05:16 UTC
I'm trying to determine if this bug is related to bug 928762, in which case we may want to pull it back into 6.5.

Comment 12 Eric Blake 2013-07-09 14:05:58 UTC
(In reply to Eric Blake from comment #11)
> I'm trying to determine if this bug is related to bug 928762, in which case
> we may want to pull it back into 6.5.

Typo: meant bug 928672

Comment 16 Jiri Denemark 2014-04-04 21:37:11 UTC
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

Comment 17 Shanzhi Yu 2014-09-01 09:41:09 UTC
On RHEL 7 with the latest libvirt-1.2.7-2.el7.x86_64, creating a snapshot of a guest in pmsuspended state is currently not supported.

# virsh list 
 Id    Name                           State
----------------------------------------------------
 50    rhel7-qcow2                    pmsuspended

# virsh snapshot-create-as rhel7-qcow2 ss --diskspec vda --memspec /tmp/ss
error: Operation not supported: qemu doesn't support taking snapshots of PMSUSPENDED guests

Is it reasonable to forbid creating an external disk-only snapshot of a guest in pmsuspended state?

# virsh snapshot-create-as rhel7-qcow2 ss2  --disk-only 
error: Operation not supported: qemu doesn't support taking snapshots of PMSUSPENDED guests
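
A caller can avoid the error above by checking the domain state first. This is a minimal Python sketch, assuming the state string comes from `virsh domstate` output; `snapshot_allowed` is a hypothetical helper, and the check mirrors only the pmsuspended restriction shown in this comment.

```python
def snapshot_allowed(domstate_output):
    """Return False if the reported domain state is pmsuspended."""
    return domstate_output.strip() != "pmsuspended"


# State strings as they appear in the State column of 'virsh list'.
for state in ("running", "pmsuspended"):
    print(state, snapshot_allowed(state))
```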

Comment 18 Shanzhi Yu 2014-12-09 10:20:21 UTC
Since creating a snapshot while the guest is in pmsuspended state is forbidden for consistency, I am changing this bug to VERIFIED status.

Comment 20 errata-xmlrpc 2015-03-05 07:19:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-0323.html