Bug 733762

Summary: problems with libvirt start and snapshot resume with paused domains
Product: Red Hat Enterprise Linux 6
Component: libvirt
Version: 6.2
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Eric Blake <eblake>
Assignee: Eric Blake <eblake>
QA Contact: Virtualization Bugs <virt-bugs>
CC: dyuan, gren, mzhan, nzhang, rwu, veillard, whuang, yupzhang
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-0.9.4-8.el6
Doc Type: Bug Fix
Last Closed: 2011-12-06 11:27:33 UTC
Bug Blocks: 638510, 674537, 737010, 740508

Description Eric Blake 2011-08-26 18:22:14 UTC
Description of problem:
Libvirt does not honor the VIR_DOMAIN_START_PAUSED flag when resuming a domain from a managed save image.  Additionally, when libvirt starts a paused domain, it fails to issue VIR_DOMAIN_EVENT_SUSPENDED to reflect the domain state, which may confuse management apps that use events to track domain transitions.  Similar event problems occur when reverting to a snapshot.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-6.el6

How reproducible:
100%

Steps to Reproduce:
1. virsh start dom --paused
2. virsh resume dom
3. virsh managedsave dom
4. virsh start dom --paused
5. virsh suspend dom
6. virsh snapshot-create-as dom name
7. virsh destroy dom
8. virsh snapshot-revert dom name
  
Actual results:
1. VIR_DOMAIN_EVENT_STARTED issued, but no VIR_DOMAIN_EVENT_SUSPENDED
2. VIR_DOMAIN_EVENT_RESUMED issued
3. VIR_DOMAIN_EVENT_STOPPED issued
4. VIR_DOMAIN_EVENT_STARTED issued and domain is running
5. VIR_DOMAIN_EVENT_SUSPENDED issued
6. no event, but snapshot remembers it is paused
7. VIR_DOMAIN_EVENT_STOPPED issued
8. VIR_DOMAIN_EVENT_STARTED and VIR_DOMAIN_EVENT_SUSPENDED both issued, but there was a window in which the vcpus ran, so the domain no longer matches the state recorded in the snapshot

Expected results:
1. both VIR_DOMAIN_EVENT_STARTED and VIR_DOMAIN_EVENT_SUSPENDED issued
2. VIR_DOMAIN_EVENT_RESUMED issued
3. VIR_DOMAIN_EVENT_STOPPED issued
4. VIR_DOMAIN_EVENT_STARTED and VIR_DOMAIN_EVENT_SUSPENDED issued, domain is paused
5. error, since domain is already paused
6. no event, but snapshot remembers it is paused
7. VIR_DOMAIN_EVENT_STOPPED issued
8. VIR_DOMAIN_EVENT_STARTED and VIR_DOMAIN_EVENT_SUSPENDED both issued, domain is paused and in same state as at the time of the snapshot
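
The expected results above can be restated as a small state model. This is only a sketch for reasoning about the event sequence: the DomainModel class and its method names are hypothetical, it does not call libvirt, and it only covers reverting a domain that is already inactive.

```python
# Hypothetical model of the expected behaviour listed above; not libvirt code.
STARTED = "VIR_DOMAIN_EVENT_STARTED"
SUSPENDED = "VIR_DOMAIN_EVENT_SUSPENDED"
RESUMED = "VIR_DOMAIN_EVENT_RESUMED"
STOPPED = "VIR_DOMAIN_EVENT_STOPPED"


class DomainModel:
    """Tracks one domain's state: 'shutoff', 'running' or 'paused'."""

    def __init__(self):
        self.state = "shutoff"
        self.snapshots = {}  # snapshot name -> domain state when taken

    def start(self, paused=False):
        if self.state != "shutoff":
            raise RuntimeError("domain is already active")
        # A paused start must report both events, also after a managed save.
        self.state = "paused" if paused else "running"
        return [STARTED, SUSPENDED] if paused else [STARTED]

    def suspend(self):
        if self.state != "running":
            raise RuntimeError("domain is not running")
        self.state = "paused"
        return [SUSPENDED]

    def resume(self):
        if self.state != "paused":
            raise RuntimeError("domain is not paused")
        self.state = "running"
        return [RESUMED]

    def _stop(self):
        if self.state == "shutoff":
            raise RuntimeError("domain is not active")
        self.state = "shutoff"
        return [STOPPED]

    def managedsave(self):
        return self._stop()

    def destroy(self):
        return self._stop()

    def snapshot_create(self, name):
        # No event, but the snapshot remembers the current state.
        self.snapshots[name] = self.state
        return []

    def snapshot_revert(self, name):
        if self.state != "shutoff":
            raise RuntimeError("model only covers reverting an inactive domain")
        self.state = self.snapshots[name]
        if self.state == "paused":
            return [STARTED, SUSPENDED]
        return [STARTED] if self.state == "running" else []
```

Walking the eight reproduction steps through this model gives the expected event lists, with step 5 raising an error because the domain is already paused.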


Additional info:

Comment 1 Eric Blake 2011-08-26 18:22:56 UTC
Fixing this is a prerequisite for bug 638510 (support for live snapshots via the
snapshot_blkdev qemu monitor command).

Comment 2 Eric Blake 2011-08-26 22:28:32 UTC
One other bug should be fixed in this area of code at the same time.  Newer qemu does not allow 'qemu -loadvm name' to revert to an inactive internal snapshot: if the named snapshot has no accompanying vm state, the attempt is rejected.  To fix that, the revert code needs to use 'qemu-img snapshot -a name'.  This fix is necessary before bug 674537 can be tested.
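
As a rough illustration of the choice described in this comment, the sketch below picks between the two commands depending on whether the snapshot carries vm state. The function name and its parameters are hypothetical, the argument lists are deliberately incomplete (a real qemu invocation needs many more options), and this is not the code libvirt uses.

```python
def revert_command(snapshot_name, disk_path, has_vm_state):
    """Return an illustrative argv for reverting to an internal snapshot.

    All names here are assumptions made for the example.
    """
    if has_vm_state:
        # System checkpoint: qemu restores the saved vm state at startup.
        return ["qemu", "-loadvm", snapshot_name]
    # Inactive snapshot without vm state: newer qemu rejects -loadvm,
    # so apply the snapshot to the disk image offline with qemu-img.
    return ["qemu-img", "snapshot", "-a", snapshot_name, disk_path]
```

The offline branch has to run while no qemu process is using the image.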

Comment 3 Eric Blake 2011-09-02 19:41:17 UTC
Upstream now has several patches, culminating in this commit, that should fix this bug:

commit 7dc44eb059fc976e7c88091477a981a4a90bf2f5
Author: Eric Blake <eblake>
Date:   Sat Aug 27 13:48:19 2011 -0600

    snapshot: fine-tune qemu snapshot revert states
    
    For a system checkpoint of a running or paused domain, it's fairly
    easy to honor new flags for altering which state to use after the
    revert.  For an inactive snapshot, the revert has to be done while
    there is no qemu process, so do back-to-back transitions; this also
    lets us revert to inactive snapshots even for transient domains.
    
    * src/qemu/qemu_driver.c (qemuDomainRevertToSnapshot): Support new
    flags.

Comment 6 Nan Zhang 2011-09-13 10:29:08 UTC
Tested with libvirt-0.9.4-11.el6.x86_64; moving this to VERIFIED.

Terminal 1:
============
# virsh start foo --paused
Domain foo started

# virsh list
 Id Name                 State
----------------------------------
  1 win7                 running
  3 foo                  paused

# virsh resume foo
Domain foo resumed

# virsh managedsave foo
Domain foo state saved by libvirt

# virsh start foo --paused
Domain foo started

# virsh suspend foo
Domain foo suspended

# virsh snapshot-create-as foo snap1
Domain snapshot snap1 created

# virsh destroy foo
Domain foo destroyed

# virsh snapshot-revert foo snap1

# virsh list
 Id Name                 State
----------------------------------
  1 win7                 running
  4 foo                  paused


Terminal 2:
============
# ./event-test 
main:345: Registering domain event cbs
myDomainEventCallback1 EVENT: Domain foo(2) Started Booted
myDomainEventCallback2 EVENT: Domain foo(2) Started Booted
myDomainEventCallback1 EVENT: Domain foo(2) Suspended Paused
myDomainEventCallback2 EVENT: Domain foo(2) Suspended Paused
myDomainEventCallback1 EVENT: Domain foo(2) Resumed Unpaused
myDomainEventCallback2 EVENT: Domain foo(2) Resumed Unpaused
myDomainEventCallback1 EVENT: Domain foo(-1) Stopped Failed
myDomainEventCallback2 EVENT: Domain foo(-1) Stopped Failed
myDomainEventCallback1 EVENT: Domain foo(3) Started Restored
myDomainEventCallback2 EVENT: Domain foo(3) Started Restored
myDomainEventCallback1 EVENT: Domain foo(3) Suspended Paused
myDomainEventCallback2 EVENT: Domain foo(3) Suspended Paused
myDomainEventCallback1 EVENT: Domain foo(-1) Stopped Destroyed
myDomainEventCallback2 EVENT: Domain foo(-1) Stopped Destroyed
myDomainEventCallback1 EVENT: Domain foo(4) Started Snapshot
myDomainEventCallback2 EVENT: Domain foo(4) Started Snapshot
myDomainEventCallback1 EVENT: Domain foo(4) Suspended Snapshot
myDomainEventCallback2 EVENT: Domain foo(4) Suspended Snapshot
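
To compare output like the above against the expected event sequence without reading it by eye, a small parser is enough. This is a sketch that assumes the line format shown in this comment; the regular expression and the parse_events helper are hypothetical and are not part of event-test.

```python
import re

# Matches lines such as:
#   myDomainEventCallback1 EVENT: Domain foo(2) Started Booted
EVENT_RE = re.compile(
    r"^myDomainEventCallback(?P<cb>\d+) EVENT: "
    r"Domain (?P<name>\S+)\((?P<id>-?\d+)\) (?P<event>\w+) (?P<detail>\w+)$"
)


def parse_events(log_text, callback="1"):
    """Return (event, detail) pairs reported by one callback, in order."""
    events = []
    for line in log_text.splitlines():
        match = EVENT_RE.match(line.strip())
        # Each event is printed once per registered callback; keep one copy.
        if match and match.group("cb") == callback:
            events.append((match.group("event"), match.group("detail")))
    return events
```

Filtering on a single callback removes the duplicate lines, so the result can be compared directly with the list of expected events.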

Comment 7 Eric Blake 2011-09-13 15:57:32 UTC
The fixes for this bug introduced a regression; see bug 737010.

Comment 8 Eric Blake 2011-09-19 14:48:35 UTC
*** Bug 739486 has been marked as a duplicate of this bug. ***

Comment 9 Eric Blake 2011-09-22 15:09:26 UTC
The patch wasn't quite complete; see bug 740508 for the remaining fix.

Comment 10 errata-xmlrpc 2011-12-06 11:27:33 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html