Bug 674537

Summary: Restarting libvirtd resets current snapshot
Product: Red Hat Enterprise Linux 6 Reporter: Jiri Denemark <jdenemar>
Component: libvirt Assignee: Eric Blake <eblake>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: high    
Version: 6.1 CC: dallan, dyuan, eblake, nzhang, syeghiay, veillard, whuang, xen-maint, yupzhang
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: libvirt-0.9.4-8.el6 Doc Type: Bug Fix
Doc Text:
Cause
    A number of logic bugs were present in libvirt snapshot (system checkpoint) handling; among these, restarting libvirtd would lose track of the current snapshot, and a change in qemu behavior triggered a latent bug in libvirt's ability to restore certain snapshots.
Consequence
    Snapshots were unreliable and hard to manage without tripping up on limitations, contrary to the documentation.
Fix
    A number of bug fixes and added flags to existing snapshot management APIs, along with better testing of more snapshot scenarios, allowed libvirt to actually provide all the snapshot features that it had previously documented.
Result
    Management applications can use system checkpoint snapshots for better control in rolling back to known stable states of a VM.
Story Points: ---
Clone Of: 662026 Environment:
Last Closed: 2011-12-06 10:54:00 UTC Type: ---
Bug Depends On: 733529, 733762    
Bug Blocks: 638510    

Description Jiri Denemark 2011-02-02 11:27:52 UTC
Description of problem:

After libvirtd is restarted, it forgets which snapshot was the current snapshot
of a qemu domain.

All this can currently be tested on inactive domains only since snapshotting a
running domain doesn't work on RHEL-6 (bug 589076).

Version-Release number of selected component (if applicable):
libvirt-0.8.7-4.el6

How reproducible:
Always

Steps to Reproduce:
# virsh snapshot-current rhel6

# virsh snapshot-create rhel6
Domain snapshot 1291880262 created

# virsh snapshot-list rhel6
 Name                 Creation Time             State
---------------------------------------------------
 1291880262           2010-12-09 02:37:42 -0500 shutoff

# virsh snapshot-current rhel6
<domainsnapshot>
  <name>1291880262</name>
  <state>shutoff</state>
  <creationTime>1291880262</creationTime>
  <domain>
    <uuid>a35ba4e9-5fbb-3fca-9d6d-9e0dab6e3d32</uuid>
  </domain>
</domainsnapshot>

# service libvirtd restart

# virsh snapshot-current rhel6

Actual results:
No snapshot is marked as current

Expected results:
1291880262 should remain current snapshot even after libvirtd restart

Additional info:
Similar things happen after setting a domain's current snapshot using virsh
snapshot-revert and restarting libvirtd.
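
The check in the steps above can be scripted by comparing the snapshot name reported before and after the restart. The following is a minimal sketch in Python and is not part of the original report: the helper names (snapshot_name, current_snapshot, survives_restart) are hypothetical, and it assumes that virsh and service are on the PATH and that virsh snapshot-current prints either nothing or a <domainsnapshot> document like the one shown above.

```python
import subprocess
import xml.etree.ElementTree as ET


def snapshot_name(xml_text):
    """Return the <name> of a <domainsnapshot> document, or None if empty."""
    if not xml_text.strip():
        return None  # virsh printed nothing: no snapshot is marked current
    root = ET.fromstring(xml_text)
    # Only direct children are searched, so a nested <domain><name> is ignored.
    return root.findtext("name")


def current_snapshot(domain):
    """Ask virsh for the current snapshot of `domain` (assumes virsh exists)."""
    result = subprocess.run(
        ["virsh", "snapshot-current", domain],
        capture_output=True, text=True, check=False,
    )
    return snapshot_name(result.stdout)


def survives_restart(domain):
    """True if the current snapshot is the same before and after a restart."""
    before = current_snapshot(domain)
    # Assumption: a SysV-style service wrapper, as used in the steps above.
    subprocess.run(["service", "libvirtd", "restart"], check=True)
    return before is not None and current_snapshot(domain) == before
```

With the affected package, survives_restart("rhel6") would be expected to return False after a snapshot has been created, because the second virsh call prints nothing.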

Comment 4 Eric Blake 2011-08-05 19:22:53 UTC
I might end up fixing this one as a side effect of implementing disk snapshots on top of the virDomainSnapshotCreateXML command, since disk snapshots have to interact with the hierarchy of checkpoint snapshots.

Comment 5 Eric Blake 2011-08-16 19:28:19 UTC
In fact, I _did_ end up fixing this as a side effect.  Upstream patch still awaiting ACK, but should be trivial to backport once approved:

https://www.redhat.com/archives/libvir-list/2011-August/msg00627.html

snapshot: track current snapshot across restarts

Audit all changes to the qemu vm->current_snapshot, and make them
update the saved xml file for both the previous and the new
snapshot, so that there is always at most one snapshot with
<active>1</active> in the xml, and that snapshot is used as the
current snapshot even across libvirtd restarts.

* src/conf/domain_conf.h (_virDomainSnapshotDef): Alter member
type and name.
* src/conf/domain_conf.c (virDomainSnapshotDefParseString)
(virDomainSnapshotDefFormat): Update clients.
* docs/schemas/domainsnapshot.rng: Tighten rng.
* src/qemu/qemu_driver.c (qemuDomainSnapshotLoad): Reload current
snapshot.
(qemuDomainSnapshotCreateXML, qemuDomainRevertToSnapshot)
(qemuDomainSnapshotDiscard): Track current snapshot.
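
The invariant described in this commit message (at most one saved snapshot carries <active>1</active>, and that one is picked up again on restart) can be illustrated with a small model. This is only a sketch in Python, not libvirt's C implementation: the SnapshotStore class and its methods are hypothetical, the saved XML files are represented by an in-memory dict, and the discard behavior is simplified.

```python
import xml.etree.ElementTree as ET


def set_active(xml_text, active):
    """Return the snapshot XML with its <active> element set to 1 or 0."""
    root = ET.fromstring(xml_text)
    node = root.find("active")
    if node is None:
        node = ET.SubElement(root, "active")
    node.text = "1" if active else "0"
    return ET.tostring(root, encoding="unicode")


def is_active(xml_text):
    """True if the snapshot XML is flagged with <active>1</active>."""
    return ET.fromstring(xml_text).findtext("active") == "1"


class SnapshotStore:
    """Toy stand-in for the saved per-snapshot XML files (name -> XML text)."""

    def __init__(self, files=None):
        self.files = dict(files or {})
        # Models a daemon start: recover the current snapshot from disk.
        self.current = self.load_current()

    def load_current(self):
        """Return the name of the snapshot flagged active, if any."""
        active = [name for name, xml in self.files.items() if is_active(xml)]
        return active[0] if active else None

    def set_current(self, name):
        """Switch the current snapshot, rewriting both old and new metadata."""
        if self.current is not None and self.current in self.files:
            self.files[self.current] = set_active(self.files[self.current], False)
        if name is not None:
            self.files[name] = set_active(self.files[name], True)
        self.current = name

    def create(self, name):
        """A newly created snapshot becomes the current one."""
        self.files[name] = "<domainsnapshot><name>%s</name></domainsnapshot>" % name
        self.set_current(name)

    def discard(self, name):
        """Simplification: deleting the current snapshot leaves none current."""
        del self.files[name]
        if self.current == name:
            self.current = None
```

Constructing a second SnapshotStore from the first one's files models a libvirtd restart: because both the previous and the new snapshot are rewritten on every change, the reloaded store sees exactly one active flag.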

Comment 6 Eric Blake 2011-08-25 22:38:21 UTC
Getting this fixed is a prereq to bug 638510 support for live snapshots via the snapshot_blkdev qemu monitor command.

Comment 7 Eric Blake 2011-08-25 23:11:58 UTC
In POST:
http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-August/msg00633.html
for all but one corner case ('virsh snapshot-delete dom --children'), which will be fixed by bug 733529.

Comment 11 Eric Blake 2011-08-26 22:28:18 UTC
As committed in libvirt-0.9.4-6.el6, this introduced a regression when reverting to offline snapshots using old qemu.  However, bug 733762 documents that this has already been broken when using newer qemu that rejects -loadvm of inactive snapshots.  So either way, this bug cannot be fully verified until that bug has been fixed.

Comment 13 Eric Blake 2011-08-30 16:35:22 UTC
Moving back to ASSIGNED - Philipp Hahn pointed out that SIGHUP also has problems remembering the current snapshot:
https://www.redhat.com/archives/libvir-list/2011-August/msg01444.html

Comment 14 Eric Blake 2011-09-01 21:34:07 UTC
I haven't been able to reproduce Philipp's SIGHUP issues (but did ask him for more details), but I have found a corner case where an OOM condition could leave stale metadata behind:
https://www.redhat.com/archives/libvir-list/2011-September/msg00094.html

Comment 16 yuping zhang 2011-09-09 11:35:31 UTC
Reproduced this issue with libvirt-0.8.7-18.el6.x86_64:
# virsh snapshot-current rhel6
<domainsnapshot>
  <name>1315604441</name>
  <state>running</state>
  <creationTime>1315604441</creationTime>
  <domain>
    <uuid>6df50163-a754-8430-34dc-b8b8e549736e</uuid>
  </domain>
</domainsnapshot>

[root@dhcp-93-226 libvirt-0.8.7-18]# service  libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
[root@dhcp-93-226 libvirt-0.8.7-18]# virsh snapshot-current rhel6

Verified this issue with:
libvirt-0.9.4-11.el6.x86_64
qemu-kvm-0.12.1.2-2.185.el6.x86_64

# virsh snapshot-current rhel6
<domainsnapshot>
  <name>1315618546</name>
  <state>shutoff</state>
  <creationTime>1315618546</creationTime>
<domain type='kvm'>
  <name>rhel6</name>
  <uuid>6df50163-a754-8430-34dc-b8b8e549736e</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
....

# service  libvirtd restart
Stopping libvirtd daemon:                                  [  OK  ]
Starting libvirtd daemon:                                  [  OK  ]
# virsh snapshot-current rhel6
<domainsnapshot>
  <name>1315618546</name>
  <state>shutoff</state>
  <creationTime>1315618546</creationTime>
<domain type='kvm'>
  <name>rhel6</name>
  <uuid>6df50163-a754-8430-34dc-b8b8e549736e</uuid>
  <memory>1048576</memory>
  <currentMemory>1048576</currentMemory>
  <vcpu>1</vcpu>
 .............

Also tested virsh snapshot-revert followed by a libvirtd restart; virsh snapshot-current still works correctly.

So changing the status to VERIFIED.

Comment 17 Eric Blake 2011-11-10 23:43:53 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Cause
    A number of logic bugs were present in libvirt snapshot (system checkpoint) handling; among these, restarting libvirtd would lose track of the current snapshot, and a change in qemu behavior triggered a latent bug in libvirt's ability to restore certain snapshots.
Consequence
    Snapshots were unreliable and hard to manage without tripping up on limitations, contrary to the documentation.
Fix
    A number of bug fixes and added flags to existing snapshot management APIs, along with better testing of more snapshot scenarios, allowed libvirt to actually provide all the snapshot features that it had previously documented.
Result
    Management applications can use system checkpoint snapshots for better control in rolling back to known stable states of a VM.

Comment 18 errata-xmlrpc 2011-12-06 10:54:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1513.html