Hide Forgot
Description of problem: libvirt 0.9.0 changed hash operations to forbid nested hash iterations (commit fba550) because they could lead to memory corruption. But the implementation for virDomainSnapshotDelete with the VIR_DOMAIN_SNAPSHOT_DELETE_CHILDREN flags (aka 'virsh snapshot-delete domain name --children') has always been relying on nested hash iterations since its introduction in 0.8.0. Prior to 0.9.0, that means that attempts to delete a hierarchy of snapshots would usually appear to succeed, but in a corner case dependent on hash values, could corrupt memory; after that point, this means that this functionality flat out fails to work and definitely leaves behind corrupt snapshot layouts (stranded grandchildren that point to non-existent parents). Version-Release number of selected component (if applicable): libvirt-0.9.4-5.el6 How reproducible: 100% Steps to Reproduce: 1. with an offline domain (online works too, but takes much longer to demonstrate): for name in s1 s2 s3 s4; do virsh snapshot-create-as dom $name; done 2. virsh snapshot-delete dom s2 3. virsh snapshot-current dom --name && virsh snapshot-list dom --parent Actual results: 1. Creates four snapshots; s4 is current 2. error messages are logged about forbidden nested hash iterations 3. snapshot s2 and s3 are gone, but s4 still exists and still claims to be current Expected results: 1. Creates four snapshots; s4 is current 2. no error messages in the logs 3. snapshot s2, s3, and s4 are gone, and s1 is now current Additional info: Step 3 requires some additional snapshot patches in virsh that weren't present in libvirt-0.9.4-5.el6. For validating the bug in older libvirtd, you can either: 1. Run older libvirtd but upgrade to newer virsh 2. Use 'virsh snapshot-current dom' and inspect the xml to learn the current snapshot name, as well as 'virsh snapshot-list dom' followed by 'virsh snapshot-dumpxml dom name' for each name in that list to see which snapshots still exist and what parent they claim to have
Marking this as a regression - the fact that snapshot children deletion usually worked in RHEL 6.0 but broke during hash-table tightening for upstream libvirt 0.9.0 (picked up by rebase in RHEL 6.2, but also hash-table tightening was backported to RHEL 6.1), is a visible worsening of behavior. Meanwhile, getting this fixed is a prereq to bug 638510 support for live snapshots via the snapshot_blkdev qemu monitor command.
A chain of upstream patches to fix this ends here: https://www.redhat.com/archives/libvir-list/2011-August/msg01213.html snapshot: track current domain across deletion of children Deleting a snapshot and all its descendants had problems with tracking the current snapshot. The deletion does not necessarily proceed in depth-first order, so a parent could be deleted before a child, wreaking havoc on passing the notion of the current snapshot to the parent. Furthermore, even if traversal were depth-first, doing multiple file writes to pass current up the chain one snapshot at a time is wasteful, comparing to a single update to the current snapshot at the end of the algorithm. * src/qemu/qemu_driver.c (snap_remove): Add field. (qemuDomainSnapshotDiscard): Add parameter. (qemuDomainSnapshotDiscardDescendant): Adjust accordingly. (qemuDomainSnapshotDelete): Properly reset current.
In POST: http://post-office.corp.redhat.com/archives/rhvirt-patches/2011-September/msg00111.html
I can't reproduce this issue with libvirt-0.9.4-5.el6. # rpm -qa | grep libvirt libvirt-python-0.9.4-5.el6.x86_64 libvirt-0.9.4-5.el6.x86_64 libvirt-client-0.9.4-5.el6.x86_64 #for name in s1 s2 s3 s4; do virsh snapshot-create-as kvm-rhel6.1-x86_64-qcow2 $name; done Domain snapshot s1 created Domain snapshot s2 created Domain snapshot s3 created Domain snapshot s4 created # virsh snapshot-delete kvm-rhel6.1-x86_64-qcow2 s2 Domain snapshot s2 deleted # virsh snapshot-list kvm-rhel6.1-x86_64-qcow2 Name Creation Time State --------------------------------------------------- s1 2011-09-10 09:10:59 +0800 shutoff s3 2011-09-10 09:11:10 +0800 shutoff s4 2011-09-10 09:11:16 +0800 shutoff # virsh snapshot-current kvm-rhel6.1-x86_64-qcow2 <domainsnapshot> <name>s4</name> <state>shutoff</state> <parent> <name>s3</name> </parent> <creationTime>1315617076</creationTime> <domain> <uuid>1c0c1a0f-f014-b261-fc3a-32168ac5e0a6</uuid> </domain> </domainsnapshot> And also no --name and --parent options with snapshot-current in libvirt-0.9.4-5.el6
I can't test this issue with libvirt-0.9.4-11.el6 as bug 737010 blocks snapshot-create-as command.
Reproduced this issue with libvirt-0.9.4-5.el6 and libvirt-client-0.9.4-11.el6. virsh # snapshot-list rhel6q2 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:07:15 +0800 shutoff s2 2011-09-13 17:07:20 +0800 shutoff s1 s3 2011-09-13 17:07:25 +0800 shutoff s2 s4 2011-09-13 17:07:30 +0800 shutoff s3 virsh # snapshot-current rhel6q2 --name s4 virsh # snapshot-delete rhel6q2 s2 --children Domain snapshot s2 deleted virsh # snapshot-list rhel6q2 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:07:15 +0800 shutoff s4 2011-09-13 17:07:30 +0800 shutoff s3 virsh # snapshot-current rhel6q2 --name s4
Tested with online domain, reproduced with libvirt-0.9.4-5.el6 and verified PASS with libvirt-0.9.4-11.el6. Will re-check it with offline domain once bug 737010 fixed. ==> Reproduced with libvirt-0.9.4-5.el6. virsh # snapshot-list rhel6q2 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 18:15:57 +0800 running s2 2011-09-13 18:17:16 +0800 running s1 s3 2011-09-13 18:22:39 +0800 running s2 s4 2011-09-13 18:27:31 +0800 running s3 virsh # snapshot-current rhel6q2 --name s4 virsh # snapshot-delete rhel6q2 s2 --children Domain snapshot s2 deleted virsh # snapshot-list rhel6q2 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 18:15:57 +0800 running s4 2011-09-13 18:27:31 +0800 running s3 virsh # snapshot-current rhel6q2 --name s4 ==> Verified with libvirt-0.9.4-11.el6. # for name in s1 s2 s3 s4; do virsh snapshot-create-as rhel6 $name; done Domain snapshot s1 created Domain snapshot s2 created Domain snapshot s3 created Domain snapshot s4 created virsh # snapshot-list rhel6 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:28:31 +0800 running s2 2011-09-13 17:40:19 +0800 running s1 s3 2011-09-13 17:45:21 +0800 running s2 s4 2011-09-13 17:50:33 +0800 running s3 virsh # snapshot-current rhel6 --name s4 virsh # snapshot-delete rhel6 s2 --children Domain snapshot s2 deleted virsh # snapshot-list rhel6 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:28:31 +0800 running virsh # snapshot-current rhel6 --name s1
Verified with libvirt-0.9.4-12.el6. # for name in s1 s2 s3 s4; do virsh snapshot-create-as rhel6 $name; done Domain snapshot s1 created Domain snapshot s2 created Domain snapshot s3 created Domain snapshot s4 created virsh # snapshot-list rhel6 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:28:31 +0800 running s2 2011-09-13 17:40:19 +0800 running s1 s3 2011-09-13 17:45:21 +0800 running s2 s4 2011-09-13 17:50:33 +0800 running s3 virsh # snapshot-current rhel6 --name s4 virsh # snapshot-delete rhel6 s2 --children Domain snapshot s2 deleted virsh # snapshot-list rhel6 --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-13 17:28:31 +0800 running virsh # snapshot-current rhel6 --name s1
Verify it with offline guest for name in s1 s2 s3 s4; do virsh snapshot-create-as snap $name; done Domain snapshot s1 created Domain snapshot s2 created Domain snapshot s3 created Domain snapshot s4 created [root@rhel62-wh ~]# virsh Welcome to virsh, the virtualization interactive terminal. Type: 'help' for help with commands 'quit' to quit virsh # snapshot-current snap --name s4 virsh # snapshot-list snap --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-26 13:13:54 +0800 shutoff s2 2011-09-26 13:13:59 +0800 shutoff s1 s3 2011-09-26 13:14:03 +0800 shutoff s2 s4 2011-09-26 13:14:08 +0800 shutoff s3 snap1 2011-08-16 15:46:41 +0800 running virsh # snapshot-delete snap s2 --children Domain snapshot s2 deleted virsh # snapshot-list rhel6 --parent error: failed to get domain 'rhel6' error: Domain not found: no domain with matching name 'rhel6' virsh # snapshot-list snap --parent Name Creation Time State Parent ------------------------------------------------------------ s1 2011-09-26 13:13:54 +0800 shutoff snap1 2011-08-16 15:46:41 +0800 running virsh # snapshot-current snap --name s1 virsh #
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2011-1513.html