Description of problem: Libvirt uses qemu-img to create internal snapshots for an offline domain whose disks are all qcow2. It uses an arbitrary snapshot name passed in by the user for both creation and deletion requests to qemu-img. But if the user chooses simple numeric names for their snapshot, then they can cause an ambiguity in the qcow2 file, which in turn causes libvirt's use of snapshot deletion to delete the wrong data. Version-Release number of selected component (if applicable): libvirt-0.9.4-5.el6 qemu-img-0.12.1.2-2.183.el6.x86_64 How reproducible: I haven't actually tried creating a libvirt scenario to test things, so much as demonstrating it at the lower qemu-img level. I believe it can be as simple as defining a domain with a qcow2 disk (you don't even have to run or populate the disk!), then run 'virsh snapshot-create-as dom 2', 'virsh snapshot-create-as dom 3', 'virsh snapshot-create-as dom 1', 'virsh snapshot-delete dom 2' to trigger the same underlying qemu-img patterns. Steps to Reproduce: 1. $ qemu-img create -f qcow2 foo.img 100k Formatting 'foo.img', fmt=qcow2 size=102400 encryption=off cluster_size=0 2. $ qemu-img snapshot -c 2 foo.img 3. $ qemu-img snapshot -c 3 foo.img 4. $ qemu-img snapshot -c 1 foo.img 5. $ qemu-img snapshot -l foo.img Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 2 0 2011-08-24 15:23:50 00:00:00.000 2 3 0 2011-08-24 15:23:54 00:00:00.000 3 1 0 2011-08-24 15:23:56 00:00:00.000 6. $ qemu-img snapshot -d 2 foo.img 7. $ qemu-img snapshot -l foo.img Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 2 0 2011-08-24 15:23:50 00:00:00.000 3 1 0 2011-08-24 15:23:56 00:00:00.000 Actual results: In step 6, qemu-img treated 'snapshot -d 2' as a request to delete snapshot ID 2 instead of snapshot tag 2. This resulted in the loss of tag 3, even though the user wanted the name '2' (tag 1) gone. Expected results: if a snapshot name is numeric, libvirt should correlate it back to the tag id before doing any qemu-img operations on that snapshot, since qemu-img apparently favors id over tag. Libvirt cannot outright forbid the creation of snapshot names that are purely integral, especially since libvirt creates such names by default (although libvirt's default name is the number of seconds since the Epoch, which is a much bigger integer and therefore less likely to be confused with qcow2 snapshot ids). Additional info:
I need to also test if this same qcow2 ambiguity between snapshot ids vs. numeric tags can affect 'qemu -loadvm' or the loadvm monitor command using live system checkpoint snapshots. That is, I suspect that the wrong image will be loaded in this scenario (although I haven't yet tested): With a running guest, open a terminal in the guest terminal, type 2 virsh snapshot-create-as dom 2 in the guest terminal, erase 2 and type 3 virsh snapshot-create-as dom 3 in the guest terminal, erase 3 and type 1 virsh snapshot-create-as dom 1 virsh snapshot-revert dom 2 inspect the guest terminal - if it states 3 instead of 2, we reverted to the wrong snapshot Fortunately, compared to deletion, reverting the wrong snapshot is not a data loss scenario, just wrong behavior.
Getting this fixed is a prereq to bug 638510 support for live snapshots via the snapshot_blkdev qemu monitor command.
I just confirmed the test in comment #2. qemu loadvm gives higher priority to snapshot ids than tags, and reverted to the wrong snapshot. However, I also discovered that while 'qemu-img snapshot -c 1' created id 3 with tag 1, 'qemu savevm 1' _reused_ slot 1, without changing its name, silently corrupting the previous contents of that slot. I'm not sure how to go about fixing this short of forbidding integers below a particular threshold (but what threshold?) so that seconds-since-Epoch still works but small numbers like '10' are forbidden. :(
Maybe I'll just clone this as a bug against qemu, that it should behave like qemu-img in creating a tagged snapshot with a new id even if the requested name matches an existing id; then fix libvirt to check if the name to revert to or delete is an integer, and if so, do an id lookup to revert or delete by id instead.
Thinking about it a bit more: libvirt supports reuse of existing qcow2 images, which may already have internal snapshots by arbitrary names not already present in libvirt's metadata. I think the safest course of action is to change libvirt to forbid creation of a new snapshot _if_ the requested snapshot name already exists as an internal snapshot in one of the qcow2 images, whether or not that new name is integral. Then, for revert and delete, if the snapshot name to be deleted is an integer, then delete by id, otherwise delete by tag. This should be safe for most uses, and forbid only the cases that really would cause problems. I still have to work on the patches for this, but it should not be too difficult to do for both online (qemu savevm) and offline (qemu-img snapshot -c) cases.
Eric, can you confirm with the ovirt guys that they don't use the API in a way that would be problematic?
VDSM (oVirt) is not using internal qcow2 snapshots. Reading the comments I didn't understand if there's any actual consequence (API/XML change) for external snapshots.
External snapshots are immune - either the external snapshot filename exists or it doesn't, but that's exposed in the file system. It's only internal snapshots where libvirt is not (yet) checking where a requested snapshot name conflicts with existing internal snapshot metadata.
Yes sorry if I wasn't clear enough, I was asking if fixing the internal snapshots issue will produce any change in the API/XML that is going to affect the external snapshots too. (It shouldn't but I'm asking just to be on the safe side).
The fix for this bug would be a change that requesting a snapshot name that conflicts with an existing internal snapshot name would now error out instead of proceeding on with potential data loss. If all you use is external snapshots, you should see no difference in behavior.
I can reproduce it with: libvirt-0.9.10-21.el6_3.1.x86_64 virsh # list Id Name State ---------------------------------------------------- 2 qcow running virsh # snapshot-list qcow Name Creation Time State ------------------------------------------------------------ virsh # snapshot-create-as qcow 2 Domain snapshot 2 created virsh # snapshot-list qcow Name Creation Time State ------------------------------------------------------------ 2 2012-06-25 14:52:45 +0800 running [root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 2 3.0M 2012-06-25 14:52:45 00:04:44.761 virsh # snapshot-create-as qcow 3 Domain snapshot 3 created virsh # snapshot-list qcow Name Creation Time State ------------------------------------------------------------ 2 2012-06-25 14:52:45 +0800 running 3 2012-06-25 14:53:38 +0800 running [root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img Snapshot list: ID TAG VM SIZE DATE VM CLOCK 1 2 3.0M 2012-06-25 14:52:45 00:04:44.761 2 3 3.0M 2012-06-25 14:53:38 00:05:37.760 virsh # snapshot-current --name qcow 3 virsh # snapshot-current --name qcow 3 virsh # snapshot-create-as qcow 1 Domain snapshot 1 created virsh # snapshot-list qcow Name Creation Time State ------------------------------------------------------------ 1 2012-06-25 14:55:42 +0800 running 2 2012-06-25 14:52:45 +0800 running 3 2012-06-25 14:53:38 +0800 running [root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img Snapshot list: ID TAG VM SIZE DATE VM CLOCK 2 3 3.0M 2012-06-25 14:53:38 00:05:37.760 1 2 3.0M 2012-06-25 14:55:42 00:07:39.494 virsh # snapshot-current --name qcow 1 virsh # snapshot-current qcow <domainsnapshot> <name>1</name> <state>running</state> <parent> <name>3</name> </parent> <creationTime>1340607342</creationTime> virsh # snapshot-revert qcow 2 check in the guest, it's snapshot 3 status
Upstream qemu may be helping us reach a better solution: https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01936.html
Any update to this? We have just stumbled onto what appears to be this same issue. Is there a workaround or does pulling the upstream patch work?
FWIW I suspect this is still relevant. The referenced qemu patch was about converting savevm to QMP which never happened either, so I doubt this changed from the qemu side. Maybe the simplest thing to do is just have libvirt reject all numeric snapshot names, at least for the qemu driver.
It's still relevant also for RHEL 7, the use-case with snapshot-create-as and snapshot-revert still misbehaves.
QEMU dropped support for using numeric IDs in handling snapshots in commit 6ca080453ea403959ccde661030ca16264acc181 Author: Daniel Henrique Barboza <danielhb413> Date: Wed Nov 7 11:09:58 2018 -0200 block/snapshot.c: eliminate use of ID input in snapshot operations I don't think we need to duplicate this in libvirt at this point in time