Bug 733143

Summary: qemu: numeric snapshot names can lead to inadvertently deleting the wrong snapshot
Product: [Community] Virtualization Tools Reporter: Eric Blake <eblake>
Component: libvirtAssignee: Libvirt Maintainers <libvirt-maint>
Status: CLOSED WONTFIX QA Contact:
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: unspecifiedCC: berrange, crobinso, cwei, ddumas, dyuan, eblake, fsimonce, jhradile, mzhan, shawn.pagan, vlastimil.holer
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-09-11 16:52:02 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 638510    
Bug Blocks: 740375, 756082, 840699, 840921    

Description Eric Blake 2011-08-24 21:49:20 UTC
Description of problem:
Libvirt uses qemu-img to create internal snapshots for an offline domain whose disks are all qcow2.  It uses an arbitrary snapshot name passed in by the user for both creation and deletion requests to qemu-img.  But if the user chooses simple numeric names for their snapshot, then they can cause an ambiguity in the qcow2 file, which in turn causes libvirt's use of snapshot deletion to delete the wrong data.

Version-Release number of selected component (if applicable):
libvirt-0.9.4-5.el6
qemu-img-0.12.1.2-2.183.el6.x86_64

How reproducible:
I haven't actually tried creating a libvirt scenario to test things, so much as demonstrating it at the lower qemu-img level.  I believe it can be as simple as defining a domain with a qcow2 disk (you don't even have to run or populate the disk!), then run 'virsh snapshot-create-as dom 2', 'virsh snapshot-create-as dom 3', 'virsh snapshot-create-as dom 1', 'virsh snapshot-delete dom 2' to trigger the same underlying qemu-img patterns.

Steps to Reproduce:
1. $ qemu-img create -f qcow2 foo.img 100k
Formatting 'foo.img', fmt=qcow2 size=102400 encryption=off cluster_size=0 
2. $ qemu-img snapshot -c 2 foo.img
3. $ qemu-img snapshot -c 3 foo.img
4. $ qemu-img snapshot -c 1 foo.img
5. $ qemu-img snapshot -l foo.img
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         2                         0 2011-08-24 15:23:50   00:00:00.000
2         3                         0 2011-08-24 15:23:54   00:00:00.000
3         1                         0 2011-08-24 15:23:56   00:00:00.000
6. $ qemu-img snapshot -d 2 foo.img
7. $ qemu-img snapshot -l foo.img
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         2                         0 2011-08-24 15:23:50   00:00:00.000
3         1                         0 2011-08-24 15:23:56   00:00:00.000

  
Actual results:
In step 6, qemu-img treated 'snapshot -d 2' as a request to delete snapshot ID 2 instead of snapshot tag 2.  This resulted in the loss of tag 3, even though the user wanted the name '2' (tag 1) gone.

Expected results:
if a snapshot name is numeric, libvirt should correlate it back to the tag id before doing any qemu-img operations on that snapshot, since qemu-img apparently favors id over tag.

Libvirt cannot outright forbid the creation of snapshot names that are purely integral, especially since libvirt creates such names by default (although libvirt's default name is the number of seconds since the Epoch, which is a much bigger integer and therefore less likely to be confused with qcow2 snapshot ids).

Additional info:

Comment 2 Eric Blake 2011-08-25 21:50:21 UTC
I need to also test if this same qcow2 ambiguity between snapshot ids vs. numeric tags can affect 'qemu -loadvm' or the loadvm monitor command using live system checkpoint snapshots.  That is, I suspect that the wrong image will be loaded in this scenario (although I haven't yet tested):

With a running guest, open a terminal
in the guest terminal, type 2
virsh snapshot-create-as dom 2
in the guest terminal, erase 2 and type 3
virsh snapshot-create-as dom 3
in the guest terminal, erase 3 and type 1
virsh snapshot-create-as dom 1
virsh snapshot-revert dom 2
inspect the guest terminal - if it states 3 instead of 2, we reverted to the wrong snapshot

Fortunately, compared to deletion, reverting the wrong snapshot is not a data loss scenario, just wrong behavior.

Comment 3 Eric Blake 2011-08-25 22:41:34 UTC
Getting this fixed is a prereq to bug 638510 support for live snapshots via the snapshot_blkdev qemu monitor command.

Comment 4 Eric Blake 2011-08-25 23:45:54 UTC
I just confirmed the test in comment #2.  qemu loadvm gives higher priority to snapshot ids than tags, and reverted to the wrong snapshot.

However, I also discovered that while 'qemu-img snapshot -c 1' created id 3 with tag 1, 'qemu savevm 1' _reused_ slot 1, without changing its name, silently corrupting the previous contents of that slot.

I'm not sure how to go about fixing this short of forbidding integers below a particular threshold (but what threshold?) so that seconds-since-Epoch still works but small numbers like '10' are forbidden.  :(

Comment 5 Eric Blake 2011-08-25 23:48:28 UTC
Maybe I'll just clone this as a bug against qemu, that it should behave like qemu-img in creating a tagged snapshot with a new id even if the requested name matches an existing id; then fix libvirt to check if the name to revert to or delete is an integer, and if so, do an id lookup to revert or delete by id instead.

Comment 6 Eric Blake 2011-08-26 14:05:13 UTC
Thinking about it a bit more: libvirt supports reuse of existing qcow2 images, which may already have internal snapshots by arbitrary names not already present in libvirt's metadata.  I think the safest course of action is to change libvirt to forbid creation of a new snapshot _if_ the requested snapshot name already exists as an internal snapshot in one of the qcow2 images, whether or not that new name is integral.  Then, for revert and delete, if the snapshot name to be deleted is an integer, then delete by id, otherwise delete by tag.  This should be safe for most uses, and forbid only the cases that really would cause problems.

I still have to work on the patches for this, but it should not be too difficult to do for both online (qemu savevm) and offline (qemu-img snapshot -c) cases.

Comment 12 Dave Allan 2012-02-15 22:26:02 UTC
Eric, can you confirm with the ovirt guys that they don't use the API in a way that would be problematic?

Comment 13 Federico Simoncelli 2012-02-16 11:27:44 UTC
VDSM (oVirt) is not using internal qcow2 snapshots. Reading the comments I didn't understand if there's any actual consequence (API/XML change) for external snapshots.

Comment 14 Eric Blake 2012-02-16 13:09:22 UTC
External snapshots are immune - either the external snapshot filename exists or it doesn't, but that's exposed in the file system.  It's only internal snapshots where libvirt is not (yet) checking where a requested snapshot name conflicts with existing internal snapshot metadata.

Comment 15 Federico Simoncelli 2012-02-16 13:38:45 UTC
Yes sorry if I wasn't clear enough, I was asking if fixing the internal snapshots issue will produce any change in the API/XML that is going to affect the external snapshots too. (It shouldn't but I'm asking just to be on the safe side).

Comment 16 Eric Blake 2012-02-16 13:46:52 UTC
The fix for this bug would be a change that requesting a snapshot name that conflicts with an existing internal snapshot name would now error out instead of proceeding on with potential data loss.  If all you use is external snapshots, you should see no difference in behavior.

Comment 18 zhpeng 2012-06-25 07:56:52 UTC
I can reproduce it with:
libvirt-0.9.10-21.el6_3.1.x86_64

virsh # list 
 Id    Name                           State
----------------------------------------------------
 2     qcow                           running
virsh # snapshot-list qcow
 Name                 Creation Time             State
------------------------------------------------------------


virsh # snapshot-create-as qcow 2
Domain snapshot 2 created
virsh # snapshot-list qcow
 Name                 Creation Time             State
------------------------------------------------------------
 2                    2012-06-25 14:52:45 +0800 running

[root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img 
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         2                      3.0M 2012-06-25 14:52:45   00:04:44.761

virsh # snapshot-create-as qcow 3
Domain snapshot 3 created
virsh # snapshot-list qcow 
 Name                 Creation Time             State
------------------------------------------------------------
 2                    2012-06-25 14:52:45 +0800 running
 3                    2012-06-25 14:53:38 +0800 running

[root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img 
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         2                      3.0M 2012-06-25 14:52:45   00:04:44.761
2         3                      3.0M 2012-06-25 14:53:38   00:05:37.760
virsh # snapshot-current --name qcow
3

virsh # snapshot-current --name qcow
3
virsh # snapshot-create-as qcow 1
Domain snapshot 1 created
virsh # snapshot-list qcow 
 Name                 Creation Time             State
------------------------------------------------------------
 1                    2012-06-25 14:55:42 +0800 running
 2                    2012-06-25 14:52:45 +0800 running
 3                    2012-06-25 14:53:38 +0800 running
[root@zhpeng ~]# qemu-img snapshot -l /var/lib/libvirt/images/qcow.img 
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
2         3                      3.0M 2012-06-25 14:53:38   00:05:37.760
1         2                      3.0M 2012-06-25 14:55:42   00:07:39.494

virsh # snapshot-current --name qcow
1
virsh # snapshot-current qcow
<domainsnapshot>
  <name>1</name>
  <state>running</state>
  <parent>
    <name>3</name>
  </parent>
  <creationTime>1340607342</creationTime>

virsh # snapshot-revert qcow 2

check in the guest, it's snapshot 3 status

Comment 19 Eric Blake 2012-07-16 16:35:23 UTC
Upstream qemu may be helping us reach a better solution:
https://lists.gnu.org/archive/html/qemu-devel/2012-07/msg01936.html

Comment 27 Shawn Pagan 2013-12-12 15:50:33 UTC
Any update to this?  We have just stumbled onto what appears to be this same issue.  Is there a workaround or does pulling the upstream patch work?

Comment 28 Cole Robinson 2016-03-23 13:41:33 UTC
FWIW I suspect this is still relevant. The referenced qemu patch was about converting savevm to QMP which never happened either, so I doubt this changed from the qemu side. Maybe the simplest thing to do is just have libvirt reject all numeric snapshot names, at least for the qemu driver.

Comment 29 Vlastimil Holer 2017-09-25 08:51:36 UTC
It's still relevant also for RHEL 7, the use-case with snapshot-create-as and snapshot-revert still misbehaves.

Comment 30 Daniel Berrangé 2020-09-11 16:52:02 UTC
QEMU dropped support for using numeric IDs in handling snapshots in

  commit 6ca080453ea403959ccde661030ca16264acc181
  Author: Daniel Henrique Barboza <danielhb413>
  Date:   Wed Nov 7 11:09:58 2018 -0200

    block/snapshot.c: eliminate use of ID input in snapshot operations

I don't think we need to duplicate this in libvirt at this point in time