Bug 880565

Summary: combining internal and external snapshots/checkpoints causes problems
Product: Red Hat Enterprise Linux Advanced Virtualization
Reporter: EricLee <bili>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED WONTFIX
QA Contact: yisun
Severity: medium
Priority: medium
Version: 8.0
CC: cwei, dyuan, hhan, lhuang, lmen, meili, mzhan, xuzhang, yalzhang
Target Milestone: rc
Keywords: Triaged
Target Release: 8.1
Hardware: x86_64
OS: Linux
Last Closed: 2020-12-09 20:45:53 UTC
Type: Bug
Bug Depends On: 873285
Attachments: libvirtd.log of debug level (flags: none)

Description EricLee 2012-11-27 10:52:05 UTC
Description of problem:
Before deleting the snapshot XML, virsh snapshot-delete should compare the snapshot metadata against the disk, so that it deletes the right snapshot.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-10.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.337.el6.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a guest with a qcow2 disk and check the disk info:
# qemu-img info /mnt/nfs/rhel62.qcow2
image: /mnt/nfs/rhel62.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 3.2G
cluster_size: 65536

2. Create an internal snapshot while the guest is shut off, and check the snapshot status:
# virsh snapshot-create-as qcow2 snap1
Domain snapshot snap1 created

# virsh snapshot-list qcow2
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2012-11-27 17:41:02 +0800 shutoff

# qemu-img info /mnt/nfs/rhel62.qcow2
image: /mnt/nfs/rhel62.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 3.2G
cluster_size: 65536
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2012-11-27 17:41:02   00:00:00.000

3. # virsh start qcow2
Domain qcow2 started

4. Create an external disk-only snapshot of the guest:
# virsh snapshot-create-as qcow2 snap2 --disk-only
Domain snapshot snap2 created

5. # virsh snapshot-list qcow2
 Name                 Creation Time             State
------------------------------------------------------------
 snap1                2012-11-27 17:41:02 +0800 shutoff
 snap2                2012-11-27 17:43:12 +0800 disk-snapshot

6. # virsh dumpxml qcow2
....
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/nfs/rhel62.snap2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x05' function='0x0'/>
    </disk>
....

7. With the guest still running, delete snap1:
# virsh snapshot-delete qcow2 snap1
Domain snapshot snap1 deleted

# virsh snapshot-list qcow2
 Name                 Creation Time             State
------------------------------------------------------------
 snap2                2012-11-27 17:43:12 +0800 disk-snapshot

8. However, qemu-img info shows that snap1 is still present in the disk image:
# qemu-img info /mnt/nfs/rhel62.qcow2
image: /mnt/nfs/rhel62.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 3.2G
cluster_size: 65536
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2012-11-27 17:41:02   00:00:00.000

This means that snapshot-delete only deleted the metadata file snap1.xml in /var/lib/libvirt/qemu/snapshot/qcow2/, but did not delete the actual internal snapshot from the disk image.
This leads to the following situation:
# virsh destroy 1
Domain 1 destroyed

# virsh edit qcow2     (change the disk source back to /mnt/nfs/rhel62.qcow2)
Domain qcow2 XML configuration edited.

# virsh snapshot-create-as qcow2 snap1
Domain snapshot snap1 created

# qemu-img info /mnt/nfs/rhel62.qcow2
image: /mnt/nfs/rhel62.qcow2
file format: qcow2
virtual size: 8.0G (8589934592 bytes)
disk size: 3.2G
cluster_size: 65536
Snapshot list:
ID        TAG                 VM SIZE                DATE       VM CLOCK
1         snap1                     0 2012-11-27 17:41:02   00:00:00.000
2         snap1                     0 2012-11-27 17:52:50   00:00:00.000

The disk image now contains two snapshots named snap1.

Actual results:
As described in the steps above.

Expected results:
Before deleting the snapshot XML, virsh snapshot-delete should compare the snapshot metadata against the disk, so that it deletes the right snapshot. If they do not match, the command should report an error and the snapshot deletion should fail.
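The requested check could look roughly like the Python sketch below. It is hypothetical and not libvirt's implementation. It assumes:

- The caller already knows which internal snapshot tags each image file contains, for example from qemu-img info.
- Every file-backed disk is expected to carry the snapshot.

The sketch takes the live domain XML (as printed by virsh dumpxml) and lists the disks whose active image lacks the snapshot. The deletion is refused instead of silently removing only the metadata.

```python
import xml.etree.ElementTree as ET


def disks_missing_snapshot(domain_xml: str, snapshot_name: str,
                           tags_by_image: dict[str, set[str]]) -> list[tuple[str, str]]:
    """Return (target dev, source file) for every file-backed disk whose
    active image does not contain an internal snapshot named snapshot_name.

    tags_by_image maps an image path to the internal snapshot tags it holds
    (hypothetical input; how it is gathered is left to the caller).
    """
    missing = []
    root = ET.fromstring(domain_xml)
    for disk in root.iter("disk"):
        if disk.get("device") != "disk":
            continue  # skip cdrom/floppy devices
        source = disk.find("source")
        target = disk.find("target")
        if source is None or target is None or source.get("file") is None:
            continue  # this sketch only considers file-backed disks
        path = source.get("file")
        if snapshot_name not in tags_by_image.get(path, set()):
            missing.append((target.get("dev"), path))
    return missing


def check_before_delete(domain_xml: str, snapshot_name: str,
                        tags_by_image: dict[str, set[str]]) -> None:
    """Raise instead of deleting only the metadata when the check fails."""
    missing = disks_missing_snapshot(domain_xml, snapshot_name, tags_by_image)
    if missing:
        details = ", ".join(f"{dev} ({path})" for dev, path in missing)
        raise RuntimeError(
            f"snapshot '{snapshot_name}' not found in active image of: {details}")
```

With the domain XML from step 6, vda is backed by /mnt/nfs/rhel62.snap2, which does not contain snap1. The deletion in step 7 would therefore be refused.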

Additional info:
libvirtd.log contains the following errors:
2012-11-27 08:54:44.507+0000: 8603: error : virCommandWait:2345 : internal error Child process (/usr/bin/qemu-img snapshot -d snap1 /mnt/nfs/rhel62.qcow2) unexpected exit status 1: qemu-img: Could not delete snapshot 'snap1': -2 (No such file or directory)

2012-11-27 08:54:44.507+0000: 8603: warning : qemuDomainSnapshotForEachQcow2Raw:1686 : skipping snapshot action on vda

The attached libvirtd.log contains the detailed debug logs.

Comment 1 EricLee 2012-11-27 10:53:17 UTC
Created attachment 652582 [details]
libvirtd.log of debug level.

Comment 2 EricLee 2012-11-27 10:55:42 UTC
This bug is not serious, so it is flagged as rhel-6.5.0?.

Comment 3 Peter Krempa 2013-01-30 14:23:25 UTC
The problem here is the mix of internal and external snapshots:

While the domain is running, operations on internal snapshots must be done using qemu monitor commands. This includes deleting a snapshot.

Mixing in external snapshots is what induces the problem. An external snapshot creates a separate file that qemu afterwards uses as the disk image. This new image does not contain the internal snapshots of the previous one. Because the machine is running, the deletion is attempted with a qemu monitor command. That command cannot find the snapshot, because the snapshot isn't in the current image.

To fix this, we will need to find out whether we are able to touch images that are part of an active image chain and delete snapshots in them.
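A first step would be locating which image in the chain actually holds the internal snapshot. Below is a rough Python sketch that reads the chain with `qemu-img info --backing-chain --output=json`. It assumes:

- A qemu-img new enough to support that JSON output and the -U (force-share) flag, which is needed to read an image a running guest holds open.
- The function names are hypothetical.

The sketch only locates the image. Whether that backing image may then be modified while the guest runs is exactly the open question above, so it deletes nothing.

```python
import json
import subprocess
from typing import Optional


def read_backing_chain(active_image: str) -> list[dict]:
    """Return one dict per image in the backing chain, active image first.

    Assumes a qemu-img with --backing-chain, --output=json and -U support.
    """
    out = subprocess.run(
        ["qemu-img", "info", "--backing-chain", "--output=json", "-U", active_image],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)


def find_snapshot_owner(chain: list[dict], snapshot_name: str) -> Optional[str]:
    """Return the filename of the first image in the chain that contains an
    internal snapshot with the given name, or None if no image does."""
    for image in chain:
        names = {snap.get("name") for snap in image.get("snapshots", [])}
        if snapshot_name in names:
            return image.get("filename")
    return None
```

In the reproducer, the chain for /mnt/nfs/rhel62.snap2 would be expected to resolve snap1 to its backing file /mnt/nfs/rhel62.qcow2.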

The other, somewhat radical, option is to forbid mixing internal and external snapshots, as mixing them can lead to other problems too.
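The stricter option amounts to a simple policy check at snapshot creation time. The Python sketch below is a hypothetical illustration of such a rule, not actual libvirt behaviour. How the kinds of the existing snapshots are determined (for example from the stored snapshot metadata) is left open.

```python
from enum import Enum


class SnapshotKind(Enum):
    INTERNAL = "internal"
    EXTERNAL = "external"


def check_no_mixing(existing: list[SnapshotKind], requested: SnapshotKind) -> None:
    """Refuse a new snapshot whose kind differs from any existing snapshot
    of the same domain (hypothetical policy)."""
    if any(kind is not requested for kind in existing):
        raise ValueError(
            f"refusing to create a {requested.value} snapshot: "
            "the domain already has snapshots of the other kind")
```

Under this rule, step 4 of the reproducer (an external disk-only snapshot on top of the internal snap1) would be rejected up front.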

Comment 7 Jiri Denemark 2014-04-04 21:37:16 UTC
This bug was not selected to be addressed in Red Hat Enterprise Linux 6. We will look at it again within the Red Hat Enterprise Linux 7 product.

Comment 9 Peter Krempa 2015-02-07 09:06:50 UTC
*** Bug 1175603 has been marked as a duplicate of this bug. ***

Comment 13 Jaroslav Suchanek 2019-04-24 12:26:50 UTC
This bug is going to be addressed in next major release.

Comment 14 Peter Krempa 2020-03-17 18:05:51 UTC
*** Bug 1733173 has been marked as a duplicate of this bug. ***