Bug 1901830 - [RFE][CBT][incremental backup] Allow redefine VM backup checkpoint without the domain XML
Summary: [RFE][CBT][incremental backup] Allow redefine VM backup checkpoint without th...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.3
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: rc
: 8.4
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks: 1891470 1901835
TreeView+ depends on / blocked
 
Reported: 2020-11-26 07:37 UTC by Eyal Shenitzky
Modified: 2021-02-22 15:39 UTC (History)
8 users (show)

Fixed In Version: libvirt-6.6.0-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1904486 (view as bug list)
Environment:
Last Closed: 2021-02-22 15:39:42 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)

Description Eyal Shenitzky 2020-11-26 07:37:51 UTC
Description of problem:

Currently, when having a checkpoint that needs to be defined for a VM with a backup, the checkpoint should be defined by giving the <domaincheckpoint> XML and the entire VM <domain> XML.

Supplying the VM domain XML is redundant. In RHV, it forces us to persists the XML that was created when the backup was taken in the RHV engine database and use it when the checkpoint should be redefined.

Moreover, RHV may encounter some errors during the backup (communication issue etc..) and might lose the returned checkpoint and domain XML so we added a recovery mechanism to fetch the entire XML.

If Libvirt will provide a way to redefine the checkpoint without the domain XML, it will facilitate the process and reduce many error-prone stages.

The call for defining a checkpoint when backup is started using Libvirt python SDK is currently - 
dom.backupBegin(backup_xml, checkpoint_xml, flags=flags)

The call for defining a checkpoint after a backup was taken using Libvirt python SDK is currently - 
dom.checkpointCreateXML(checkpoint_xml, flags)

The solution might be to provide a flag for backupBegin and checkpointCreateXML that will indicate whether the domain XML is needed or not (similar to virDomainCheckpointGetXMLDesc with VIR_DOMAIN_CHECKPOINT_XML_NO_DOMAIN flag)

Version-Release number of selected component (if applicable):
6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64

How reproducible:
100%

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 3 Peter Krempa 2020-12-04 15:31:43 UTC
Upstream:

commit abf12f071b17d8be51ce5fe368175f2f5612eafe
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 2 14:18:40 2020 +0100

    conf: checkpoint: Don't require <domain> when redefining checkpoints
    
    The domain definition stored with a checkpoint isn't used currently
    apart from matching disks when creating a new checkpoints.
    
    As some users of the incremental backup API want to provide backups in
    offline mode under their control (obviously while compying with our
    documentation on how the on-disk state should be handled) and then want
    to define the checkpoint for live use, supplying a <domain> sub-element
    is overly complex and not actually needed by the code.
    
    Relax the restriction when re-defining a checkpoint so that <domain> is
    not necessary and add (alibistic) documentation saying that future
    actions may not work if it's missing.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit 392eacfeb1c924a7532955d8866677d905fa74fa
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 2 14:13:17 2020 +0100

    conf: checkpoint: Prepare internals for missing domain definition
    
    Conditionalize code which assumes that the domain definition stored in
    the checkpoint is present.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit 9fd8ba3b2d681e5d0c58fc68f99224dbf0a5f810
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 2 13:35:29 2020 +0100

    virDomainCheckpointRedefineCommit: Don't check ABI of definition in checkpoint
    
    Checking the definition ABI when redefining checkpoints doesn't make
    much sense for the following reasons:
    
    * the domain definition in the checkpoint is mostly unused (a relic
      adopted from the snapshot code)
    
    * can be very easily overridden by deleting the checkpoint metadata
      before redefinition
    
    Rather than complicating the logic when we'll be taking into account
    that the domain definition may be missing, let's just remove the check.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit 9a58f1a53c8d94cf867fd3a76515d222750d5d77
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 2 14:33:21 2020 +0100

    virDomainCheckpointDefParse: Use 'unsigned int' for flags
    
    Fix the type for a variable holding flags to the usual 'unsigned int'
    and change the name to be more appropriate to its use.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

commit d1fd4a37555123fbdf9ebb77f3a004320d75b1a3
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 2 14:29:30 2020 +0100

    virDomainCheckpointDefParse: Don't extract unused domain type
    
    We can extract './domain' directly and let the parser deal with the
    type.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Daniel Henrique Barboza <danielhb413>

Comment 5 Peter Krempa 2020-12-09 14:42:16 UTC
One more upstream commit is related to this bug:

commit f40a72a32e47c248d5002a765913fa8f547960de
Author: Peter Krempa <pkrempa>
Date:   Tue Dec 8 16:16:08 2020 +0100

    qemuDomainCheckpointLoad: Don't align disks when restoring config from disk
    
    The alignment step is not really necessary once we've done it already
    since we fully populate the definition. In case of checkpoints it was a
    relic necessary for populating the 'idx' to match checkpoint disk to
    definition disk, but that was already removed.

Comment 8 yisun 2020-12-16 08:11:37 UTC
Verified with:
libvirt-6.6.0-10.module+el8.3.1+9163+37dc0a58.x86_64

Steps:
1. prepare a vm with vdb pointing to a qcow2 image
[root@dell-per740xd-12 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdb.qcow2 1G
Formatting '/var/lib/libvirt/images/vdb.qcow2', fmt=qcow2 cluster_size=65536 compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16
[root@dell-per740xd-12 ~]# virsh domblklist vm1
 Target   Source
--------------------------------------------------------
 vda      /var/lib/libvirt/images/jeos-27-x86_64.qcow2
 vdb      /var/lib/libvirt/images/vdb.qcow2

[root@dell-per740xd-12 ~]# virsh start vm1
Domain vm1 started

2. create a checkpoint for vdb, named cp0
[root@dell-per740xd-12 ~]# cat checkpoint_0.xml 
<domaincheckpoint>
  <disks>
    <disk checkpoint="no" name="vda" />
    <disk checkpoint="bitmap" name="vdb" />
  </disks>
  <name>cp0</name>
  <description>cp0</description>
</domaincheckpoint>

[root@dell-per740xd-12 ~]# virsh checkpoint-create vm1 checkpoint_0.xml 
Domain checkpoint cp0 created from 'checkpoint_0.xml'


3. dumpxml the checkponit cp0 without domain info, and save it in test_cp.xml
[root@dell-per740xd-12 ~]# virsh checkpoint-dumpxml vm1 cp0 --no-domain
<domaincheckpoint>
  <name>cp0</name>
  <description>cp0</description>
  <creationTime>1608105517</creationTime>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap' bitmap='cp0'/>
  </disks>
</domaincheckpoint>

[root@dell-per740xd-12 ~]# cat test_cp.xml 
<domaincheckpoint>
  <name>cp0</name>
  <description>cp0</description>
  <creationTime>1608105517</creationTime>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap' bitmap='cp0'/>
  </disks>
</domaincheckpoint>


4. delete the metadata of the checkpoint cp0
[root@dell-per740xd-12 ~]# virsh checkpoint-delete vm1 cp0 --metadata
Domain checkpoint cp0 deleted

[root@dell-per740xd-12 ~]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

5. try to create checkpoint with test_cp.xml, failed as expected
[root@dell-per740xd-12 ~]# virsh checkpoint-create vm1 test_cp.xml 
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: cp0

6. try to create the checkpoint with test_cp.xml, with --redefine param
[root@dell-per740xd-12 ~]# virsh checkpoint-create vm1 test_cp.xml --redefine
Domain checkpoint cp0 created from 'test_cp.xml'

7. check the checkpoint metadata is correct
[root@dell-per740xd-12 ~]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------------------
 cp0    2020-12-16 02:58:37 -0500

[root@dell-per740xd-12 ~]# virsh checkpoint-dumpxml vm1 cp0 --no-domain 
<domaincheckpoint>
  <name>cp0</name>
  <description>cp0</description>
  <creationTime>1608105517</creationTime>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap' bitmap='cp0'/>
  </disks>
</domaincheckpoint>


8. check the qcow2 image's bitmap cp0 still exists
[root@dell-per740xd-12 ~]# virsh destroy vm1
Domain vm1 destroyed

[root@dell-per740xd-12 ~]# virsh start vm1
qemDomain vm1 started

[root@dell-per740xd-12 ~]# qemu-img info /var/lib/libvirt/images/vdb.qcow2 -U
image: /var/lib/libvirt/images/vdb.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 328 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    bitmaps:
        [0]:
            flags:
                [0]: in-use
                [1]: auto
            name: cp0
            granularity: 65536
    refcount bits: 16
    corrupt: false

9. try to delete the checkpoint cp0 without --metadata, 
[root@dell-per740xd-12 ~]# virsh checkpoint-delete vm1 cp0
Domain checkpoint cp0 deleted

10. check both of the bitmap cp0 in qcow2 image and metadata of libvirt are removed
[root@dell-per740xd-12 ~]# qemu-img info /var/lib/libvirt/images/vdb.qcow2 -U
image: /var/lib/libvirt/images/vdb.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 456 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

[root@dell-per740xd-12 ~]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

Comment 10 errata-xmlrpc 2021-02-22 15:39:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:0639


Note You need to log in before you can comment on or make changes to this bug.