+++ This bug was initially created as a clone of Bug #1874846 +++ Description of problem: When virDomainBackupBegin fails, we always get a generic internal error. The documentation does not document any error for this API: https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainBackupBegin When we get internal error, we don't have a good way to recover from the error. The only way to ensure that the next backup will succeed (if we don't have some bug in libvirt, or unexpected condition) is to delete all checkpoints, and start from full backup. This is not good error handling strategy. We can assume that the error is temporary and retry the operation, but this is also not attractive option. For example if backup failed because the request was invalid (e.g, trying to do incremental backup with raw disk) it can never succeed. If the backup failed because a bitmap is missing or inconsistent, we know that we must delete some checkpoints, and we can do backup only for checkpoints after the checkpoint with the bad bitmap. So what we need is public documented error codes like: https://libvirt.org/html/libvirt-libvirt-domain.html#virDomainDiskErrorCode That allow detection of: - Invalid request (e.g. incremental backup of raw file) - Temporary failure - the request failed but it may succeed in the future - Unrecoverable failure - this backup cannot succeed and should not be retried In the unrecoverable failure category, having a special error code for missing bitmap can help. I don't know if this is possible since before backup we redefine all checkpoints, and we assume that libvirt validate that bitmap exists during checkpoint redefinition. Version-Release number of selected component (if applicable): 8.2 More info: We had one report so far on backup failing because a bitmap was missing after all checkpoints were redefined. We don't have enough details about this case. --- Additional comment from on 2020-09-09 04:41:13 CEST --- Here is a reproduce steps: 1. vm has a vdb pointing to a raw img [root@dell-per740xd-11 ~]# virsh dumpxml vm1 | awk '/<disk/,/<\/disk/' ... <disk type='file' device='disk'> <driver name='qemu' type='raw'/> <source file='/var/lib/libvirt/images/vdb.raw' index='1'/> <backingStore/> <target dev='vdb' bus='virtio'/> <alias name='virtio-disk1'/> <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/> </disk> 2. start the vm [root@dell-per740xd-11 ~]# virsh start vm1 virsh baDomain vm1 started 3. prepare backup and checkpoint xml to do backup [root@dell-per740xd-11 ~]# cat backup_push_full.xml checkpoint_0.xml <domainbackup mode="push"> <disks> <disk name='vda' backup='no'/> <disk name='vdb' type='file'> <driver type='qcow2'/> <target file='/tmp/push_full_backup.img'> <encryption format='luks'> <secret type='passphrase' usage='/luks/img'/> </encryption> </target> </disk> </disks> </domainbackup> <domaincheckpoint> <disks> <disk checkpoint="no" name="vda" /> <disk checkpoint="bitmap" name="vdb" /> </disks> <name>cp0</name> <description>cp0</description> </domaincheckpoint> 4. start the backup [root@dell-per740xd-11 ~]# virsh backup-begin vm1 backup_push_full.xml checkpoint_0.xml error: unsupported configuration: checkpoint for disk vdb unsupported for storage type raw 5. get the error number, it's '1' [root@dell-per740xd-11 ~]# echo $? 1 --- Additional comment from Peter Krempa on 2020-12-04 16:33:34 CET --- Upstream: commit 2a917e6756578e43b21695aa284b39bdc05fa96f Author: Daniel P. Berrangé <berrange> Date: Mon Nov 9 16:14:53 2020 +0000 Fix name prefix of VIR_DOMAIN_CHECKPOINT_REDEFINE_VALIDATE The enum constant names should all have a prefix that matches the enum name. VIR_DOMAIN_CHECKPOINT_REDEFINE_VALIDATE was missing the "CREATE_" part of the name prefix. Reviewed-by: Peter Krempa <pkrempa> Signed-off-by: Daniel P. Berrangé <berrange> commit 5ab8cc78c4ab026338070409196fe14ee56a0d04 Author: Peter Krempa <pkrempa> Date: Wed Nov 4 13:37:35 2020 +0100 qemu: backup: Add partial validation of incremental backup checkpoint Verify that the checkpoint requested by an incremental backup exists. Unfortunately validating whether the checkpoint configuration actually matches the disk may not be reasonably feasible as the disk may have been renamed/snapshotted/etc. We still rely on bitmap presence. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit a4d4d2bd5deb3efaa45a5a3386f681363a54a6e5 Author: Peter Krempa <pkrempa> Date: Wed Nov 4 10:16:02 2020 +0100 qemu: checkpoint: Implement VIR_DOMAIN_CHECKPOINT_REDEFINE_VALIDATE Validate that the bitmaps are present when redefining a checkpoint. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit 93873c9bcc4cba575616d977eb00acf73a6f5403 Author: Peter Krempa <pkrempa> Date: Wed Nov 4 10:10:56 2020 +0100 conf: checkpoint: Split virDomainCheckpointRedefinePrep into two functions First one prepares and validates the definition, the second one actually either updates an existing checkpoint or assigns definition for the new one. This will allow driver code to add extra validation between those steps. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit f37d306f6e02a469bd79293f8431df9ca3034744 Author: Peter Krempa <pkrempa> Date: Tue Nov 3 12:49:11 2020 +0100 virsh: checkpoint-create: Add support for VIR_DOMAIN_CHECKPOINT_REDEFINE_VALIDATE Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit 9b54eb84c8a25ed1b71d9d1cc0e65a630915da30 Author: Peter Krempa <pkrempa> Date: Tue Nov 3 12:28:46 2020 +0100 checkpoint: Introduce VIR_DOMAIN_CHECKPOINT_REDEFINE_VALIDATE flag Introduce a flag which will allow users to perform hypervisor-specific validation when redefining the checkpoint metadata. This will allow checking metadata which is stored e.g. in disk images when populating the libvirt metadata. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit e33e89d8399042f55fa6999fdfaa40ee884a9bfe Author: Peter Krempa <pkrempa> Date: Tue Nov 3 12:00:21 2020 +0100 qemu: backup: Use VIR_ERR_CHECKPOINT_INCONSISTENT when starting a backup If we don't have a consistent chain of bitmaps for the backup to proceed we'd report VIR_ERR_INVALID_ARG error code, which makes it hard to decide whether an incremental backup makes even sense. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit facfa8262ecab062229228b8d3662b391384ebe1 Author: Peter Krempa <pkrempa> Date: Tue Nov 3 11:57:58 2020 +0100 error: Introduce VIR_ERR_CHECKPOINT_INCONSISTENT error code This code will be used to signal cases when the checkpoint is broken either during backup or other operations where a user might want to make decision based on the presence of the checkpoint, such as do a full backup instead of an incremental one. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit af7047717f83a7f75c5ae3b318dc4d44b81627fd Author: Peter Krempa <pkrempa> Date: Tue Nov 3 12:33:55 2020 +0100 man: virsh: Mention that '--size' for 'checkpoint-dumpxml' may require running vm Separate the docs for the '--size' flag into its own paragraph and mention that the domain may be required to be running. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko> commit 1bb33357eaaa36e12b31c794c603d27f6ae03835 Author: Peter Krempa <pkrempa> Date: Tue Nov 3 12:30:53 2020 +0100 checkpoint: Mention that VIR_DOMAIN_CHECKPOINT_XML_SIZE may require running vm The qemu implementation requires that the VM associated with the checkpoint is running when checking the size. Mention this possibility with the flag. Signed-off-by: Peter Krempa <pkrempa> Reviewed-by: Ján Tomko <jtomko>
PreVerified with libvirt-6.10.0-1.fc34.x86_64 Result: PASS ➜ fedora virsh start pc Domain 'pc' started ➜ fedora virsh domblklist pc Target Source --------------------------------------------- vda /var/lib/libvirt/images/pc.qcow2 vdb /var/lib/libvirt/images/vdb.qcow2 ➜ fedora virsh checkpoint-list pc Name Creation Time ----------------------- ➜ fedora cat ck1.xml <domaincheckpoint> <name>ck1</name> <disks> <disk name='vda' checkpoint='no'/> <disk name='vdb' checkpoint='bitmap'/> </disks> </domaincheckpoint> ➜ fedora virsh checkpoint-create pc ck1.xml Domain checkpoint ck1 created from 'ck1.xml' ➜ fedora virsh checkpoint-dumpxml pc ck1 --no-domain > created_ck.xml ➜ fedora cat created_ck.xml <domaincheckpoint> <name>ck1</name> <creationTime>1610078860</creationTime> <disks> <disk name='vda' checkpoint='no'/> <disk name='vdb' checkpoint='bitmap' bitmap='ck1'/> </disks> </domaincheckpoint> ➜ fedora virsh checkpoint-list pc Name Creation Time ----------------------------------- ck1 2021-01-08 04:07:40 +0000 ➜ fedora virsh destroy pc Domain 'pc' destroyed ➜ fedora qemu-img create -f qcow2 /var/lib/libvirt/images/vdb.qcow2 100M Formatting '/var/lib/libvirt/images/vdb.qcow2', fmt=qcow2 cluster_size=65536 extended_l2=off compression_type=zlib size=104857600 lazy_refcounts=off refcount_bits=16 ➜ fedora virsh start pc Domain 'pc' started ➜ fedora virsh checkpoint-delete pc ck1 --metadata Domain checkpoint ck1 deleted ➜ fedora virsh checkpoint-create pc created_ck.xml --redefine --redefine-validate error: checkpoint inconsistent: missing or broken bitmap 'ck1' for disk 'vdb'
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098