Bug 1829829
Summary: [incremental backup] Creating incremental backup that includes a new VM disk that requires full backup is impossible

Product: Red Hat Enterprise Linux Advanced Virtualization
Component: libvirt
Version: 8.2
Target Release: 8.3
Target Milestone: rc
Status: CLOSED ERRATA
Severity: high
Priority: high
Keywords: Triaged
Reporter: Eyal Shenitzky <eshenitz>
Assignee: Peter Krempa <pkrempa>
QA Contact: yisun
CC: aliang, dyuan, jdenemar, lmen, lmiksik, pkrempa, virt-maint, xuzhang
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-6.6.0-1.el8
Doc Type: If docs needed, set a value
Type: Bug
Bug Blocks: 1861674
Last Closed: 2020-11-17 17:48:34 UTC
As outlined in the private email thread that led to the filing of this bug, there are multiple possible options for how to tackle this problem:

1) Allow a full disk backup of the new disk to be included in an incremental backup:
   1a) a global per-backup-job flag to enable a full backup for any disk where the bitmaps are not present
   1b) a per-disk flag noting that this disk is allowed to have a full backup (doesn't fail if the disk has the appropriate checkpoints; just creates an incremental backup)
   1c) a per-disk flag to force a full backup (full backup even if checkpoints are present)
   1d) a per-disk flag to force a full backup and report an error if checkpoints are present

2) Allow more than one checkpoint as a source for the incremental backup. This one may be problematic due to the design we chose for the hierarchy of backups. I'll need to think about it.

Additionally, even 1) has a similar possible set of issues: the new disk, which is not part of the previous checkpoint, might cause problems when deleting checkpoints and in other situations, e.g. when taking a backup starting from an earlier checkpoint. Thus please state what would be the best solution for you, and then I can start designing how to overcome the design limitations we have.

Libvirt now allows configuring the checkpoint for incremental backup, and also the backup mode, per disk. This can be used to force a full backup for an individual disk, or to select a different checkpoint to back up from, while creating a common checkpoint for all of them.
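The fix that eventually shipped exposes both per-disk controls in the backup XML. A sketch of the resulting pull-mode backup XML, modeled on the verification steps recorded later in this bug (disk names, checkpoint names, ports, and paths are illustrative):

```xml
<domainbackup mode='pull'>
  <!-- job-level default: incremental backup relative to checkpoint ck1 -->
  <incremental>ck1</incremental>
  <server name='localhost' port='10809'/>
  <disks>
    <!-- existing disk: follows the job-level checkpoint -->
    <disk name='vdb' backup='yes' type='file'>
      <scratch file='/tmp/vdb_scratch'/>
    </disk>
    <!-- freshly hotplugged disk with no bitmaps yet: force a full backup -->
    <disk name='vdc' backup='yes' type='file' backupmode='full'>
      <scratch file='/tmp/vdc_scratch'/>
    </disk>
  </disks>
</domainbackup>
```

A per-disk `incremental='...'` attribute can likewise override the job-level checkpoint for one disk only.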
Committed upstream as:

commit 7e5b993d3b8cae9c43b753591a7b12db5c540da5
Author: Peter Krempa <pkrempa>
Date:   Thu Jun 25 16:16:14 2020 +0200

    backup: Allow configuring incremental backup per-disk individually

    The semantics of the backup operation don't strictly require that all
    disks being backed up are part of the same incremental set (when a disk
    was checkpointed/backed up separately or in a different VM), or they
    may not have a previous checkpoint at all (e.g. when the disk was
    freshly hotplugged to the VM). In such cases we can still create a
    common checkpoint for all of them and back up differences according
    to configuration.

    This patch adds a per-disk configuration of the checkpoint to do the
    incremental backup from via the 'incremental' attribute and allows
    performing full backups via the 'backupmode' attribute.

    Note that no changes to the qemu driver are necessary to take advantage
    of this as we already obey the per-disk 'incremental' field.

    https://bugzilla.redhat.com/show_bug.cgi?id=1829829

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 6f33e441e236dd738fd7776bf062cc546835226c
Author: Peter Krempa <pkrempa>
Date:   Tue Jul 7 16:38:00 2020 +0200

    backupxml2xmltest: Call 'virDomainBackupAlignDisks' before formatting output

    Call the post-processing function so that we can validate that it does
    the correct thing. virDomainBackupAlignDisks requires disk definitions
    to be present, so let's fake them by copying disks from the backup
    definition and adding one extra disk 'vdextradisk'.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 1a7ce56ae12c71f4a28fb9578240630dc9afee49
Author: Peter Krempa <pkrempa>
Date:   Tue Jul 7 16:52:50 2020 +0200

    virDomainBackupDiskDefFormat: Format internal disk state only when valid

    Format the disk state only when it isn't _NONE.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

commit 5c08a3739b7c1d45c35bf403451cbb5f35f4fa58
Author: Peter Krempa <pkrempa>
Date:   Tue Jul 7 16:56:16 2020 +0200

    backupxml2xmltest: Remove output symlink of 'backup-pull-internal-invalid'

    Replace the output with a copy of the input file for further changes
    once we start testing virDomainBackupAlignDisks.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Eric Blake <eblake>

Question: Will the fix for this bug introduce an API change? Will the checkpoint/backup XML be changed? If not, can you please explain at a high level what the change will be?

Answer: It's achieved using two new XML attributes for the backup disk in the backup XML. I suggest you look at the commit, which also adds documentation of the new fields:
https://gitlab.com/libvirt/libvirt/-/commit/7e5b993d3b8cae9c43b753591a7b12db5c540da5

Tested with:
libvirt-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64
qemu-kvm-5.1.0-7.module+el8.3.0+8099+dba2fe3e.x86_64

Result: PASS

Please note: only the pull-mode backup test results are recorded here, since pull mode and push mode enter the same code path. Push-mode and negative test scenarios will be put in our test case document.
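The verification below uses a shell helper (copyif3.sh) that parses `qemu-img map --output=json` output with bash regexes to find the extents to copy from the NBD export. The extent-filtering logic it implements can be sketched more robustly in Python; this is an illustration of the parsing idea only, not part of the fix, and the `sample` JSON is made up:

```python
import json

def extents(map_json, want_data):
    """Return (start, length) pairs for extents whose 'data' flag matches
    want_data, from `qemu-img map --output=json` output.

    With x-dirty-bitmap=qemu:dirty-bitmap:NAME, qemu repurposes the 'data'
    flag: dirty ranges are reported with data=false, which is why
    copyif3.sh matches against state=false."""
    return [(e["start"], e["length"])
            for e in json.loads(map_json)
            if e.get("data") is want_data]

# Made-up sample resembling qemu-img map output:
sample = '''[
  {"start": 0, "length": 65536, "depth": 0, "zero": false, "data": true},
  {"start": 65536, "length": 131072, "depth": 0, "zero": true, "data": false}
]'''

print(extents(sample, want_data=False))  # → [(65536, 131072)]
```

Each returned (start, length) pair corresponds to one `qemu-io -C -c "r $start $len"` read in the shell helper.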
==================================================
Script to dump an NBD export to a local file
==================================================
[root@dell-per740xd-10 ~]# cat copyif3.sh
copyif2() {
    map_from="--image-opts driver=nbd,export=$4,server.type=inet"
    map_from+=",server.host=127.0.0.1,server.port=10809"
    map_from+=",x-dirty-bitmap=qemu:dirty-bitmap:$3"
    state=false
    qemu-img rebase -u -f qcow2 -F raw -b $1 $2
    while read line; do
        [[ $line =~ .*start.:.([0-9]*).*length.:.([0-9]*).*data.:.$state.* ]] || continue
        start=${BASH_REMATCH[1]}
        len=${BASH_REMATCH[2]}
        echo
        echo "  $start $len:"
        qemu-io -C -c "r $start $len" -f qcow2 $2
    done < <(qemu-img map --output=json $map_from)
    qemu-img rebase -u -f qcow2 -b '' $2
}
copyif2 $1 $2 $3 $4

==================================================
Script to make a disk chain from some qcow2 files
==================================================
[root@dell-per740xd-10 images]# cat make_chain.py
import sys
import os
import shutil
import subprocess

USAGE = """
====================================================================
Usage:
    python make_chain.py file1 file2 file3 ...
Script will make a copy of each file, and make the copies as a chain:
    cp_file1 <- cp_file2 <- cp_file3 ...
This means cp_file3 is based on cp_file2, which is based on cp_file1
====================================================================
"""

def check_files(img_files):
    for img_file in img_files:
        if not os.path.exists(img_file):
            exit(("File '%s' does not exist" % img_file) + "\n" + USAGE)

def copy_files(img_files):
    img_copies = []
    for img_file in img_files:
        new_img_file = img_file + ".copy"
        if os.path.exists(new_img_file):
            os.remove(new_img_file)
            print("File '%s' deleted" % new_img_file)
        shutil.copyfile(img_file, new_img_file)
        img_copies.append(new_img_file)
    return img_copies

def make_chain(img_files):
    cmd_pattern = "qemu-img rebase -u -f qcow2 -b %s -F qcow2 %s"
    chain = ""
    for i in range(len(img_files)):
        if i == 0:
            chain += img_files[i]
            continue
        cmd = cmd_pattern % (img_files[i-1], img_files[i])
        print("Making Chain: '%s' based on '%s'" % (img_files[i], img_files[i-1]))
        status = subprocess.call(cmd, shell=True)
        if status:
            exit()
        chain += " <=== %s" % img_files[i]
    print("\nNOW THE CHAIN IS:\n"
          "======================================================================\n"
          "\t%s\n"
          "======================================================================" % chain)

if len(sys.argv) <= 2:
    exit(USAGE)
img_files = sys.argv[1:]
check_files(img_files)
img_copies = copy_files(img_files)
make_chain(img_copies)

==================================================
Test steps:
==================================================
1. Prepare images for vdb and vdc

[root@dell-per740xd-10 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdb.qcow2 100M
Formatting '/var/lib/libvirt/images/vdb.qcow2', fmt=qcow2 cluster_size=65536 compression_type=zlib size=104857600 lazy_refcounts=off refcount_bits=16
[root@dell-per740xd-10 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdc.qcow2 200M
Formatting '/var/lib/libvirt/images/vdc.qcow2', fmt=qcow2 cluster_size=65536 compression_type=zlib size=209715200 lazy_refcounts=off refcount_bits=16

2. Make sure vm1 already has vdb; we'll attach vdc to it

[root@dell-per740xd-10 ~]# virsh domblklist vm1
 Target   Source
--------------------------------------------------------
 vda      /var/lib/libvirt/images/jeos-27-x86_64.qcow2
 vdb      /var/lib/libvirt/images/vdb.qcow2

3. Prepare the vdc disk XML

[root@dell-per740xd-10 ~]# cat vdc.xml
<disk type='file' device='disk'>
  <driver name='qemu' type='qcow2'/>
  <source file='/var/lib/libvirt/images/vdc.qcow2'/>
  <target dev='vdc' bus='virtio'/>
</disk>

4. Generate some data on the VM's vdb, and get one line of data from hexdump

[root@localhost ~]# dd if=/dev/urandom of=/dev/vdb bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0950622 s, 110 MB/s
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 fd6a 31ce 6c5b d0f3 b5ea b26a 62b3 23ff

5. Start a full backup of vdb

[root@dell-per740xd-10 ~]# cat ck1.xml
<domaincheckpoint>
  <name>ck1</name>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>
[root@dell-per740xd-10 ~]# cat bk1.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='no'/>
    <disk name='vdb' backup='yes' type='file'>
      <scratch file='/tmp/vdb_scratch'/>
    </disk>
  </disks>
</domainbackup>
[root@dell-per740xd-10 ~]# virsh backup-begin vm1 bk1.xml ck1.xml
Backup started
[root@dell-per740xd-10 ~]# qemu-img convert -f raw nbd://localhost:10809/vdb -O qcow2 /var/lib/libvirt/images/vdb.full.backup

6. Attach vdc to the VM

[root@dell-per740xd-10 ~]# virsh attach-device vm1 vdc.xml
Device attached successfully

7. Generate some new data on vdb and vdc

[root@localhost ~]# dd if=/dev/urandom of=/dev/vdb bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0893557 s, 117 MB/s
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 03e1 2d26 9d85 0f43 7fa8 692c ad35 b8e4
[root@localhost ~]# dd if=/dev/urandom of=/dev/vdc bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0896498 s, 117 MB/s
[root@localhost ~]# hexdump /dev/vdc | head -1
0000000 f66c 2cf9 388d ea59 e97d 3934 0ef5 cecd

8. Prepare checkpoint and backup XML for the 2nd round backup

[root@dell-per740xd-10 ~]# cat ck2.xml
<domaincheckpoint>
  <name>ck2</name>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap'/>
    <disk name='vdc' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>
[root@dell-per740xd-10 ~]# cat bk2.xml
<domainbackup mode='pull'>
  <incremental>ck1</incremental>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='no'/>
    <disk name='vdb' backup='yes' type='file'>
      <scratch file='/tmp/vdb_scratch'/>
    </disk>
    <disk name='vdc' backup='yes' type='file' backupmode='full'>
      <scratch file='/tmp/vdc_scratch'/>
    </disk>
  </disks>
</domainbackup>

9. Start the 2nd round backup

[root@dell-per740xd-10 ~]# virsh backup-begin vm1 bk2.xml ck2.xml
Backup started

10. Save the backed-up data and abort the backup job

vdb_inc1:
[root@dell-per740xd-10 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdb.inc1.backup 100M
Formatting '/var/lib/libvirt/images/vdb.inc1.backup', fmt=qcow2 cluster_size=65536 compression_type=zlib size=104857600 lazy_refcounts=off refcount_bits=16
[root@dell-per740xd-10 ~]# ./copyif3.sh nbd://127.0.0.1:10809/vdb /var/lib/libvirt/images/vdb.inc1.backup backup-vdb vdb

  0 10485760:
read 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 00.03 sec (371.044 MiB/sec and 37.1044 ops/sec)

vdc_full:
[root@dell-per740xd-10 ~]# qemu-img convert -f raw nbd://localhost:10809/vdc -O qcow2 /var/lib/libvirt/images/vdc.full.backup
[root@dell-per740xd-10 ~]# virsh domjobabort vm1

11. Generate some new data on vdb and vdc

[root@localhost ~]# dd if=/dev/urandom of=/dev/vdb bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.0827228 s, 127 MB/s
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 2e47 afbd 106f 989e 3a81 535d edaf 74f5
[root@localhost ~]# dd if=/dev/urandom of=/dev/vdc bs=1M count=10
10+0 records in
10+0 records out
10485760 bytes (10 MB, 10 MiB) copied, 0.082781 s, 127 MB/s
[root@localhost ~]# hexdump /dev/vdc | head -1
0000000 8299 aaea c9fc f1b6 6907 6202 f8ca 702c

12. Prepare checkpoint and backup XML for the 3rd round backup

[root@dell-per740xd-10 ~]# cat bk3.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='no'/>
    <disk name='vdb' backup='yes' type='file' backupmode='incremental' incremental='ck2'>
      <scratch file='/tmp/vdb_scratch'/>
    </disk>
    <disk name='vdc' backup='yes' type='file' incremental='ck2'>
      <scratch file='/tmp/vdc_scratch'/>
    </disk>
  </disks>
</domainbackup>
[root@dell-per740xd-10 ~]# cat ck3.xml
<domaincheckpoint>
  <name>ck3</name>
  <disks>
    <disk name='vda' checkpoint='no'/>
    <disk name='vdb' checkpoint='bitmap'/>
    <disk name='vdc' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

13. Start the 3rd round backup

[root@dell-per740xd-10 ~]# virsh backup-begin vm1 bk3.xml ck3.xml
Backup started

14. Save the backed-up data

vdb incremental backup 2:
[root@dell-per740xd-10 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdb.inc2.backup 100M
Formatting '/var/lib/libvirt/images/vdb.inc2.backup', fmt=qcow2 cluster_size=65536 compression_type=zlib size=104857600 lazy_refcounts=off refcount_bits=16
[root@dell-per740xd-10 ~]# ./copyif3.sh nbd://127.0.0.1:10809/vdb /var/lib/libvirt/images/vdb.inc2.backup backup-vdb vdb

  0 10485760:
read 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 00.03 sec (357.202 MiB/sec and 35.7202 ops/sec)

vdc incremental backup 1:
[root@dell-per740xd-10 ~]# qemu-img create -f qcow2 /var/lib/libvirt/images/vdc.inc1.backup 200M
Formatting '/var/lib/libvirt/images/vdc.inc1.backup', fmt=qcow2 cluster_size=65536 compression_type=zlib size=209715200 lazy_refcounts=off refcount_bits=16
[root@dell-per740xd-10 ~]# ./copyif3.sh nbd://127.0.0.1:10809/vdc /var/lib/libvirt/images/vdc.inc1.backup backup-vdc vdc

  0 10485760:
read 10485760/10485760 bytes at offset 0
10 MiB, 1 ops; 00.03 sec (345.078 MiB/sec and 34.5078 ops/sec)

15. Make disk chains from the vdb and vdc backup files and check that their data is correct

15.1 Make the disk chain from the vdb backup files

[root@dell-per740xd-10 images]# python make_chain.py vdb.full.backup vdb.inc1.backup vdb.inc2.backup
Making Chain: 'vdb.inc1.backup.copy' based on 'vdb.full.backup.copy'
Making Chain: 'vdb.inc2.backup.copy' based on 'vdb.inc1.backup.copy'

NOW THE CHAIN IS:
======================================================================
	vdb.full.backup.copy <=== vdb.inc1.backup.copy <=== vdb.inc2.backup.copy
======================================================================

15.2 Make the disk chain from the vdc backup files

[root@dell-per740xd-10 images]# python make_chain.py vdc.full.backup vdc.inc1.backup
Making Chain: 'vdc.inc1.backup.copy' based on 'vdc.full.backup.copy'

NOW THE CHAIN IS:
======================================================================
	vdc.full.backup.copy <=== vdc.inc1.backup.copy
======================================================================

15.3 Check vdb data

Using vdb.inc2.backup.copy, data from hexdump:
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 2e47 afbd 106f 989e 3a81 535d edaf 74f5    <== same as step 11

Using vdb.inc1.backup.copy, data from hexdump:
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 03e1 2d26 9d85 0f43 7fa8 692c ad35 b8e4    <== same as step 7

Using vdb.full.backup.copy, data from hexdump:
[root@localhost ~]# hexdump /dev/vdb | head -1
0000000 fd6a 31ce 6c5b d0f3 b5ea b26a 62b3 23ff    <== same as step 4

15.4 Check vdc data

Using vdc.inc1.backup.copy, data from hexdump:
[root@localhost ~]# hexdump /dev/vdc | head -1
0000000 8299 aaea c9fc f1b6 6907 6202 f8ca 702c    <== same as step 11

Using vdc.full.backup.copy, data from hexdump:
[root@localhost ~]# hexdump /dev/vdc | head -1
0000000 f66c 2cf9 388d ea59 e97d 3934 0ef5 cecd    <== same as step 7

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (virt:8.3 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5137
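A side note on the verification method above: the core of make_chain.py is generating one `qemu-img rebase` command per adjacent pair of backup files. That command generation can be sketched and checked in isolation, without qemu-img installed (illustrative only; it builds the command strings and does not run them):

```python
def rebase_commands(img_files):
    """Generate the qemu-img rebase commands that link img_files into a
    backing chain: img_files[0] <- img_files[1] <- ... (each overlay is
    unsafely rebased onto its predecessor, as make_chain.py does)."""
    cmd = "qemu-img rebase -u -f qcow2 -b %s -F qcow2 %s"
    return [cmd % (base, overlay)
            for base, overlay in zip(img_files, img_files[1:])]

cmds = rebase_commands(["vdb.full.backup.copy",
                        "vdb.inc1.backup.copy",
                        "vdb.inc2.backup.copy"])
for c in cmds:
    print(c)
```

With the three vdb backup copies this yields two rebase commands, matching the "Making Chain" lines printed in step 15.1.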
Created attachment 1683301 [details]
libvirt logs

Description of problem:
Libvirt supports only one checkpoint ID per backup when creating an incremental backup. This behavior prevents including new disks that were added after the previous backup (full or incremental) was taken, forcing the user to maintain a separate backup chain for each newly added VM disk.

For example:
1. Create a VM with an OS disk and run it.
2. Perform a full backup of the disk.
3. Add a new disk to the VM.
4. Start an incremental backup that includes both disks, using the checkpoint ID that was created in step 2.

The operation fails with the following error:
"Error starting backup: invalid argument: failed to find bitmap '0d25d274-35f1-4c33-b9b8-f5894540206c' in image 'sdd6'"

Version-Release number of selected component (if applicable):
libvirt-6.1.0-2
qemu-4.2.0-7

How reproducible:
100%

Steps to Reproduce (oVirt operations):
1. Create a VM with a disk
2. Start the VM
3. Create a full backup for the disk -> checkpoint '123' created
4. Stop the backup
5. Attach a new disk to the VM
6. Start a new backup that includes both disks, using the checkpoint ID created in step 3

Actual results:
The mixed incremental/full backup operation fails.

Expected results:
Libvirt should provide a way to unify backup creation for newly added disks.

Additional info:
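At the libvirt level, the failing step 6 boils down to submitting a backup XML that names both disks under a single job-level <incremental>; with the fix, marking the new disk with backupmode='full' is what makes such a mixed job possible. A sketch that merely builds that XML document (the element and attribute names follow the backup XML shown elsewhere in this bug; disk names, port, and scratch paths are illustrative, and the resulting string would be passed to virDomainBackupBegin):

```python
import xml.etree.ElementTree as ET

def build_backup_xml(checkpoint, disks):
    """disks: list of (name, backupmode_or_None) tuples.
    Returns pull-mode <domainbackup> XML; backupmode='full' marks a
    freshly added disk that has no bitmaps for `checkpoint` yet."""
    root = ET.Element("domainbackup", mode="pull")
    ET.SubElement(root, "incremental").text = checkpoint
    ET.SubElement(root, "server", name="localhost", port="10809")
    disks_el = ET.SubElement(root, "disks")
    for name, mode in disks:
        attrs = {"name": name, "backup": "yes", "type": "file"}
        if mode:
            attrs["backupmode"] = mode
        disk_el = ET.SubElement(disks_el, "disk", attrs)
        ET.SubElement(disk_el, "scratch", file="/tmp/%s_scratch" % name)
    return ET.tostring(root, encoding="unicode")

# Existing disk sda stays incremental; newly attached sdd is forced full.
backup_xml = build_backup_xml("123", [("sda", None), ("sdd", "full")])
print(backup_xml)
```

Without the `backupmode='full'` entry this is exactly the request that fails on libvirt-6.1.0 with the "failed to find bitmap" error above.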