Bug 1799011 - incremental-backup: RFE: Handle backup bitmaps during live migration with shared storage
Keywords:
Status: VERIFIED
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: 8.0
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On: 1858739 1207659
Blocks: 1799015 1139877 1861680
 
Reported: 2020-02-06 13:21 UTC by Peter Krempa
Modified: 2020-11-05 19:35 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Feature Request
Target Upstream Version:


Description Peter Krempa 2020-02-06 13:21:01 UTC
Description of problem:
Libvirt currently doesn't handle bitmaps used to drive the incremental backup feature during live migration. Incremental backup would not be possible after a migration without migrating the active bitmaps.

Comment 1 Nir Soffer 2020-02-11 15:18:39 UTC
Until this is fixed, RHV needs to either:
- disable live migration functionality if incremental backup is enabled
- delete all checkpoints before live migration, forcing the next backup to be a full backup
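
A minimal sketch of the second workaround, assuming `virsh checkpoint-list --name` prints one checkpoint name per line. The helper name `delete_all_checkpoints` is hypothetical and is not how RHV implements this:

```shell
# Hypothetical helper: delete every checkpoint of a domain before migrating it.
# Assumption: checkpoint names contain no whitespace, so word splitting is safe.
delete_all_checkpoints() {
    dom=$1
    names=$(virsh checkpoint-list "$dom" --name) || return 1
    for ck in $names; do
        virsh checkpoint-delete "$dom" "$ck" || return 1
    done
}
```

After `delete_all_checkpoints vm1`, no checkpoint is left to serve as an incremental starting point, so the next backup has to be a full one.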

Comment 2 Eyal Shenitzky 2020-07-29 08:38:58 UTC
Peter, according to what you said in our meetings, this issue should be fixed, right?

Comment 3 Peter Krempa 2020-07-29 09:04:48 UTC
Success code paths of migrations should work at this point. There's a qemu bug which might prevent resuming the source if migration fails in certain phases (unlikely): https://bugzilla.redhat.com/show_bug.cgi?id=1858739

In this bug I want to test it and potentially implement code to transport bitmaps over the migration stream which is supposed to be more efficient.

Comment 4 Eyal Shenitzky 2020-08-03 10:16:57 UTC
(In reply to Peter Krempa from comment #3)
> Success code paths of migrations should work at this point. There's a qemu
> bug which might prevent resuming the source if migration fails in certain
> phases (unlikely): https://bugzilla.redhat.com/show_bug.cgi?id=1858739
> 
> In this bug I want to test it and potentially implement code to transport
> bitmaps over the migration stream which is supposed to be more efficient.

Peter, can you please clarify what exactly should work?

Should a VM that contains checkpoints and bitmaps migrate successfully?
Can a new incremental backup be taken after the VM was live migrated?

Comment 5 Peter Krempa 2020-08-03 10:26:22 UTC
(In reply to Eyal Shenitzky from comment #4)
> (In reply to Peter Krempa from comment #3)
> > Success code paths of migrations should work at this point. There's a qemu
> > bug which might prevent resuming the source if migration fails in certain
> > phases (unlikely): https://bugzilla.redhat.com/show_bug.cgi?id=1858739
> > 
> > In this bug I want to test it and potentially implement code to transport
> > bitmaps over the migration stream which is supposed to be more efficient.
> 
> Peter, can you please clarify what exactly should work?
> 
> Should a VM that contains checkpoints and bitmaps migrate successfully?
> Can a new incremental backup be taken after the VM was live migrated?

Yes, both cases should work. Note that the checkpoint definitions are not migrated at this point, so they need to be redefined on the destination. Since oVirt already does that when starting the VM, this should not be a problem.
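
A rough sketch of that redefinition, not oVirt's actual implementation. It assumes the checkpoint XML files are copied between hosts by some other means (for example scp), and the helper names `dump_checkpoints` and `redefine_checkpoints` are hypothetical. `--topological` is used so that parent checkpoints are listed, and later redefined, before their children:

```shell
# Hypothetical helper, run on the source host before migration:
# save each checkpoint's XML into $dir as NNN-<name>.xml, parents first.
# Assumption: checkpoint names contain no whitespace.
dump_checkpoints() {
    dom=$1
    dir=$2
    n=0
    names=$(virsh checkpoint-list "$dom" --name --topological) || return 1
    for ck in $names; do
        n=$((n + 1))
        out="$dir/$(printf '%03d' "$n")-$ck.xml"
        virsh checkpoint-dumpxml "$dom" "$ck" > "$out" || return 1
    done
}

# Hypothetical helper, run on the target host after migration:
# redefine checkpoint metadata from the saved files. The glob expands in
# sorted order, so the zero-padded prefix preserves the parent-first order.
redefine_checkpoints() {
    dom=$1
    dir=$2
    for f in "$dir"/*.xml; do
        [ -e "$f" ] || continue
        virsh checkpoint-create "$dom" "$f" --redefine || return 1
    done
}
```

For example, `dump_checkpoints vm1 /tmp/ckpts` on the source, then `redefine_checkpoints vm1 /tmp/ckpts` on the target once the directory has been copied over.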

Comment 6 Peter Krempa 2020-08-07 14:06:25 UTC
I've filed multiple bugs dealing with specific sub-issues of this. I'm also re-purposing this bug to specifically track migration of bitmaps when shared storage migration is used.

 - https://bugzilla.redhat.com/show_bug.cgi?id=1867086
   incremental backup: RFE: migrate bitmaps over qemu's migration stream also for shared-storage migration

 - https://bugzilla.redhat.com/show_bug.cgi?id=1867085
   incremental backup: RFE: Migrate bitmaps when migrating VM with --copy-storage-all

 - https://bugzilla.redhat.com/show_bug.cgi?id=1867084
   incremental backup: RFE: Add mechanism to migrate checkpoint definitions with VM

Since the bitmaps are for now migrated by qemu and every other problem is tracked separately, I'll set this bug as test-only.

Comment 7 yisun 2020-09-14 02:59:07 UTC
Per comment 6, I will remove the "NewFeature" tag since this is a TESTONLY issue now. The storage migration part will be tracked in separate RFEs.

Comment 8 yisun 2020-09-25 07:38:27 UTC
Verified with:
qemu-kvm-5.1.0-9.module+el8.3.0+8182+ac9ced32.x86_64
libvirt-6.6.0-6.module+el8.3.0+8125+aefcf088.x86_64

Result: 
PASS

Steps:
1. source and target hosts have nfs mounted dir:
[root@dell-per740xd-08 backup_test]# mount | grep nfs
10.66.85.212:/home/images on /backup_test type nfs 

======== FROM HERE, OPERATE ON SOURCE HOST ========
2. prepare a qcow2 image for test in nfs dir
[root@dell-per740xd-08 backup_test]# qemu-img create -f qcow2 /backup_test/test.qcow2 1G
Formatting '/backup_test/test.qcow2', fmt=qcow2 cluster_size=65536 compression_type=zlib size=1073741824 lazy_refcounts=off refcount_bits=16

3. use it in vm on source host:
[root@dell-per740xd-10 ~]# virsh start vm1
Domain vm1 started

[root@dell-per740xd-10 ~]# virsh domblklist vm1
 Target   Source
-----------------------------------
 vda      /backup_test/test.qcow2

4. create a checkpoint of vm on source host
[root@dell-per740xd-10 ~]# virsh checkpoint-create-as vm1 --description test --name ck1 --diskspec vda,checkpoint=bitmap
Domain checkpoint ck1 created

[root@dell-per740xd-10 ~]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------------------
 ck1    2020-09-25 03:15:50 -0400


5. dumpxml checkpoint ck1 to a local file, and scp it to target host
# virsh checkpoint-dumpxml vm1 ck1 > ck1_dumpxml.xml

[root@dell-per740xd-10 ~]# scp ck1_dumpxml.xml root@dell-per740xd-08.lab.eng.pek2.redhat.com:/tmp/
Warning: Permanently added 'dell-per740xd-08.lab.eng.pek2.redhat.com' (ECDSA) to the list of known hosts.
root@dell-per740xd-08.lab.eng.pek2.redhat.com's password: 
ck1_dumpxml.xml    100% 6985     6.2MB/s   00:00


6. migrate vm to target host
[root@dell-per740xd-10 ~]# virsh migrate vm1 --live qemu+ssh://dell-per740xd-08.lab.eng.pek2.redhat.com/system --verbose --unsafe
root@dell-per740xd-08.lab.eng.pek2.redhat.com's password: 
Migration: [100 %]

======== FROM HERE, OPERATE ON TARGET HOST ========
7. check vm1 has no checkpoint metadata
[root@dell-per740xd-08 backup_test]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

8. create checkpoint metadata with the xml generated in step 5
[root@dell-per740xd-08 backup_test]# virsh checkpoint-create vm1 /tmp/ck1_dumpxml.xml --redefine
Domain checkpoint ck1 created from '/tmp/ck1_dumpxml.xml'

9. prepare incremental backup xml files
[root@dell-per740xd-08 ~]# cat bk.xml 
<domainbackup mode='push'>
  <incremental>ck1</incremental>
  <disks>
    <disk name='vda' backup='yes' type='file'>
	    <target file='/tmp/vda.inc.backup'/>
	    <driver type='qcow2'/>
    </disk>
  </disks>
</domainbackup>

[root@dell-per740xd-08 ~]# cat ck.xml 
<domaincheckpoint>
  <name>ck2</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

10. Start an incremental backup for vm1, from checkpoint=ck1.
[root@dell-per740xd-08 ~]# virsh backup-begin vm1 bk.xml ck.xml 
Backup started
  
[root@dell-per740xd-08 ~]# virsh domjobinfo vm1 --completed
Job type:         Completed   
Operation:        Backup      
Time elapsed:     54           ms

11. Check that the backup file was actually created
[root@dell-per740xd-08 ~]# qemu-img info /tmp/vda.inc.backup 
image: /tmp/vda.inc.backup
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 448 KiB
cluster_size: 65536
backing file: /backup_test/test.qcow2
backing file format: qcow2
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    refcount bits: 16
    corrupt: false

12. check that the vda image in nfs dir now has 2 checkpoint bitmaps ['ck1', 'ck2']
[root@dell-per740xd-08 ~]# virsh destroy vm1
Domain vm1 destroyed

[root@dell-per740xd-08 ~]# qemu-img info /backup_test/test.qcow2 -U
image: /backup_test/test.qcow2
file format: qcow2
virtual size: 1 GiB (1073741824 bytes)
disk size: 396 KiB
cluster_size: 65536
Format specific information:
    compat: 1.1
    compression type: zlib
    lazy refcounts: false
    bitmaps:
        [0]:
            flags:
                [0]: auto
            name: ck1
            granularity: 65536
        [1]:
            flags:
                [0]: auto
            name: ck2
            granularity: 65536
    refcount bits: 16
    corrupt: false
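
The check in step 12 could be scripted roughly as follows. This is only a sketch: `list_bitmaps` is a hypothetical helper, and it assumes the human-readable `qemu-img info` layout shown above, which may differ between QEMU versions:

```shell
# Hypothetical helper: print the names of the bitmaps listed in
# `qemu-img info` text output read from stdin, one per line.
list_bitmaps() {
    awk '
        /^ *bitmaps:/            { in_bitmaps = 1; next }  # start of the bitmaps section
        in_bitmaps && /^ *name:/ { print $2 }              # "name: ck1" -> ck1
    '
}
```

Usage: `qemu-img info -U /backup_test/test.qcow2 | list_bitmaps`, which for the image above should list ck1 and ck2.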

