Bug 1817327 - [incremental_backup] VM will be locked if we start a backup job with dirty-bitmap name conflicting
Summary: [incremental_backup] VM will be locked if we start a backup job with dirty-bitmap name conflicting
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.2
Hardware: x86_64
OS: Linux
high
medium
Target Milestone: rc
: 8.0
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks: 1799015
Reported: 2020-03-26 06:55 UTC by yisun
Modified: 2020-05-05 09:59 UTC
6 users

Fixed In Version: libvirt-6.0.0-16.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-05-05 09:59:00 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments
libvirtd-debug.log (1.61 MB, text/plain)
2020-03-26 06:56 UTC, yisun
no flags


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2020:2017 0 None None None 2020-05-05 09:59:43 UTC

Description yisun 2020-03-26 06:55:46 UTC
Description:
The VM stays locked after starting a backup job whose dirty-bitmap name conflicts with an existing bitmap.

Versions:
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
qemu-kvm-4.2.0-16.module+el8.2.0+6092+4f2391c1.x86_64

How reproducible:
100%


0. Clear the libvirtd log
[root@dell-per740xd-11 inc_bkup]# echo "" > /var/log/libvirtd-debug.log

2. Create a checkpoint named “check_full” for the VM
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created

3. Delete only libvirt's checkpoint metadata (the dirty bitmap stays in the qcow2 image)
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

4. Start a backup whose checkpoint XML uses the name ‘check_full’, the same as in step 2
[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full
<==== Expected error, since a dirty bitmap with this name already exists in the qcow2 file.

5. Try to operate the VM: nothing works, it’s locked…
[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)

[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to destroy domain vm1
error: internal error: connection closed due to keepalive timeout

6. The log is uploaded as an attachment.

Actual result:
After libvirt’s checkpoint metadata is deleted and another backup job is started with a conflicting name, the expected error occurs, but the VM stays locked.

Expected result:
VM should not be locked.

Comment 1 yisun 2020-03-26 06:56:52 UTC
Created attachment 1673689 [details]
libvirtd-debug.log

Comment 2 yisun 2020-03-26 07:09:05 UTC
Here are the checkpoint and backup XML files:
[root@dell-per740xd-11 inc_bkup]# cat checkpoint_full_pull.xml
<domaincheckpoint>
  <name>check_full</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

[root@dell-per740xd-11 inc_bkup]# cat backup_full_pull.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
	    <scratch file='/mnt/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>

Comment 6 Peter Krempa 2020-03-26 17:13:17 UTC
Fixed upstream:

commit e060b0624d1b78438b759cc5a25da87b28c9736c
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 26 15:37:44 2020 +0100

    qemuBackupBegin: Fix monitor access when rolling back due to failure
    
    The code attempting to clean up after a failed pull mode backup job
    wrongly entered monitor but didn't clean up nor exit monitor due to a
    logic bug. Fix the condition.
    
    Introduced in a1521f84a53

Comment 10 yisun 2020-04-02 03:41:45 UTC
Verified with libvirt-6.0.0-16.module+el8.2.0+6139+d66dece5.x86_64 and result is PASS

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full

[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
Job type:         None

[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
Domain vm1 destroyed

[root@dell-per740xd-11 inc_bkup]# virsh start vm1
Domain vm1 started

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

<==== The VM was not locked at any point.

Comment 12 errata-xmlrpc 2020-05-05 09:59:00 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017

