1817327 – [incremental_backup] VM will be locked if we start a backup job with dirty-bitmap name conflicting

Bug 1817327 - [incremental_backup] VM will be locked if we start a backup job with dirty-bitmap name conflicting

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Enterprise Linux Advanced Virtualization
Classification:	Red Hat
Component:	libvirt
Sub Component:
Version:	8.2
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	medium
Target Milestone:	rc
Target Release:	8.0
Assignee:	Peter Krempa
QA Contact:	yisun
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	1799015

Reported:	2020-03-26 06:55 UTC by yisun
Modified:	2020-05-05 09:59 UTC
CC List:	6 users
Fixed In Version:	libvirt-6.0.0-16.el8
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-05-05 09:59:00 UTC
Type:	Bug
Target Upstream Version:
Embargoed:
Dependent Products:
Flags:	pm-rhel: mirror+

Attachments
libvirtd-debug.log (1.61 MB, text/plain) 2020-03-26 06:56 UTC, yisun	no flags

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2020:2017	0	None	None	None	2020-05-05 09:59:43 UTC

Description yisun 2020-03-26 06:55:46 UTC

Description:
The VM becomes locked if a backup job is started with a checkpoint name that conflicts with an existing dirty bitmap.

Versions:
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
qemu-kvm-4.2.0-16.module+el8.2.0+6092+4f2391c1.x86_64

How reproducible:
100%

0. Clear libvirtd log
[root@dell-per740xd-11 inc_bkup]# echo "" > /var/log/libvirtd-debug.log

2. Create a checkpoint for vm, named “check_full”
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created

3. Delete only libvirt’s checkpoint metadata (with --metadata, the dirty bitmap stays in the qcow2 image)
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

4. Start a backup. The checkpoint XML uses the name ‘check_full’, the same as in step 2
[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full
<==== This error is expected, since the dirty bitmap name already exists in the qcow2 file.

5. Try to operate the VM: every operation fails because the VM is locked
[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)

[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to destroy domain vm1
error: internal error: connection closed due to keepalive timeout

6. Log uploaded as attachment

Actual result:
After libvirt’s checkpoint metadata is deleted, starting another backup job with a conflicting checkpoint name fails with the expected error, but the VM is left locked.

Expected result:
VM should not be locked.
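
For anyone scripting this check, here is a minimal Python sketch that recognizes the lock timeout from step 5 in captured virsh output. It assumes the error wording shown above, which may differ between libvirt versions. The name `lock_holder` is hypothetical.

```python
import re

# Pattern taken from the error text in step 5 of this report.
# Assumption: other libvirt versions may word the message differently.
_LOCK_RE = re.compile(
    r"cannot acquire state change lock \(held by (?P<holder>[^)]*)\)"
)


def lock_holder(virsh_output):
    """Return the reported lock holder, or None if no lock timeout is reported."""
    match = _LOCK_RE.search(virsh_output)
    return match.group("holder") if match else None


stuck = (
    "error: Timed out during operation: cannot acquire state change lock "
    "(held by monitor=remoteDispatchDomainBackupBegin)"
)
print(lock_holder(stuck))                     # monitor=remoteDispatchDomainBackupBegin
print(lock_holder("Job type:         None"))  # None
```

A non-None result after a failed backup-begin would indicate the behavior reported here.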

Comment 1 yisun 2020-03-26 06:56:52 UTC

Created attachment 1673689
libvirtd-debug.log

Comment 2 yisun 2020-03-26 07:09:05 UTC

Here are the checkpoint and backup XML files:
[root@dell-per740xd-11 inc_bkup]# cat checkpoint_full_pull.xml
<domaincheckpoint>
  <name>check_full</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

[root@dell-per740xd-11 inc_bkup]# cat backup_full_pull.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
      <scratch file='/mnt/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>
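
As a rough illustration of the conflict, the Python sketch below reads the checkpoint name from the XML above and compares it with a set of bitmap names supplied by the caller. How those names are collected from the image is left open. `conflicting_name` is a hypothetical helper, not part of libvirt.

```python
import xml.etree.ElementTree as ET

# Checkpoint XML copied from checkpoint_full_pull.xml above.
CHECKPOINT_XML = """<domaincheckpoint>
  <name>check_full</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>"""


def conflicting_name(checkpoint_xml, existing_bitmaps):
    """Return the checkpoint name if it collides with a known bitmap name, else None."""
    name = ET.fromstring(checkpoint_xml).findtext("name")
    return name if name in existing_bitmaps else None


# The bitmap sets below are illustrative inputs, not data read from a real image.
print(conflicting_name(CHECKPOINT_XML, {"check_full"}))  # check_full
print(conflicting_name(CHECKPOINT_XML, {"other"}))       # None
```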

Comment 6 Peter Krempa 2020-03-26 17:13:17 UTC

Fixed upstream:

commit e060b0624d1b78438b759cc5a25da87b28c9736c
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 26 15:37:44 2020 +0100

    qemuBackupBegin: Fix monitor access when rolling back due to failure
    
    The code attempting to clean up after a failed pull mode backup job
    wrongly entered monitor but didn't clean up nor exit monitor due to a
    logic bug. Fix the condition.
    
    Introduced in a1521f84a53
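
The commit message describes a cleanup path that enters the monitor but never exits it. A toy Python model of that failure mode follows. It assumes the faulty condition was an inverted success check, which this report does not show, so treat it as a guess at the shape of the bug and not the actual diff. `Monitor`, `rollback_buggy`, and `rollback_fixed` are hypothetical names, not libvirt functions.

```python
class Monitor:
    """Toy stand-in for the per-domain monitor; this is not libvirt code."""

    def __init__(self):
        self.entered = False

    def enter(self):
        # Assumption: C-style convention, 0 on success and negative on failure.
        self.entered = True
        return 0

    def exit(self):
        self.entered = False


def rollback_buggy(mon, nbd_running):
    # Cleanup runs only when enter() fails, so a successful enter()
    # is never paired with exit() and the monitor stays held.
    if nbd_running and mon.enter() < 0:
        mon.exit()


def rollback_fixed(mon, nbd_running):
    # Cleanup runs when enter() succeeds; exit() always follows it.
    if nbd_running and mon.enter() == 0:
        mon.exit()


buggy, fixed = Monitor(), Monitor()
rollback_buggy(buggy, nbd_running=True)
rollback_fixed(fixed, nbd_running=True)
print(buggy.entered)  # True
print(fixed.entered)  # False
```

In the model, the buggy path leaves the monitor held, which matches the "cannot acquire state change lock" symptom in the description.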

Comment 10 yisun 2020-04-02 03:41:45 UTC

Verified with libvirt-6.0.0-16.module+el8.2.0+6139+d66dece5.x86_64; the result is PASS.

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created
[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full

[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
Job type:         None

[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
Domain vm1 destroyed

[root@dell-per740xd-11 inc_bkup]# virsh start vm1
Domain vm1 started

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

<==== not locked at any point

Comment 12 errata-xmlrpc 2020-05-05 09:59:00 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017
