Description:
The VM is left locked if we start a backup job with a conflicting dirty-bitmap name.

Versions:
libvirt-6.0.0-14.module+el8.2.0+6069+78a1cb09.x86_64
qemu-kvm-4.2.0-16.module+el8.2.0+6092+4f2391c1.x86_64

How reproducible: 100%

Steps to Reproduce:

1. Clear the libvirtd log:

[root@dell-per740xd-11 inc_bkup]# echo "" > /var/log/libvirtd-debug.log

2. Create a checkpoint for the VM, named 'check_full':

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created

3. Delete libvirt's metadata for the checkpoint:

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

4. Start a backup. The checkpoint XML uses checkpoint name 'check_full', the same as in step 2:

[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full
<==== Expected error, since a dirty bitmap with this name already exists in the qcow2 file.

5. Try to operate the VM. Nothing can be done; it is locked:

[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)
[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
error: Disconnected from qemu:///system due to keepalive timeout
error: Failed to destroy domain vm1
error: internal error: connection closed due to keepalive timeout

6. Log uploaded as attachment.

Actual result:
With libvirt's checkpoint metadata deleted, starting another backup job with a conflicting checkpoint name produces the expected error, but the VM is left locked.

Expected result:
The VM should not be locked.
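For convenience, the reproducer can be scripted. A minimal sketch, assuming vm1 is already running and the two XML files shown in a follow-up comment are in the current directory:

  echo "" > /var/log/libvirtd-debug.log
  virsh checkpoint-create-as vm1 check_full
  virsh checkpoint-delete vm1 check_full --metadata
  # Expected to fail with "Bitmap already exists: check_full"; on affected
  # builds this failure path leaves the state change lock held.
  virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
  # Times out with "cannot acquire state change lock" on affected builds.
  virsh domjobinfo vm1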
Created attachment 1673689: libvirtd-debug.log
Here are the checkpoint and backup XML files:

[root@dell-per740xd-11 inc_bkup]# cat checkpoint_full_pull.xml
<domaincheckpoint>
  <name>check_full</name>
  <disks>
    <disk name='vda' checkpoint='bitmap'/>
  </disks>
</domaincheckpoint>

[root@dell-per740xd-11 inc_bkup]# cat backup_full_pull.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
      <scratch file='/mnt/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>
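A side note for anyone who hits the bitmap conflict itself: the stale persistent bitmap can be dropped from the qcow2 image directly. A sketch, assuming the domain is shut off; the image path below is a placeholder (use virsh domblklist to find the real one), and the 'qemu-img bitmap' subcommand requires QEMU 5.0 or newer, so the qemu-kvm 4.2 build above does not have it:

  virsh domblklist vm1
  # Remove the leftover persistent bitmap; run only while the image is not
  # in use. The path is an assumed example, not from this report.
  qemu-img bitmap --remove /var/lib/libvirt/images/vm1.qcow2 check_full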
Fixed upstream:

commit e060b0624d1b78438b759cc5a25da87b28c9736c
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 26 15:37:44 2020 +0100

    qemuBackupBegin: Fix monitor access when rolling back due to failure

    The code attempting to clean up after a failed pull mode backup job
    wrongly entered monitor but didn't clean up nor exit monitor due to a
    logic bug.

    Fix the condition.

    Introduced in a1521f84a53
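To review the actual change, the commit can be inspected in the upstream libvirt repository (qemuBackupBegin lives in src/qemu/qemu_backup.c):

  git clone https://gitlab.com/libvirt/libvirt.git
  cd libvirt
  git show e060b0624d1b78438b759cc5a25da87b28c9736c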
Verified with libvirt-6.0.0-16.module+el8.2.0+6139+d66dece5.x86_64; the result is PASS.

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-create-as vm1 check_full
Domain checkpoint check_full created

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-delete vm1 check_full --metadata
Domain checkpoint check_full deleted

[root@dell-per740xd-11 inc_bkup]# virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml
error: internal error: unable to execute QEMU command 'transaction': Bitmap already exists: check_full

[root@dell-per740xd-11 inc_bkup]# virsh domjobinfo vm1
Job type:         None

[root@dell-per740xd-11 inc_bkup]# virsh destroy vm1
Domain vm1 destroyed

[root@dell-per740xd-11 inc_bkup]# virsh start vm1
Domain vm1 started

[root@dell-per740xd-11 inc_bkup]# virsh checkpoint-list vm1
 Name   Creation Time
-----------------------

<==== not locked at any point
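The verification above can be condensed into a quick pass/fail check. A sketch, assuming vm1 is running and the same XML files as before:

  virsh checkpoint-create-as vm1 check_full
  virsh checkpoint-delete vm1 check_full --metadata
  virsh backup-begin vm1 backup_full_pull.xml checkpoint_full_pull.xml || true
  # On a fixed build the job lock is released and domjobinfo reports "None";
  # on an affected build the call times out waiting for the state change lock.
  if virsh domjobinfo vm1 | grep -q 'None'; then echo PASS; else echo FAIL; fi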
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2017