Description of problem:
libvirt should release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-8.module+el8.4.0+10233+8b7fd9eb.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a 'pull' mode backup xml:
# cat backup_pull_full.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
      <scratch file='/mnt/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>

2. Start a backup in 'pull' mode:
# virsh backup-begin vm1 backup_pull_full.xml
Backup started

3. Check the domain job info:
# virsh domjobinfo vm1
Job type:         Unbounded
Operation:        Backup
Time elapsed:     16612        ms
Temporary disk space use:   21.375 MiB
Temporary disk space total: 10.000 GiB

4. Destroy the guest:
# virsh destroy vm1
Domain 'vm1' destroyed

5. Check the domain job info:
# virsh domjobinfo vm1 --completed
Job type:         None

6. Start the guest again:
# virsh start vm1
error: Failed to start domain 'vm1'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)

Actual results:
libvirt does not release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed

Expected results:
libvirt should release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed

Additional info:
1. The issue also exists when the guest is destroyed with 'kill -9 `pidof qemu-kvm`'
2. The issue cannot be reproduced with libvirt-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64
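The failure in step 6 can be spotted mechanically when scripting the reproducer. A minimal sketch, assuming bash; the function name is mine and not part of libvirt or virsh — it simply matches the stale-lock error text shown above:

```shell
# Hypothetical helper (not part of virsh): succeeds when the given virsh
# output contains the stale state-change-lock error left behind by
# remoteDispatchDomainBackupBegin after the guest was destroyed.
is_stale_backup_lock() {
    grep -q 'state change lock (held by monitor=remoteDispatchDomainBackupBegin)' <<<"$1"
}
```

In a reproduction script this would be run against the stderr of `virsh start` to decide whether the bug triggered.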
The issue is also not reproduced on libvirt-7.0.0-6.module+el8.4.0+10144+c3d3c217.x86_64.
Fixed upstream by:

commit 55d175c073b15c039337e46b81c3cef907e55e7b
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 11 16:18:50 2021 +0100

    qemuBackupJobTerminate: Fix job termination for inactive VMs

    Commit cb29e4e801d didn't take into account that the VM can be
    inactive when it's destroyed. This means that the job would remain
    active also when the VM became inactive.

    To fix this properly:

    1) Remove the bogus VM liveness check and early return (reverts the
       aforementioned commit)

    2) Conditionalize the stats assignment only when the stats object is
       present (properly fix the crash when VM dies when reconnecting)

    3) End the asyncjob only when it was already set (prevent corruption
       of priv->jobs_queued)

    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1937598
    Fixes: cb29e4e801d
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>

commit aa372e5a0115ef94d55193e4fd85f622213e225c
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 11 16:14:17 2021 +0100

    backup: Store 'apiFlags' in private section of virDomainBackupDef

    'qemuBackupJobTerminate' needs the API flags to see whether
    VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL was used. Unfortunately when
    called via qemuProcessReconnect()->qemuProcessStop() early (e.g. if
    the qemu process died while we were reconnecting) the job is cleared
    temporarily so that other APIs can be called. This would mean that we
    couldn't clean up the files in some cases.

    Save the 'apiFlags' inside the backup object and set it from the
    'qemuDomainJobObj' 'apiFlags' member when reconnecting to a VM.

    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
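The observable effect of the first commit is the difference between step 5 of the reproducer ("Job type: None") and the verification transcripts ("Job type: Cancelled"): after the fix, destroying a guest mid-backup leaves a completed, cancelled job instead of a leaked lock. A minimal sketch of an automated check, assuming bash; the function name and output words are mine, not libvirt's — it only parses `virsh domjobinfo <dom> --completed` output:

```shell
# Hypothetical classifier (not part of libvirt): prints "fixed" when the
# completed job record shows the backup was cancelled, "leaked" when the
# job record is empty ("None"), which is the symptom of this bug.
backup_job_outcome() {
    awk -F': *' '$1 == "Job type" {
        if ($2 == "Cancelled") print "fixed"; else print "leaked"
    }' <<<"$1"
}
```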
Exception approved in review meeting on 12 Mar 2021.
Verified on: libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
Result: PASS

[root@dell-per740-01 ~]# cat backup.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
      <scratch file='/tmp/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>

[root@dell-per740-01 ~]# virsh backup-begin vm1 backup.xml
Backup started

[root@dell-per740-01 ~]# virsh destroy vm1
Domain 'vm1' destroyed

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled
Operation:        Backup

[root@dell-per740-01 ~]# virsh backup-begin vm1 backup.xml
Backup started

[root@dell-per740-01 ~]# virsh list
 Id   Name   State
----------------------
 3    vm1    running

[root@dell-per740-01 ~]# ps -ef | grep vm1
qemu  188112  1  61 05:23 ?  00:00:29 /usr/libexec/qemu-kvm -name guest=vm1...

[root@dell-per740-01 ~]# kill -9 188112

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled
Operation:        Backup

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started
Push mode:

[root@dell-per740-01 ~]# cat push.xml
<domainbackup>
  <disks>
    <disk name='vda' type='file'>
      <target file='/tmp/vda.backup'/>
      <driver type='qcow2'/>
    </disk>
  </disks>
</domainbackup>

[root@dell-per740-01 ~]# virsh backup-begin vm1 push.xml
Backup started

[root@dell-per740-01 ~]# virsh destroy vm1
Domain 'vm1' destroyed

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled
Operation:        Backup

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started

[root@dell-per740-01 ~]# virsh backup-begin vm1 push.xml
Backup started

[root@dell-per740-01 ~]# ps -ef | grep vm1
qemu  196827  1  99 10:40 ?  00:00:22 /usr/libexec/qemu-kvm -name guest=vm1..

[root@dell-per740-01 ~]# kill -9 196827

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled
Operation:        Backup

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started
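The pull-mode and push-mode verification flows above follow the same pattern, so they lend themselves to a single script. A rough sketch, assuming bash; the script, function name, and `$VIRSH` override are mine (not part of libvirt) — it requires a running guest and a libvirt host to do anything real:

```shell
#!/usr/bin/env bash
# Hypothetical verification helper: begin a backup, destroy the guest,
# confirm the job ended as a completed "Cancelled" job (the fixed
# behaviour), then make sure the guest can start again. The virsh
# command can be overridden via $VIRSH, e.g. for dry runs with a stub.
VIRSH=${VIRSH:-virsh}

verify_backup_cleanup() {
    local dom=$1 xml=$2
    "$VIRSH" backup-begin "$dom" "$xml" || return 1
    "$VIRSH" destroy "$dom" || return 1
    # Before the fix this showed "Job type: None" and the state change
    # lock stayed held, so the subsequent start would time out.
    "$VIRSH" domjobinfo "$dom" --completed | grep -q 'Job type:.*Cancelled' || return 1
    "$VIRSH" start "$dom"
}
```

Running it once with `backup.xml` (pull mode) and once with `push.xml` would cover both transcripts above.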
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2021:2098