Bug 1937598 - [incremental_backup]libvirt should release lock held by remoteDispatchDomainBackupBegin after guest destroyed
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux Advanced Virtualization
Classification: Red Hat
Component: libvirt
Version: 8.4
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Peter Krempa
QA Contact: yisun
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2021-03-11 05:47 UTC by yafu
Modified: 2021-05-25 06:49 UTC (History)

Fixed In Version: libvirt-7.0.0-9.el8
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-05-25 06:48:26 UTC
Type: Bug
Target Upstream Version:
Embargoed:



Description yafu 2021-03-11 05:47:14 UTC
Description of problem:
libvirt should release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed.

Version-Release number of selected component (if applicable):
libvirt-daemon-7.0.0-8.module+el8.4.0+10233+8b7fd9eb.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Prepare a 'pull' mode backup XML:
# cat backup_pull_full.xml
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
    <scratch file='/mnt/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>

2. Start the backup in 'pull' mode:
# virsh backup-begin vm1 backup_pull_full.xml
Backup started

3. Check the domain job info:
# virsh domjobinfo vm1
Job type:         Unbounded   
Operation:        Backup      
Time elapsed:     16612        ms
Temporary disk space use: 21.375 MiB
Temporary disk space total: 10.000 GiB

4. Destroy the guest:
# virsh destroy vm1
Domain 'vm1' destroyed

5. Check the completed domain job info:
# virsh domjobinfo vm1 --completed 
Job type:         None        

6. Start the guest again:
# virsh start vm1
error: Failed to start domain 'vm1'
error: Timed out during operation: cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)
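
For anyone who wants to script the steps above, here is a minimal sketch in Python. It only assembles the virsh command lines already used in this report. The helper names repro_commands and run_repro are hypothetical and not part of libvirt, and the runner assumes virsh is on PATH and that the domain and backup XML already exist.

```python
import subprocess


def repro_commands(domain, backup_xml):
    """Return the virsh invocations from steps 2-6, in order."""
    return [
        ["virsh", "backup-begin", domain, backup_xml],
        ["virsh", "domjobinfo", domain],
        ["virsh", "destroy", domain],
        ["virsh", "domjobinfo", domain, "--completed"],
        ["virsh", "start", domain],
    ]


def run_repro(domain, backup_xml):
    """Run each command and collect (argv, returncode, stdout, stderr)."""
    results = []
    for argv in repro_commands(domain, backup_xml):
        proc = subprocess.run(argv, capture_output=True, text=True)
        results.append((argv, proc.returncode, proc.stdout, proc.stderr))
    return results
```

On an affected build, the last entry returned by run_repro would be expected to carry a non-zero return code and the "cannot acquire state change lock" error shown in step 6.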


Actual results:
libvirt does not release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed.

Expected results:
libvirt should release the lock held by remoteDispatchDomainBackupBegin after the guest is destroyed.

Additional info:
1. The issue also exists when the guest is destroyed with 'kill -9 `pidof qemu-kvm`'.
2. The issue cannot be reproduced with libvirt-6.0.0-25.5.module+el8.2.1+8680+ea98947b.x86_64.
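
When triaging logs for this failure, it can help to extract which API still holds the lock. The sketch below is a small Python helper whose pattern is derived only from the single error message quoted in this report. Other libvirt versions may format the message differently, and the name lock_holder is hypothetical.

```python
import re

# Pattern based on the one message seen in this report:
#   cannot acquire state change lock (held by monitor=remoteDispatchDomainBackupBegin)
LOCK_ERROR = re.compile(
    r"cannot acquire state change lock \(held by (?P<kind>[\w-]+)=(?P<holder>\w+)"
)


def lock_holder(text):
    """Return (kind, holder) from a lock timeout error, or None if absent."""
    match = LOCK_ERROR.search(text)
    if match is None:
        return None
    return match.group("kind"), match.group("holder")
```

Feeding it the stderr of the failing 'virsh start vm1' from step 6 would identify remoteDispatchDomainBackupBegin as the holder.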

Comment 1 yisun 2021-03-12 04:10:06 UTC
Not reproduced on
libvirt-7.0.0-6.module+el8.4.0+10144+c3d3c217.x86_64

Comment 2 Peter Krempa 2021-03-12 15:16:28 UTC
Fixed upstream by:

commit 55d175c073b15c039337e46b81c3cef907e55e7b
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 11 16:18:50 2021 +0100

    qemuBackupJobTerminate: Fix job termination for inactive VMs
    
    Commit cb29e4e801d didn't take into account that the VM can be inactive
    when it's destroyed. This means that the job would remain active also
    when the VM became inactive.
    
    To fix this properly:
    
    1) Remove the bogus VM liveness check and early return
        (reverts the aforementioned commit)
    
    2) Conditionalize the stats assignment only when the stats object is
       present
        (properly fix the crash when VM dies when reconnecting)
    
    3) end the asyncjob only when it was already set
       (prevent corruption of priv->jobs_queued)
    
    Resolves: https://bugzilla.redhat.com/show_bug.cgi?id=1937598
    Fixes: cb29e4e801d
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
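
To illustrate the three points in the commit message above, here is a toy model in Python. It is not the libvirt C code: ToyVM and both functions are hypothetical, and the real qemuBackupJobTerminate does considerably more. It only shows why an early return on an inactive VM leaves the job (and so the lock) in place, and how the guarded version avoids that.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyVM:
    active: bool = True
    async_job: Optional[str] = None        # "backup" while the backup job is held
    backup: Optional[dict] = None          # stand-in for the backup object
    current_stats: Optional[dict] = None   # stats of the running job, if any
    completed_stats: Optional[dict] = None # what 'domjobinfo --completed' would show


def terminate_before_fix(vm, status):
    """Model of the old behavior: bail out early when the VM is inactive."""
    if vm.backup is None:
        return
    if not vm.active:
        return  # job is never ended, so the lock stays held
    vm.completed_stats = dict(vm.current_stats, status=status)
    vm.backup = None
    vm.async_job = None


def terminate_after_fix(vm, status):
    """Model of the fix: no liveness check, guarded stats, guarded job end."""
    if vm.backup is None:
        return
    # (2) only record stats when a stats object is present
    if vm.current_stats is not None:
        vm.completed_stats = dict(vm.current_stats, status=status)
        vm.current_stats = None
    vm.backup = None
    # (3) only end the async job when it was actually set
    if vm.async_job == "backup":
        vm.async_job = None
```

In this model, calling terminate_before_fix on an inactive VM leaves async_job set to "backup", which corresponds to the stuck lock in the report. terminate_after_fix clears it and records the cancelled job.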

commit aa372e5a0115ef94d55193e4fd85f622213e225c
Author: Peter Krempa <pkrempa>
Date:   Thu Mar 11 16:14:17 2021 +0100

    backup: Store 'apiFlags' in private section of virDomainBackupDef
    
    'qemuBackupJobTerminate' needs the API flags to see whether
    VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL was set. Unfortunately when called via
    qemuProcessReconnect()->qemuProcessStop() early (e.g. if the qemu
    process died while we were reconnecting) the job is cleared temporarily
    so that other APIs can be called. This would mean that we couldn't clean
    up the files in some cases.
    
    Save the 'apiFlags' inside the backup object and set it from the
    'qemuDomainJobObj' 'apiFlags' member when reconnecting to a VM.
    
    Signed-off-by: Peter Krempa <pkrempa>
    Reviewed-by: Ján Tomko <jtomko>
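
The idea in the second commit can be sketched the same way. The Python below is a hypothetical model, not libvirt code: ToyJob, ToyBackupDef and the helpers are invented for illustration, and the numeric value used for the flag is an assumption. It shows only that a copy of the flags kept on the backup object stays readable after the job object has been cleared.

```python
from dataclasses import dataclass

# Stand-in for VIR_DOMAIN_BACKUP_BEGIN_REUSE_EXTERNAL; the value is assumed.
REUSE_EXTERNAL = 1 << 0


@dataclass
class ToyJob:
    api_flags: int = 0


@dataclass
class ToyBackupDef:
    api_flags: int = 0  # private copy introduced by the commit


def begin_backup(flags):
    """Start a backup: the job and the backup object both record the flags."""
    return ToyJob(api_flags=flags), ToyBackupDef(api_flags=flags)


def reconnect(job, backup):
    """On reconnect, restore the backup's copy from the job's flags."""
    backup.api_flags = job.api_flags


def flags_from_job(job):
    """Old lookup: unavailable (None) once the job has been cleared."""
    return None if job is None else job.api_flags


def flags_from_backup(backup):
    """New lookup: independent of the job object."""
    return backup.api_flags
```

With the job cleared, flags_from_job has nothing to return, while flags_from_backup still reports whether the reuse flag was requested.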

Comment 6 Jeff Nelson 2021-03-12 16:43:56 UTC
Exception approved in review meeting on 12 Mar 2021.

Comment 8 yisun 2021-03-15 09:35:34 UTC
Verified on: libvirt-7.0.0-9.module+el8.4.0+10326+5e50a3b6.x86_64
result: PASS

[root@dell-per740-01 ~]# cat backup.xml 
<domainbackup mode='pull'>
  <server name="localhost" port="10809"/>
  <disks>
    <disk name='vda' backup='yes' type='file'>
    <scratch file='/tmp/sratch.vda'/>
    </disk>
  </disks>
</domainbackup>

[root@dell-per740-01 ~]# virsh backup-begin vm1 backup.xml 
Backup started

[root@dell-per740-01 ~]# virsh destroy vm1
Domain 'vm1' destroyed

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled   
Operation:        Backup      


[root@dell-per740-01 ~]# virsh backup-begin vm1 backup.xml 
Backup started

[root@dell-per740-01 ~]# virsh list
 Id   Name   State
----------------------
 3    vm1    running

[root@dell-per740-01 ~]# ps -ef | grep vm1
qemu      188112       1 61 05:23 ?        00:00:29 /usr/libexec/qemu-kvm -name guest=vm1...

[root@dell-per740-01 ~]# kill -9 188112

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled   
Operation:        Backup      

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started
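
The check performed by hand above could be automated along these lines. This is a sketch in Python that parses the 'key: value' layout of the 'virsh domjobinfo' output shown in this report. The function names are hypothetical, and the parser assumes one field per line with the first colon as separator.

```python
def parse_domjobinfo(text):
    """Parse 'virsh domjobinfo' output into a dict of stripped strings."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip():
            # Collapse the column padding virsh uses inside values.
            info[key.strip()] = " ".join(value.split())
    return info


def backup_was_cancelled(text):
    """True if the completed job is a backup reported as cancelled."""
    info = parse_domjobinfo(text)
    return info.get("Job type") == "Cancelled" and info.get("Operation") == "Backup"
```

Applied to the '--completed' output after the fix it returns True. Applied to the pre-fix output from the original description ('Job type: None') it returns False.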

Comment 9 yisun 2021-03-15 14:43:44 UTC
Push mode:
[root@dell-per740-01 ~]# cat push.xml 
<domainbackup>
  <disks>
    <disk name='vda' type='file'>
      <target file='/tmp/vda.backup'/>
      <driver type='qcow2'/>
    </disk>
  </disks>
</domainbackup>


[root@dell-per740-01 ~]# virsh backup-begin vm1 push.xml 
Backup started

[root@dell-per740-01 ~]# virsh destroy vm1
Domain 'vm1' destroyed

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled   
Operation:        Backup      

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started

[root@dell-per740-01 ~]# virsh backup-begin vm1 push.xml 
Backup started

[root@dell-per740-01 ~]# ps -ef | grep vm1
qemu      196827       1 99 10:40 ?        00:00:22 /usr/libexec/qemu-kvm -name guest=vm1..


[root@dell-per740-01 ~]# kill -9 196827

[root@dell-per740-01 ~]# virsh domjobinfo vm1 --completed
Job type:         Cancelled   
Operation:        Backup      

[root@dell-per740-01 ~]# virsh start vm1
Domain 'vm1' started

Comment 12 errata-xmlrpc 2021-05-25 06:48:26 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (virt:av bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:2098

