Bug 1305793

Summary: After failed external snapshot successive VM operations fail
Product: Red Hat Enterprise Linux 6 Reporter: Peter Krempa <pkrempa>
Component: libvirt Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 6.8 CC: dyuan, hhan, mzhan, pkrempa, rbalakri, virt-bugs, xuzhang, yanyang
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: libvirt-0.10.2-57.el6 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 1304579 Environment:
Last Closed: 2016-05-10 19:26:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1304579    
Bug Blocks:    

Description Peter Krempa 2016-02-09 09:05:34 UTC
+++ This bug was initially created as a clone of Bug #1304579 +++

Description of problem:
As described in the summary.

Version-Release number of selected component (if applicable):
libvirt-0.10.2-56.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.487.el6.x86_64
libvirt-lock-sanlock-0.10.2-56.el6.x86_64
sanlock-2.8-2.el6_5.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Set up libvirt with sanlock on a host:
On host A
# setsebool sanlock_use_nfs 1 && setsebool virt_use_nfs 1 && setsebool virt_use_sanlock 1
# cat /etc/libvirt/qemu-sanlock.conf
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
host_id = 1
user = "sanlock"
group = "sanlock"

# cat /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# cat /etc/sysconfig/sanlock                                              
SANLOCKOPTS="-w 0"

# service wdmd restart; service sanlock restart; service libvirtd restart

2. Create a guest and take an external snapshot
# cat guest.xml
<domain type='kvm' id='1'>
...
    <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/var/lib/libvirt/images/c2.qcow2'/>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>
...
</domain>

# virsh create guest.xml
Domain cc created from guest.xml
# virsh list 
 Id    Name                           State
----------------------------------------------------
 6     cc                             running

# virsh snapshot-create-as cc s1 --disk-only --diskspec vda,file=/tmp/cc.s1                                                    
error: Failed to acquire lock: File exists
Snapshot file created but the snapshot failed.
# ll /tmp/cc.s1
-rw-------. 1 qemu qemu 2.2M Feb  4 10:17 /tmp/cc.s1
# virsh snapshot-list cc
 Name                 Creation Time             State
------------------------------------------------------------

When I remove the file and try to create the snapshot again:
# rm /tmp/cc.s1;virsh snapshot-create-as cc s1 --disk-only --diskspec vda,file=/tmp/cc.s1;
error: Timed out during operation: cannot acquire state change lock

At this point no other operation on the given VM is possible.
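
For scripted reproduction, the two failure modes shown above can be told apart by the error text virsh prints. The following is a minimal sketch, not part of libvirt: the function name and the labels it prints are hypothetical, and the matched strings are taken verbatim from the output in this report.

```shell
# Hypothetical helper: classify a virsh error line from this report.
# "lock-failed" = sanlock refused the lease, further operations still work.
# "job-stuck"   = the domain job was never released (the bug reported here).
classify_snapshot_error() {
    case "$1" in
        *"cannot acquire state change lock"*) echo "job-stuck" ;;
        *"Failed to acquire lock"*)           echo "lock-failed" ;;
        *)                                    echo "other" ;;
    esac
}
```

A test script could capture stderr of the second snapshot-create-as call and flag the bug only when the result is "job-stuck".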

Comment 4 Yang Yang 2016-02-18 06:47:43 UTC
Peter,
After a failed external system checkpoint snapshot, the VM is not resumed. Is that an acceptable result?

Verified on libvirt-0.10.2-57.el6.x86_64.

Steps as follows:
1. Set up libvirt with sanlock on a host:
On host A
# setsebool sanlock_use_nfs 1 && setsebool virt_use_nfs 1 && setsebool virt_use_sanlock 1
# cat /etc/libvirt/qemu-sanlock.conf
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
host_id = 1
user = "sanlock"
group = "sanlock"

# cat /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# cat /etc/sysconfig/sanlock                                              
SANLOCKOPTS="-w 0"

Mount the NFS export on /mnt:
# mount
10.73.194.27:/vol/S3/libvirtauto/yy on /mnt

# service wdmd restart; service sanlock restart; service libvirtd restart

2. Start a guest
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel6.qcow2'>
        <seclabel model='selinux' relabel='no'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

3. Create an external disk-only snapshot
# virsh snapshot-create-as yy s1 --disk-only --diskspec vda,file=/var/lib/libvirt/images/yy.s1
error: Failed to acquire lock: File exists

# ll /var/lib/libvirt/images/yy.s1
ls: cannot access /var/lib/libvirt/images/yy.s1: No such file or directory

# virsh dumpxml yy | grep disk -a6
 <disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/rhel6.qcow2'>
        <seclabel model='selinux' relabel='no'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

4. Create an external disk-only snapshot once more
# virsh snapshot-create-as yy s1 --disk-only --diskspec vda,file=/var/lib/libvirt/images/yy.s1
error: Failed to acquire lock: File exists

### At this point the domain job is unlocked ###

5. Create an external system checkpoint snapshot
# virsh snapshot-create-as yy s1 --diskspec vda,file=/mnt/yy.s1 --memspec file=/mnt/yy.mem
error: Failed to acquire lock: File exists

# virsh dumpxml yy | grep disk -a6
<disk type='file' device='disk'>
      <driver name='qemu' type='qcow2' cache='none'/>
      <source file='/mnt/yy.s1'>
        <seclabel model='selinux' relabel='no'/>
      </source>
      <target dev='vda' bus='virtio'/>
      <alias name='virtio-disk0'/>
      <address type='pci' domain='0x0000' bus='0x00' slot='0x06' function='0x0'/>
    </disk>

# virsh snapshot-list yy
 Name                 Creation Time             State
------------------------------------------------------------

# virsh list
 Id    Name                           State
----------------------------------------------------
 7     yy                             paused

### At this point the VM is not resumed after the failed snapshot ###
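
In step 5 the dumpxml output shows /mnt/yy.s1 as the disk source even though snapshot-list is empty, whereas after step 3 the source was still /mnt/rhel6.qcow2. A rough sketch for checking this in a script follows. The function name is hypothetical, and it assumes the layout that virsh dumpxml prints above: single-quoted attributes, with the source element on its own line before the target element.

```shell
# Hypothetical helper: print the source file of the disk whose target
# device is $1. Domain XML (e.g. from `virsh dumpxml yy`) is read on stdin.
disk_source() {
    awk -v dev="$1" '
        /<disk /        { src = "" }
        /<source file=/ {
            # the file path is the text between the first two quote marks
            split($0, parts, "\047")
            src = parts[2]
        }
        /<target dev=/  {
            split($0, parts, "\047")
            if (parts[2] == dev) print src
        }
    '
}

# Usage (not run here): virsh dumpxml yy | disk_source vda
```

This is line-oriented text matching rather than real XML parsing, so it is only suitable for quick verification against output shaped like the listings in this comment.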

Comment 5 Peter Krempa 2016-02-18 06:57:53 UTC
(In reply to yangyang from comment #4)
> Peter,
> After a failed external system checkpoint snapshot, the VM is not resumed. Is
> that an acceptable result?

Yes, that is expected. The snapshot failed because the VM could not be resumed after a failed locking attempt. At that point the VM can either be restarted later or killed via the 'destroy' API.
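
As an illustration of the recovery path described here, the sketch below reads the state of a domain from a `virsh list` table so a script can decide what to do next. The helper name is hypothetical, and it assumes the three-column Id/Name/State layout shown in comment 4.

```shell
# Hypothetical helper: print the State column for domain $1.
# Input is the table printed by `virsh list` (or `virsh list --all`) on stdin.
domain_state() {
    awk -v name="$1" '
        $2 == name {
            # the state may be more than one word (e.g. "shut off")
            s = $3
            for (i = 4; i <= NF; i++) s = s " " $i
            print s
        }
    '
}

# Usage (not run here):
#   state=$(virsh list --all | domain_state yy)
#   [ "$state" = "paused" ] && virsh destroy yy   # or try: virsh resume yy
```

Whether a resume attempt succeeds depends on the lock being obtainable again; `virsh destroy` corresponds to the 'destroy' API mentioned above.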

Comment 6 Yang Yang 2016-02-18 07:16:06 UTC
Thanks for the quick response, Peter. Moving this to VERIFIED.

Comment 8 errata-xmlrpc 2016-05-10 19:26:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-0738.html