Bug 1406765

Summary: Snapshot fails when trying to add an existing sanlock lease
Product: Red Hat Enterprise Linux 7
Reporter: Marcel Kolaja <mkolaja>
Component: libvirt
Assignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA
QA Contact: Han Han <hhan>
Severity: urgent
Priority: high
Version: 7.3
CC: bmcclain, dyuan, hhan, jdenemar, jsuchane, michal.skrivanek, nsoffer, pkrempa, rbalakri, snagar, teigland, xuzhang, ylavi
Target Milestone: rc
Keywords: ZStream
Hardware: Unspecified
OS: Unspecified
Fixed In Version: libvirt-2.0.0-10.el7_3.4
Doc Type: Bug Fix
Doc Text:
Previously, when taking a live snapshot, the libvirt service attempted to acquire an already locked sanlock lease. As a consequence, the live snapshot failed. This update improves the code that tracks whether the virtual machine needs to be resumed after the snapshot. In addition, the update ensures that the lock-acquiring code is invoked only when necessary. As a result, the described problem no longer occurs.
Clone Of: 1403691
Clones: 1415488
Last Closed: 2017-01-17 18:27:56 UTC
Bug Depends On: 1403691
Bug Blocks: 1317429, 1408825, 1411118, 1415488
Attachment: The logs and reproducing scripts for comment 11 (flags: none)

Description Marcel Kolaja 2016-12-21 12:23:48 UTC
This bug has been copied from bug #1403691 and has been proposed
to be backported to 7.3 z-stream (EUS).

Comment 7 Han Han 2017-01-04 06:05:31 UTC
Verified on libvirt-2.0.0-10.el7_3.3.x86_64:
Steps:
1. Set sanlock configuration for libvirt
# cat /etc/libvirt/qemu-sanlock.conf
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = 0
disk_lease_dir = "/var/lib/libvirt/sanlock"
require_lease_for_disks = 0

# cat /etc/libvirt/qemu.conf
lock_manager = "sanlock"

2. Create lockspace for sanlock
# truncate -s 1M /var/lib/libvirt/sanlock/TEST_LS
# sanlock direct init -s TEST_LS:0:/var/lib/libvirt/sanlock/TEST_LS:0
init done 0
# chown sanlock:sanlock /var/lib/libvirt/sanlock/TEST_LS
# systemctl start sanlock

3. Init disk lease
# truncate -s 1M /var/lib/libvirt/sanlock/test-disk-resource-lock
# sanlock direct init -r TEST_LS:test-disk-resource-lock:/var/lib/libvirt/sanlock/test-disk-resource-lock:0
init done 0
# restorecon -R -v /var/lib/libvirt/sanlock
# chown sanlock:sanlock /var/lib/libvirt/sanlock/test-disk-resource-lock
# systemctl restart libvirtd

4. Create the VM and take external snapshots
# virsh create n1.xml
Domain n1 created from n1.xml
# DOM=n1
# for i in s{1..5}; do
>     virsh snapshot-create-as $DOM $i --disk-only
> done
Domain snapshot s1 created
Domain snapshot s2 created
Domain snapshot s3 created
Domain snapshot s4 created
Domain snapshot s5 created

External snapshots can be created with a sanlock lease configured. Bug fixed.
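Steps 1-3 above can be collected into a small shell sketch. The function names, the `conf_dir` argument and the `DRY_RUN` switch are hypothetical conveniences for illustration, not part of libvirt or sanlock; the configuration values and the `sanlock direct init` arguments are the ones used in this comment.

```shell
#!/usr/bin/env bash
# Sketch of steps 1-3 above. write_sanlock_conf, init_sanlock_lease, run and
# DRY_RUN are hypothetical helpers, not libvirt or sanlock interfaces.

# Print the command when DRY_RUN is set; otherwise run it and exit on failure.
run() {
    if [ -n "${DRY_RUN:-}" ]; then
        echo "+ $*"
    else
        "$@" || { echo "command failed: $*" >&2; exit 1; }
    fi
}

# Step 1: libvirt sanlock settings. auto=0 mirrors this comment (manual
# <lease> elements); auto=1 mirrors the auto-lease setup used in comment 11.
write_sanlock_conf() {
    local conf_dir=$1 auto=${2:-0}
    cat > "$conf_dir/qemu-sanlock.conf" <<EOF
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = $auto
disk_lease_dir = "/var/lib/libvirt/sanlock"
require_lease_for_disks = $auto
EOF
    # Assumes qemu.conf does not already select a different lock manager.
    if ! grep -q '^lock_manager *= *"sanlock"' "$conf_dir/qemu.conf" 2>/dev/null; then
        echo 'lock_manager = "sanlock"' >> "$conf_dir/qemu.conf"
    fi
}

# Steps 2-3: lockspace (NAME:HOST_ID:PATH:OFFSET) and resource lease
# (LOCKSPACE:NAME:PATH:OFFSET), in the same order as the commands above.
init_sanlock_lease() {
    local dir=$1 lockspace=$2 resource=$3
    local ls_file="$dir/$lockspace" res_file="$dir/$resource"

    run truncate -s 1M "$ls_file"
    run sanlock direct init -s "$lockspace:0:$ls_file:0"
    run chown sanlock:sanlock "$ls_file"
    run systemctl start sanlock

    run truncate -s 1M "$res_file"
    run sanlock direct init -r "$lockspace:$resource:$res_file:0"
    run restorecon -R -v "$dir"
    run chown sanlock:sanlock "$res_file"
    run systemctl restart libvirtd
}
```

For example, `write_sanlock_conf /etc/libvirt 0` followed by `DRY_RUN=1 init_sanlock_lease /var/lib/libvirt/sanlock TEST_LS test-disk-resource-lock` prints the lease setup commands without running them; dropping `DRY_RUN` executes them in order.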

Comment 8 Han Han 2017-01-04 06:08:48 UTC
Additional comment for verifying:
The guest XML contains a <lease> element:
# cat n1.xml
...
    <lease>
      <lockspace>TEST_LS</lockspace>
      <key>test-disk-resource-lock</key>
      <target path='/var/lib/libvirt/sanlock/test-disk-resource-lock'/>
    </lease>
...
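A minimal sketch for generating such a `<lease>` element from the lockspace, key and lease file path. `lease_xml` is a hypothetical helper, and it assumes the values contain no characters that need XML escaping. In the domain XML the element sits inside `<devices>`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print a libvirt <lease> element for a sanlock resource.
# Arguments: lockspace name, resource key, path of the lease file.
# Assumes none of the values contain XML special characters (&, <, ', ...).
lease_xml() {
    local lockspace=$1 key=$2 path=$3
    cat <<EOF
<lease>
  <lockspace>$lockspace</lockspace>
  <key>$key</key>
  <target path='$path'/>
</lease>
EOF
}
```

For example, `lease_xml TEST_LS test-disk-resource-lock /var/lib/libvirt/sanlock/test-disk-resource-lock` reproduces the element shown above.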

Comment 9 Peter Krempa 2017-01-05 08:51:07 UTC
One additional patch is necessary. After a snapshot with memory and '--live', libvirt would not unpause the VM once the memory snapshot was taken, due to a logic bug.

This also caused qemu to crash on a subsequent attempt to take an internal snapshot, due to a qemu bug that this uncovered (https://bugzilla.redhat.com/show_bug.cgi?id=1408653).

http://post-office.corp.redhat.com/archives/rhvirt-patches/2017-January/msg00098.html

Comment 10 Han Han 2017-01-06 08:07:27 UTC
Verified on libvirt-2.0.0-10.el7_3.4.x86_64:
1. Prepare the sanlock environment as in steps 1-3 of comment 7
2. Create the following snapshot types in random order:
external disk-only snapshot; external mem-only snapshot; external mem-only snapshot with --live; external checkpoint; external checkpoint with --live
#!/bin/bash
DOM=n1
for i in s{1..50};do
    RAND=$(shuf -i 1-5 -n 1)
    case $RAND in
        1)
            virsh snapshot-create-as $DOM $i --disk-only
            ;;
        2)
            virsh snapshot-create-as $DOM $i --memspec /tmp/$DOM-mem.$i
            ;;
        3)
            virsh snapshot-create-as $DOM $i --memspec /tmp/$DOM-mem.$i --live
            ;;
        4)
            virsh snapshot-create-as $DOM $i --memspec /tmp/$DOM-mem.$i --diskspec hda,file=/var/lib/libvirt/images/$DOM.$i
            ;;
        5)
            virsh snapshot-create-as $DOM $i --memspec /tmp/$DOM-mem.$i --diskspec hda,file=/var/lib/libvirt/images/$DOM.$i --live
            ;;
    esac
done

All snapshots are created successfully.

3. Check VM status:
# virsh list 
 Id    Name                           State
----------------------------------------------------
 1     n1                             running

VM is not paused. Bug fixed.
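As a hedged variation on the script above, checking the domain state after every snapshot (instead of once at the end) would report which variant left the guest paused, such as the memory plus `--live` case described in comment 9. `snapshot_variant` and `random_snapshots` are hypothetical names; the five `virsh snapshot-create-as` invocations are the ones from the script above.

```shell
#!/usr/bin/env bash
# Run one of the five snapshot variants from the script above.
snapshot_variant() {
    local dom=$1 name=$2 variant=$3
    local mem="/tmp/$dom-mem.$name"
    local disk="hda,file=/var/lib/libvirt/images/$dom.$name"
    case $variant in
        1) virsh snapshot-create-as "$dom" "$name" --disk-only ;;
        2) virsh snapshot-create-as "$dom" "$name" --memspec "$mem" ;;
        3) virsh snapshot-create-as "$dom" "$name" --memspec "$mem" --live ;;
        4) virsh snapshot-create-as "$dom" "$name" --memspec "$mem" --diskspec "$disk" ;;
        5) virsh snapshot-create-as "$dom" "$name" --memspec "$mem" --diskspec "$disk" --live ;;
        *) echo "unknown variant $variant" >&2; return 2 ;;
    esac
}

# Take COUNT snapshots with random variants; stop at the first failed
# snapshot or at the first snapshot after which the domain is not running.
random_snapshots() {
    local dom=$1 count=$2 i=1 variant state
    while [ "$i" -le "$count" ]; do
        variant=$(shuf -i 1-5 -n 1)
        if ! snapshot_variant "$dom" "s$i" "$variant"; then
            echo "s$i (variant $variant) failed" >&2
            return 1
        fi
        # 'virsh domstate' prints e.g. "running" or "paused".
        state=$(virsh domstate "$dom")
        if [ "$state" != running ]; then
            echo "s$i (variant $variant) left $dom $state" >&2
            return 1
        fi
        i=$((i + 1))
    done
}
```

`random_snapshots n1 50` mirrors the original run of 50 snapshots; it returns non-zero at the first failed snapshot or non-running state, and the message on stderr names the variant responsible.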

Comment 11 Han Han 2017-01-09 08:59:55 UTC
Found one small issue when testing blockcommit with sanlock automatic disk leases on libvirt-2.0.0-10.virtcov.el7_3.4.x86_64:
1. Set libvirt-sanlock configurations
# cat /etc/libvirt/qemu-sanlock.conf 
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
require_lease_for_disks = 1

# cat /etc/libvirt/qemu.conf        
lock_manager = "sanlock"

2. Restart the services and prepare a VM
# systemctl start sanlock
# systemctl restart libvirtd
# virsh list 
 Id    Name                           State
----------------------------------------------------
 1     n2                             running


3. Create external snapshots
```
DOM=n2
for i in s{1..5}; do
    virsh snapshot-create-as $DOM $i --disk-only
done
```
After the snapshots are created, run blockcommit:
# virsh blockcommit n2 hda --active --wait --verbose
error: Failed to acquire lock: File exists

Additional info:
With libvirt-2.0.0-10 or the libvirt in RHEL 7.4, the failure already occurs when creating the snapshots.

Peter, please check this issue.
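The reproducer in this comment could be scripted so that the exit status shows which stage failed: 1 if a disk-only snapshot fails (as reported for libvirt-2.0.0-10 and RHEL 7.4), 2 if only the block commit fails. This is a sketch; `reproduce_commit_failure` and its return codes are hypothetical, while the `virsh` invocations are copied from the steps above.

```shell
#!/usr/bin/env bash
# Hypothetical reproducer for comment 11; the exit status names the stage:
#   0 = all snapshots and the active block commit succeeded
#   1 = a disk-only snapshot failed
#   2 = the active block commit failed
reproduce_commit_failure() {
    local dom=$1 disk=$2 count=${3:-5} i=1
    while [ "$i" -le "$count" ]; do
        if ! virsh snapshot-create-as "$dom" "s$i" --disk-only; then
            echo "snapshot s$i of $dom failed" >&2
            return 1
        fi
        i=$((i + 1))
    done
    if ! virsh blockcommit "$dom" "$disk" --active --wait --verbose; then
        echo "blockcommit of $dom/$disk failed" >&2
        return 2
    fi
}
```

For example, `reproduce_commit_failure n2 hda 5; echo $?` would print 2 on the build tested here, where the snapshots succeed and blockcommit reports "Failed to acquire lock: File exists".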

Comment 12 Han Han 2017-01-09 09:04:51 UTC
Created attachment 1238585 [details]
The logs and reproducing scripts for comment11

Comment 13 Peter Krempa 2017-01-09 09:22:35 UTC
Automatic disk locking bugs are tracked in other bugs. This particular one looks like an instance of https://bugzilla.redhat.com/show_bug.cgi?id=1302168,
since active block commit and block copy share code paths.

Comment 15 errata-xmlrpc 2017-01-17 18:27:56 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2017-0098.html