Description of problem:

When a VM is holding a lease, taking a live snapshot fails because libvirt tries to acquire an already acquired lease.

oVirt is adding the option of having a lease per VM that is not attached to a particular disk. See bug 1317429 and the feature page http://www.ovirt.org/develop/release-management/features/storage/vm-leases/

When using such a lease, we expect it to be acquired by libvirt when the VM starts and released when the VM stops. Once the lease is acquired, we expect libvirt to avoid modifying it in any way.

Looking at the attached log, we can see that libvirt tries to add the lease again when performing a snapshot, and fails when sanlock reports that the lease already exists. This issue prevents the use of live snapshot and live storage migration, features required for HA VMs that need VM leases.

1. Starting a VM with a lease

2016-12-12 07:24:20.018+0000: 2091: debug : virDomainCreateXML:170 : conn=0x7f9498002680, xmlDesc=<?xml version='1.0' encoding='UTF-8'?>
<domain xmlns:ovirt="http://ovirt.org/vm/tune/1.0" type="kvm"><name>vm-leases</name><uuid>3c93acc4-0365-4937-8fc8-aaf564e275af</uuid><memory>1048576</memory><currentMemory>1048576</currentMemory><maxMemory slots="16">4294967296</maxMemory><vcpu current="2">16</vcpu><devices><channel type="unix"><target name="com.redhat.rhevm.vdsm" type="virtio" /><source mode="bind" path="/var/lib/libvirt/qemu/channels/3c93acc4-0365-4937-8fc8-aaf564e275af.com.redhat.rhevm.vdsm" /></channel><channel type="unix"><target name="org.qemu.guest_agent.0" type="virtio" /><source mode="bind" path="/var/lib/libvirt/qemu/channels/3c93acc4-0365-4937-8fc8-aaf564e275af.org.qemu.guest_agent.0" /></channel><input bus="ps2" type="mouse" /><memballoon model="virtio" /><rng model="virtio"><backend model="random">/dev/urandom</backend></rng><controller index="0" model="virtio-scsi" type="scsi" /><controller index="0" ports="16" type="virtio-serial" /><video><model heads="1" ram="65536" type="qxl" vgamem="16384" vram="8192" /></video><graphics autoport="yes" defaultMode="secure" passwd="*****" passwdValidTo="1970-01-01T00:00:01" port="-1" tlsPort="-1" type="spice"><channel mode="secure" name="main" /><channel mode="secure" name="inputs" /><channel mode="secure" name="cursor" /><channel mode="secure" name="playback" /><channel mode="secure" name="record" /><channel mode="secure" name="display" /><channel mode="secure" name="smartcard" /><channel mode="secure" name="usbredir" /><listen network="vdsm-ovirtmgmt" type="network" /></graphics><interface type="bridge"><mac address="00:1a:4a:16:01:51" /><model type="virtio" /><source bridge="ovirtmgmt" /><filterref filter="vdsm-no-mac-spoofing" /><link state="up" /><boot order="2" /><bandwidth /></interface><disk device="cdrom" snapshot="no" type="file"><source file="" startupPolicy="optional" /><target bus="ide" dev="hdc" /><readonly /></disk><disk device="disk" snapshot="no" type="block"><address bus="0" controller="0" target="0" type="drive" unit="0" /><source dev="/rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/0ed14102-5198-43f1-bb71-1eb593fd6aef" /><target bus="scsi" dev="sda" /><serial>ccce66b4-3b74-4fa4-b887-5b7e2be0b828</serial><boot order="1" /><driver cache="none" error_policy="stop" io="native" name="qemu" type="qcow2" /></disk><lease><key>3c93acc4-0365-4937-8fc8-aaf564e275af</key><lockspace>eeb8e812-4e69-469a-a07a-272ea3a79105</lockspace><target offset="3145728" path="/dev/eeb8e812-4e69-469a-a07a-272ea3a79105/xleases" /></lease><channel type="spicevmc"><target name="com.redhat.spice.0" type="virtio" /></channel></devices><metadata><ovirt:qos /></metadata><os><type arch="x86_64" machine="pc-i440fx-rhel7.3.0">hvm</type><smbios mode="sysinfo" /></os><sysinfo type="smbios"><system><entry name="manufacturer">oVirt</entry><entry name="product">oVirt Node</entry><entry name="version">7.3-7.el7</entry><entry name="serial">77A14806-F954-4D2A-AEA6-C098A4469927</entry><entry name="uuid">3c93acc4-0365-4937-8fc8-aaf564e275af</entry></system></sysinfo><clock adjustment="0" offset="variable"><timer name="rtc" tickpolicy="catchup" /><timer name="pit" tickpolicy="delay" /><timer name="hpet" present="no" /></clock><features><acpi /></features><cpu match="exact"><model>Conroe</model><topology cores="1" sockets="16" threads="1" /><numa><cell cpus="0,1" memory="1048576" /></numa></cpu></domain>, flags=0

2. Adding and acquiring the lease

2016-12-12 07:24:20.906+0000: 2091: debug : virDomainLockManagerAddLease:57 : Add lease /dev/eeb8e812-4e69-469a-a07a-272ea3a79105/xleases
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerAddResource:332 : lock=0x7f94ac0051d0 type=1 name=3c93acc4-0365-4937-8fc8-aaf564e275af nparams=3 params=0x7f94b626c6a0 flags=0
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerLogParams:91 : key=path type=string value=/dev/eeb8e812-4e69-469a-a07a-272ea3a79105/xleases
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerLogParams:85 : key=offset type=ulong value=3145728
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerLogParams:91 : key=lockspace type=string value=eeb8e812-4e69-469a-a07a-272ea3a79105
2016-12-12 07:24:20.906+0000: 2091: debug : virDomainLockManagerNew:151 : Adding disks
2016-12-12 07:24:20.906+0000: 2091: debug : virDomainLockManagerAddImage:90 : Add disk /rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/0ed14102-5198-43f1-bb71-1eb593fd6aef
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerAddResource:332 : lock=0x7f94ac0051d0 type=0 name=/rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/0ed14102-5198-43f1-bb71-1eb593fd6aef nparams=0 params=(nil) flags=0
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerAcquire:350 : lock=0x7f94ac0051d0 state='<null>' flags=0 action=0 fd=(nil)
2016-12-12 07:24:20.906+0000: 2091: debug : virLockManagerSanlockAcquire:989 : Acquiring object 1
2016-12-12 07:24:20.940+0000: 2091: debug : virLockManagerSanlockAcquire:1027 : Acquire completed fd=-1

3. Starting snapshot

2016-12-12 07:28:37.623+0000: 2089: debug : virDomainSnapshotCreateXML:214 : dom=0x7f94b00013b0, (VM: name=vm-leases, uuid=3c93acc4-0365-4937-8fc8-aaf564e275af), xmlDesc=<?xml version='1.0' encoding='UTF-8'?>
<domainsnapshot><disks><disk name="sda" snapshot="external" type="block"><source dev="/rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/2e93c1e5-bcd8-4187-ad65-deb11e2c5028" type="block" /></disk></disks></domainsnapshot>, flags=34

4. Attempting to add an existing lease

2016-12-12 07:28:37.656+0000: 2089: debug : virDomainLockManagerNew:146 : Adding leases
2016-12-12 07:28:37.656+0000: 2089: debug : virDomainLockManagerAddLease:57 : Add lease /dev/eeb8e812-4e69-469a-a07a-272ea3a79105/xleases
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerAddResource:332 : lock=0x7f94b0020410 type=1 name=3c93acc4-0365-4937-8fc8-aaf564e275af nparams=3 params=0x7f94b726e5e0 flags=0
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerLogParams:91 : key=path type=string value=/dev/eeb8e812-4e69-469a-a07a-272ea3a79105/xleases
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerLogParams:85 : key=offset type=ulong value=3145728
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerLogParams:91 : key=lockspace type=string value=eeb8e812-4e69-469a-a07a-272ea3a79105
2016-12-12 07:28:37.656+0000: 2089: debug : virDomainLockManagerNew:151 : Adding disks
2016-12-12 07:28:37.656+0000: 2089: debug : virDomainLockManagerAddImage:90 : Add disk /rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/2e93c1e5-bcd8-4187-ad65-deb11e2c5028
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerAddResource:332 : lock=0x7f94b0020410 type=0 name=/rhev/data-center/fe307b9e-8f6b-4958-955a-0faeeae8b017/eeb8e812-4e69-469a-a07a-272ea3a79105/images/ccce66b4-3b74-4fa4-b887-5b7e2be0b828/2e93c1e5-bcd8-4187-ad65-deb11e2c5028 nparams=0 params=(nil) flags=0
2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerAcquire:350 : lock=0x7f94b0020410 state='<null>' flags=0 action=0 fd=(nil)

5. Failed to acquire an acquired lease

2016-12-12 07:28:37.656+0000: 2089: debug : virLockManagerSanlockAcquire:989 : Acquiring object 1
2016-12-12 07:28:37.656+0000: 2089: error : virLockManagerSanlockAcquire:998 : Failed to acquire lock: File exists

Note that the lease is not associated with the original disk or the snapshot disk.

Version:
# rpm -qa | egrep 'libvirt|sanlock'
libvirt-daemon-config-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-2.0.0-10.el7.x86_64
libvirt-daemon-driver-qemu-2.0.0-10.el7.x86_64
libvirt-daemon-driver-interface-2.0.0-10.el7.x86_64
sanlock-lib-3.4.0-1.el7.x86_64
libvirt-client-2.0.0-10.el7.x86_64
libvirt-daemon-driver-nodedev-2.0.0-10.el7.x86_64
libvirt-lock-sanlock-2.0.0-10.el7.x86_64
sanlock-python-3.4.0-1.el7.x86_64
libvirt-daemon-driver-secret-2.0.0-10.el7.x86_64
libvirt-python-2.0.0-2.el7.x86_64
sanlock-3.4.0-1.el7.x86_64
libvirt-daemon-driver-nwfilter-2.0.0-10.el7.x86_64
libvirt-daemon-kvm-2.0.0-10.el7.x86_64
libvirt-daemon-driver-storage-2.0.0-10.el7.x86_64
libvirt-daemon-driver-network-2.0.0-10.el7.x86_64

Reproducible: Always

Steps to reproduce:
0. Configure libvirt to use sanlock and disable automatic disk leases (see the attached configuration files; a minimal sketch follows below)
1. Start a VM with a lease device (see the domain XML above)
2. Perform a live snapshot
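
A minimal sketch of the configuration mentioned in step 0, using standard libvirt sanlock options (values here are illustrative; the exact files used are in the attachment):

# /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# /etc/libvirt/qemu-sanlock.conf
auto_disk_leases = 0          # do not create automatic per-disk leases
require_lease_for_disks = 0   # allow read/write disks without a lease
host_id = 1                   # must be unique per host in the lockspace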
Created attachment 1230728 [details]
Logs and configuration files
This is urgent and I request that it be fixed in 7.3.z. Can you escalate? This fix is critical for the 4.1 release.
Hi, I think this bug is blocked by https://bugzilla.redhat.com/show_bug.cgi?id=1191901, which hangs when doing an external snapshot after a sanlock lease is acquired.
(In reply to Han Han from comment #7)
> Hi, I think this bug is blocked by
> https://bugzilla.redhat.com/show_bug.cgi?id=1191901, which hangs when doing
> an external snapshot after a sanlock lease is acquired.

Han, why do you think this depends on the other bug? Here there is no deadlock; libvirt simply fails to acquire the lock.

Also, here we use a very different configuration:

# vim /etc/libvirt/qemu-sanlock.conf
auto_disk_leases = 0

It is possible that both bugs share the same root cause (trying to modify an acquired lease), but I don't see any dependency.
Well, you're right. They are just similar testing scenarios, no dependency. I will add the bug to 'See Also'.
Han, can you reproduce this bug with virsh, in the same way you reproduced bug 1191901? I guess it will be useful for testing.
Patch fixing this particular locking problem posted upstream:
https://www.redhat.com/archives/libvir-list/2016-December/msg00619.html
Fixed upstream:

commit 4b951d1e38259ff5d03e9eedb65095eead8099e1
Author: Peter Krempa <pkrempa>
Date:   Wed Dec 14 08:01:34 2016 +0100

    qemu: snapshot: Don't attempt to resume cpus if they were not paused

    External disk-only snapshots with recent enough qemu don't require
    libvirt to pause the VM. The logic determining when to resume cpus was
    slightly flawed and attempted to resume them even if they were not
    paused by the snapshot code.

    This normally was not a problem, but with locking enabled the code
    would attempt to acquire the lock twice. The fallout of this bug would
    be an error from the API, but the actual snapshot being created.

    The bug was introduced when adding support for external snapshots with
    memory (checkpoints) in commit f569b87.
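
To make the fix easier to follow, here is a simplified, self-contained illustration of the corrected control flow. This is not the actual libvirt code; all names are hypothetical stand-ins:

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the qemu driver and lock manager calls. */
static void pause_cpus(void)  { puts("pause CPUs, lock manager releases leases"); }
static void resume_cpus(void) { puts("resume CPUs, lock manager re-acquires leases"); }
static void take_external_snapshot(void) { puts("take external snapshot"); }

int main(void)
{
    /* Recent qemu can take a disk-only external snapshot while running. */
    bool snapshot_needs_pause = false;
    bool paused_by_snapshot_code = false;

    if (snapshot_needs_pause) {
        pause_cpus();
        paused_by_snapshot_code = true;
    }

    take_external_snapshot();

    /* The fix: resume (and therefore re-acquire the sanlock leases) only
     * if this code actually paused the CPUs. Before the fix the resume
     * path ran unconditionally, so sanlock was asked to acquire a lease
     * that was still held and returned "File exists". */
    if (paused_by_snapshot_code)
        resume_cpus();

    return 0;
}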
Created attachment 1234263 [details]
Bug reproducer

I reproduced the bug on RHEL 7.3 with virsh. Steps (please refer to the script in the attachment for details; a condensed sketch follows below):
1. Set up the configuration for libvirt sanlock locking
2. Restart the libvirtd and sanlock services
3. Manually create the locking files
4. Add a <lease>..</lease> element to the VM's XML
5. Start the VM and try to create an external snapshot
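
A condensed sketch of that reproducer, using the same kind of commands as the verification steps later in this bug (paths, the lockspace/resource names, and the domain name n1 are illustrative; the attached script is authoritative):

# 1-2. configure libvirt for sanlock (lock_manager, auto_disk_leases, ...)
#      and restart the services
systemctl restart sanlock libvirtd

# 3. create the lockspace file and initialize a lockspace and a resource
truncate -s 2M /var/lib/libvirt/sanlock/libvirt-sanlock
chown sanlock:sanlock /var/lib/libvirt/sanlock/libvirt-sanlock
sanlock direct init -s libvirt-sanlock:0:/var/lib/libvirt/sanlock/libvirt-sanlock:0
sanlock add_lockspace -s libvirt-sanlock:1:/var/lib/libvirt/sanlock/libvirt-sanlock:0
sanlock direct init -r libvirt-sanlock:test-resource:/var/lib/libvirt/sanlock/libvirt-sanlock:1048576

# 4. add a <lease> element referencing that resource to the VM definition
virsh attach-device n1 lease.xml --config

# 5. start the VM and attempt a disk-only external snapshot
virsh start n1
virsh snapshot-create-as n1 s1 --disk-only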
Created attachment 1234264 [details]
libvirtd log

Hi, Peter. I tested your scratch build from the last comment and found another error:

# virsh snapshot-create-as n1 s1 --disk-only
error: unsupported configuration: Read/write, exclusive access, disks were present, but no leases specified

Please check it.
I tested the build from comment 22 - works for me.

Tested with vdsm from this patch:
https://gerrit.ovirt.org/65465/

Each VM was started with one lease device using the VM id as the key, and a sanlock resource on block/file storage.

Positive flows:
- live migration
- live snapshot
- live snapshot with memory
- live storage migration
- live merge
- hot plug disk
- hot unplug disk

Negative flows:
- Restarting vdsm while VMs are running: recovery seems OK
- Blocking storage on a host running a VM with NFS storage: sanlock terminates the VM
- Blocking storage on a host running a VM with block storage: sanlock terminates the VM
- Trying to use the same lease in two VMs: one VM starts, the other fails to start with the error:
  VM test-v4-iscsi is down with error. Exit message: resource busy: Failed to acquire lock: error -243.

Everything looks good from the oVirt side. In the libvirt logs there are some unexpected errors like:

2016-12-22 01:15:38.680+0000: 11094: error : virLockManagerSanlockAcquire:998 : Failed to acquire lock: File exists

These errors do not seem to affect vdsm. Attaching libvirt logs for reference.
Created attachment 1234594 [details]
libvirt logs from a successful run with the scratch build from comment 22
Peter, I guess that this is still broken on Fedora, right?

We need this fix also on Fedora 24/25. Should I open a Fedora bug for this?
(In reply to Nir Soffer from comment #25)
> Peter, I guess that this is still broken on Fedora, right?
>
> We need this fix also on Fedora 24/25. Should I open a Fedora bug for this?

Yes. For backports to Fedora please file BZs for the appropriate releases.
(In reply to Han Han from comment #20)
> Created attachment 1234264 [details]
> libvirtd log
>
> Hi, Peter. I tested your scratch build from the last comment and found another error:
> # virsh snapshot-create-as n1 s1 --disk-only
> error: unsupported configuration: Read/write, exclusive access, disks were
> present, but no leases specified

You need to set 'require_lease_for_disks = 0' in the locking config file for this particular case.
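
For example, assuming the default location of the sanlock lock driver config file, adding the following line and restarting libvirtd should cover that case:

# /etc/libvirt/qemu-sanlock.conf
require_lease_for_disks = 0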
One additional patch fixes a slight problem with qemu remaining paused after a snapshot with the --live option.

commit 2e86c0816fc8ab573745f1a9a650be09bd66e300
Author: Peter Krempa <pkrempa>
Date:   Wed Jan 4 13:23:31 2017 +0100

    qemu: snapshot: Resume VM after live snapshot

    Commit 4b951d1e38259ff5d03e9eedb65095eead8099e1 missed the fact that
    the VM needs to be resumed after a live external checkpoint (memory
    snapshot) where the cpus would be paused by the migration rather than
    libvirt.
That's usually how Z-stream works. Bug 1406765 was already released in 7.3.z; the release for 7.4 has not been done yet.
Verified on:
libvirt-3.2.0-3.virtcov.el7.x86_64
sanlock-3.5.0-1.el7.x86_64
qemu-kvm-rhev-2.9.0-2.el7.x86_64

1. Set the configuration as follows:

In qemu-sanlock.conf:
user = "sanlock"
group = "sanlock"
require_lease_for_disks = 0
host_id = 1
auto_disk_leases = 0
disk_lease_dir = "/var/lib/libvirt/sanlock"

In qemu.conf:
lock_manager = "sanlock"

2. Create the files for the lockspace and resource:
# DOM=V-lock
# lockspace_name=libvirt-sanlock
# lockspace_resource_path=/var/lib/libvirt/sanlock/libvirt-sanlock
# resource_name=test-disk-resource-lock
# resource_offset=1048576
# truncate -s 2M $lockspace_resource_path
# chown sanlock:sanlock $lockspace_resource_path
# sanlock direct init -s $lockspace_name:0:$lockspace_resource_path:0
# sanlock add_lockspace -s $lockspace_name:1:$lockspace_resource_path:0
# sanlock direct init -r $lockspace_name:$resource_name:$lockspace_resource_path:$resource_offset
# restorecon -R -v /var/lib/libvirt/sanlock

3. Cold plug the following XML into the VM:
# cat lease.xml
<lease>
  <lockspace>libvirt-sanlock</lockspace>
  <key>test-disk-resource-lock</key>
  <target path='/var/lib/libvirt/sanlock/libvirt-sanlock' offset='1048576'/>
</lease>
# virsh attach-device $DOM ./lease.xml --config

4. Start the VM and create snapshots with different options:
# virsh start $DOM
# virsh snapshot-create-as $DOM s1 --disk-only
# virsh snapshot-create-as $DOM s2 --memspec /var/lib/libvirt/images/$DOM-mem.$i
# virsh snapshot-create-as $DOM $i --memspec /var/lib/libvirt/images/$DOM-mem.$i --live
# virsh snapshot-create-as $DOM $i --memspec /var/lib/libvirt/images/$DOM-mem.$i --diskspec hda,file=/var/lib/libvirt/images/$DOM.$i
# virsh snapshot-create-as $DOM $i --memspec /var/lib/libvirt/images/$DOM-mem.$i --diskspec hda,file=/var/lib/libvirt/images/$DOM.$i --live

The VM started and all snapshots were created successfully. Bug fixed.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846