Bug 828633

Summary: Sanlock locking failed for readonly devices
Product: [Fedora] Fedora Reporter: Mark Wu <wudxw>
Component: libvirtAssignee: Peter Krempa <pkrempa>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 17CC: abaron, acathrow, bazulay, berrange, clalancette, crobinso, dallan, dougsland, dyasny, fsimonce, iheim, itamar, jbrooks, jforbes, jyang, laine, libvirt-maint, lvroyce, mburns, mgoldboi, robert, shuming, veillard, virt-maint, ykaul
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard: storage
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
: 837659 (view as bug list) Environment:
Last Closed: 2012-06-30 18:04:31 EDT Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Bug Depends On:    
Bug Blocks: 822145, 837659    

Description Mark Wu 2012-06-05 01:17:15 EDT
Description of problem:

When I tried to create a VM with cdrom device, I got the following error:

Thread-215::ERROR::2012-06-05 11:31:22,903::vm::608::vm.Vm::(_startUnderlyingVm) vmId=`f7c1b02f-b304-4e68-8565-3659b1214cba`::The vm start process failed
Traceback (most recent call last):
  File "/usr/share/vdsm/vm.py", line 570, in _startUnderlyingVm
    self._run()
  File "/usr/share/vdsm/libvirtvm.py", line 1364, in _run
    self._connection.createXML(domxml, flags),
  File "/usr/lib/python2.7/site-packages/vdsm/libvirtconnection.py", line 82, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 2420, in createXML
    if ret is None:raise libvirtError('virDomainCreateXML() failed', conn=self)
libvirtError: internal error unsupported configuration: Readonly leases are not supported

After looking into the problem, it turns out that libvirt sanlock plugin doesn't support locking for readonly/shared disks and libvirt always sets cdrom devcie as readonly.  This problem is already fixed in libvirt. It just skips readonly/shared disk for sanlock. 


Version-Release number of selected component (if applicable):
VDSM commit 706f8d9 Configure libvirt to use sanlock and later.


How reproducible:
always

Steps to Reproduce:
1. Create a VM with a cdrom device via VDSM API
  
Actual results:
Failed with the error
"libvirtError: internal error unsupported configuration: Readonly leases are not supported"

Expected results:


Additional info:
Discussion about this problem in libvirt mailing list:
https://www.redhat.com/archives/libvir-list/2012-May/msg00715.html

Fixed in the following libvirt commit
b8012ce sanlock: fix locking for readonly devices
Comment 1 Federico Simoncelli 2012-06-05 05:05:51 EDT
I don't think this is related to VDSM, could you please check that qemu-sanlock.conf is properly configured?

 # egrep -v ^# /etc/libvirt/qemu-sanlock.conf
 auto_disk_leases=0 # by vdsm-4.9.6
 require_lease_for_disks=0 # by vdsm-4.9.6

Eventually take the latest VDSM build and run:

 # service vdsmd reconfigure
Comment 2 Mark Wu 2012-06-06 03:24:41 EDT
Here's my qemu-sanlock.conf, the same as yours.
sudo  egrep -v ^# /etc/libvirt/qemu-sanlock.conf
auto_disk_leases=0
require_lease_for_disks=0

And I also run `/usr/lib/systemd/systemd-vdsmd reconfigure` to configure libvirt.
It doesn't help.  This problem happens in add a disk resource to sanlock, and should not be related to if the lease file is specified or required.
Comment 3 Royce Lv 2012-06-06 21:34:55 EDT
I also encountered same problem in my latest vdsm:vdsm-4.9.6-0.269.git3e44fe0.fc16,libvirt:libvirt-0.9.12-1.fc16.x86_64, I fix this problem by adding configure to qemu-sanlock.conf:
## beginning of configuration section by vdsm-4.9.6
auto_disk_leases=0
require_lease_for_disks=0
ignore_readonly_and_shared_disks=1
## end of configuration section by vdsm-4.9.6
It is verified in my test env,see change here:http://gerrit.ovirt.org/#/c/5057/
Comment 4 Federico Simoncelli 2012-06-07 09:30:51 EDT
(In reply to comment #3)
> I also encountered same problem in my latest
> vdsm:vdsm-4.9.6-0.269.git3e44fe0.fc16,libvirt:libvirt-0.9.12-1.fc16.x86_64,
> I fix this problem by adding configure to qemu-sanlock.conf:
> ## beginning of configuration section by vdsm-4.9.6
> auto_disk_leases=0
> require_lease_for_disks=0
> ignore_readonly_and_shared_disks=1
> ## end of configuration section by vdsm-4.9.6

This is probably a different libvirt bug since when you set require_lease_for_disks=0 the ignore_readonly_and_shared_disks value shouldn't matter. Daniel what do you think?
Comment 5 Mark Wu 2012-06-07 22:25:25 EDT
Federico,

I think it should be the same bug. The checking for readonly happens in adding resource and the requireLeaseForDisks is checked in acquiring resource. Adding resource is earlier than acquiring resource, so even ignore_readonly_and_shared_disks is set, the problem still happens.

virDomainLockProcessResume
  virDomainLockManagerNew
    virDomainLockManagerAddDisk
      virLockManagerSanlockAddResource
        ..
      ...
        if (flags & VIR_LOCK_MANAGER_RESOURCE_READONLY) {  /* Problem happens here */
            virLockError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                     _("Readonly leases are not supported"));
            return -1;
        }

  
  virLockManagerAcquire
    ...
    virLockManagerSanlockAcquire
      ...
      if (priv->res_count == 0 &&
        priv->hasRWDisks &&
        driver->requireLeaseForDisks) {  /* requireLeaseForDisks is checked here */
        virLockError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                     _("Read/write, exclusive access, disks were present, but no leases specified"));
        return -1;
      }   




Here's the full gdb call trace:

Breakpoint 1, virDomainLockManagerAddDisk (lock=0x7ff0bc002da0, disk=0x7ff0bc00b7b0) at locking/domain_lock.c:71
71	{
(gdb) bt
#0  virDomainLockManagerAddDisk (lock=0x7ff0bc002da0, disk=0x7ff0bc00b7b0) at locking/domain_lock.c:71
#1  0x0000003e03cfc134 in virDomainLockManagerNew (plugin=<optimized out>, dom=dom@entry=0x7ff0bc002f00, withResources=withResources@entry=true)
    at locking/domain_lock.c:142
#2  0x0000003e03cfc223 in virDomainLockProcessResume (plugin=<optimized out>, dom=dom@entry=0x7ff0bc002f00, state=0x0) at locking/domain_lock.c:199
#3  0x000000000048d400 in qemuProcessStartCPUs (driver=driver@entry=0x7ff0cc0094f0, vm=vm@entry=0x7ff0bc002f00, conn=conn@entry=0x7ff0cc1800e0, 
    reason=reason@entry=VIR_DOMAIN_RUNNING_BOOTED, asyncJob=asyncJob@entry=QEMU_ASYNC_JOB_NONE) at qemu/qemu_process.c:2656
#4  0x0000000000490a21 in qemuProcessStart (conn=conn@entry=0x7ff0cc1800e0, driver=driver@entry=0x7ff0cc0094f0, vm=vm@entry=0x7ff0bc002f00, 
    migrateFrom=migrateFrom@entry=0x0, cold_boot=cold_boot@entry=true, start_paused=start_paused@entry=false, autodestroy=autodestroy@entry=false, 
    stdin_fd=stdin_fd@entry=-1, stdin_path=stdin_path@entry=0x0, snapshot=snapshot@entry=0x0, vmop=vmop@entry=VIR_NETDEV_VPORT_PROFILE_OP_CREATE)
    at qemu/qemu_process.c:3718
#5  0x00000000004643e7 in qemudDomainCreate (conn=0x7ff0cc1800e0, xml=<optimized out>, flags=<optimized out>) at qemu/qemu_driver.c:1386
#6  0x0000003e03cd912f in virDomainCreateXML (conn=0x7ff0cc1800e0, 
    xmlDesc=0x7ff0bc001e40 "<?xml version=\"1.0\" encoding=\"utf-8\"?>\n<domain type=\"kvm\">\n\t<name>vm1</name>\n\t<uuid>f7c1b02f-b304-4e68-8565-3659b1214cba</uuid>\n\t<memory>1048576</memory>\n\t<currentMemory>1048576</currentMemory>\n\t<vcpu"..., flags=0) at libvirt.c:2008
#7  0x000000000043b711 in remoteDispatchDomainCreateXML (ret=0x7ff0bc00c8a0, args=0x7ff0bc00c540, rerr=0x7ff0d52bac30, client=0x9848a0, 
    server=<optimized out>, msg=<optimized out>) at remote_dispatch.h:958
#8  remoteDispatchDomainCreateXMLHelper (server=<optimized out>, client=0x9848a0, msg=<optimized out>, rerr=0x7ff0d52bac30, args=0x7ff0bc00c540, 
    ret=0x7ff0bc00c8a0) at remote_dispatch.h:938
#9  0x0000003e03d2d2b6 in virNetServerProgramDispatchCall (msg=0x9908a0, client=0x9848a0, server=0x9710e0, prog=0x972660) at rpc/virnetserverprogram.c:416
#10 virNetServerProgramDispatch (prog=0x972660, server=server@entry=0x9710e0, client=0x9848a0, msg=0x9908a0) at rpc/virnetserverprogram.c:289
#11 0x0000003e03d28e61 in virNetServerHandleJob (jobOpaque=<optimized out>, opaque=0x9710e0) at rpc/virnetserver.c:168
#12 0x0000003e03c66dee in virThreadPoolWorker (opaque=opaque@entry=0x97a420) at util/threadpool.c:144
#13 0x0000003e03c66879 in virThreadHelper (data=<optimized out>) at util/threads-pthread.c:161
#14 0x0000003e1f807d14 in start_thread (arg=0x7ff0d52bb700) at pthread_create.c:309
#15 0x0000003e1f4f199d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:115
Comment 6 Daniel Berrange 2012-06-11 10:33:03 EDT
Latest upstream libvirt code (to be in 0.9.13 release) has a new config param to explicitly skip any readonly/shared disks when doing locking

commit b8012ce9312f00947c5ca7250a7a96534c85835f
Author: David Weber <wb@munzinger.de>
Date:   Mon May 14 09:53:02 2012 +0000

    sanlock: fix locking for readonly devices
    
    Add ignore param for readonly and shared disk in sanlock
Comment 7 Mark Wu 2012-06-14 04:22:21 EDT
I hit this bug again when I create a VM from engine GUI. I am testing oVirt 3.1 beta. So I suggest we should resolve this problem before the 3.1 release.
Comment 8 Royce Lv 2012-06-14 06:23:06 EDT
this change depends on libvirt 0.9.13 release, which is not officially available.
Comment 9 Itamar Heim 2012-06-14 08:06:04 EDT
Dave - any eta on 0.9.13 release for fc17 to resolve this for ovirt users?
Comment 10 Dave Allan 2012-06-14 12:47:43 EDT
I don't think this necessarily requires all of 0.9.13.  Cole, is the fix backportable?
Comment 11 Cole Robinson 2012-06-14 13:20:45 EDT
Yeah I think so, but it's not a libvirt-maint candidate so we'll need to backport it separately.
Comment 12 Cole Robinson 2012-06-17 10:00:58 EDT
I'm just moving this bug to F17 libvirt, if that's wrong please feel free to move it back, clone it, etc.
Comment 13 Royce Lv 2012-06-17 22:39:10 EDT
Hope we can be informed here if there's any available libvirt candidate.'cause patch for vdsm (http://gerrit.ovirt.org/#/c/5057/)is highly depend on it.Thanks a lot!
Comment 14 Mark Wu 2012-06-17 23:01:39 EDT
Royce,
I believe you will get email notification on the status change of this bug since you're in the cc list. So don't worry about it.
Comment 15 Dave Allan 2012-06-18 14:47:11 EDT
Cole, any eta on a release with a fix for this BZ?
Comment 16 shu ming 2012-06-18 21:37:55 EDT
You can workaround the problem by commenting out the this line in /etc/libvirt/qemu.conf
lock_manager="sanlock"

And then reconfigure and restart libvirt service.  reconfigure vdsmd service didn't work because it will enable the line again automatically.
Comment 17 Cole Robinson 2012-06-19 11:12:10 EDT
I'll push an update in a minute. The version number will be 0.9.11.4-2, so that's what the vdsm spec should require (or higher)
Comment 18 Fedora Update System 2012-06-19 11:29:16 EDT
libvirt-0.9.11.4-2.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/libvirt-0.9.11.4-2.fc17
Comment 19 Jason Brooks 2012-06-19 12:30:14 EDT
I just installed libvirt-0.9.11.4-2.fc17 on my oVirt 3.1 setup, and I'm getting the same "Readonly leases are not supported" behavior as before.
Comment 20 Cole Robinson 2012-06-19 19:55:44 EDT
Jason, it doesn't look like http://gerrit.ovirt.org/#/c/5057/ is applied to ovirt yet
Comment 21 Jason Brooks 2012-06-19 20:00:16 EDT
After installing the updated libvirt, I noticed that "ignore_readonly_and_shared_disks = 1" was still commented out in my qemu-sanlock.conf file, so I uncommented it, and it didn't resolve the issue.
Comment 22 Jason Brooks 2012-06-19 20:33:17 EDT
I just built and installed the ovirt 3.1 packages from git, I have libvirt-0.9.11.4-2.fc17 installed, ignore_readonly_and_shared_disks = 1 in my qemu-sanlock.conf, and selinux in permissive mode. I was able to boot from an iso image. Not sure if the newer ovirt pkgs made the difference, or if it was cleaning and reinstalling the engine config. A lot of moving parts... I'm happy to test any specific things.
Comment 23 Mark Wu 2012-06-19 21:04:32 EDT
Jason,
The libvirt fix just introduced the option "ignore_readonly_and_shared_disks", but  didn't enable it by default. It will be enabled by the vdsm patch in http://gerrit.ovirt.org/#/c/5057/. Perhaps you forgot to restart libvirtd in the test you described in comment #21. So you could have another test when vdsm patch is merged. At that time you needn't manually enable "ignore_readonly_and_shared_disks".
Comment 24 Federico Simoncelli 2012-06-20 05:36:19 EDT
Daniel, why should we also set ignore_readonly_and_shared_disks=1 if we already declare:

auto_disk_leases=0
require_lease_for_disks=0
Comment 25 Daniel Berrange 2012-06-20 05:47:25 EDT
It is todo with the order in which things are processed. When the virLockManagerSanlockAddResource method is called, the first thing that is done is to check the flags & filter out those which are not supported (ie readonly + shared). Only later on does the auto_disk_lease flag get checked.The ignore_readonly_and_shared_disks parameter takes effect before all of those, avoiding the first check which raises the error.

What we really need todo is

 a. implement support for shared leases
 b. completely ignore readonly disks at all times

at that point the "ignore_readonly_and_shared_disks" parameter can go away, and more sensible default behaviour will apply.
Comment 26 Federico Simoncelli 2012-06-20 06:20:06 EDT
I understand what is in the code does but there's probably a semantic that I don't get here:

auto_disk_leases: whether libvirt should manage and automatically provide the leases for the disks
require_lease_for_disks: whether libvirt should enforce the presence of leases for the disks
ignore_readonly_and_shared_disks: do not consider readonly and shared disks for the locking mechanism

I see why this is useful for:

auto_disk_leases=(0|1)
require_lease_for_disks=1
ignore_readonly_and_shared_disks=(0|1)

But I don't see how this configuration can be valid:

auto_disk_leases=0
require_lease_for_disks=0
ignore_readonly_and_shared_disks=0

In this case libvirt is not managing the leases nor is checking their presence, why should it even care about the readonly/shared disks?

If auto_disk_leases/require_lease_for_disks is 0, then ignore_readonly_and_shared_disks should be assumed 1 (or not be considered at all).
Comment 27 Daniel Berrange 2012-06-20 10:29:02 EDT
> In this case libvirt is not managing the leases nor is checking their presence, > why should it even care about the readonly/shared disks?

As I mentioned above, the check for readonly/shared disks occurs before we get around to require_lease_for_disks/auto_disk_leases. So you need to have both set.
Comment 28 Fedora Update System 2012-06-20 15:23:37 EDT
Package libvirt-0.9.11.4-2.fc17:
* should fix your issue,
* was pushed to the Fedora 17 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing libvirt-0.9.11.4-2.fc17'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-9708/libvirt-0.9.11.4-2.fc17
then log in and leave karma (feedback).
Comment 29 Federico Simoncelli 2012-06-20 16:30:27 EDT
(In reply to comment #27)
> > In this case libvirt is not managing the leases nor is checking their presence, > why should it even care about the readonly/shared disks?
> 
> As I mentioned above, the check for readonly/shared disks occurs before we
> get around to require_lease_for_disks/auto_disk_leases. So you need to have
> both set.

Given that the position of the check should probably go deeper in the stack, eg: virLockManagerSanlockAddDisk/virLockManagerSanlockCreateLease (you'll need to pass the VIR_LOCK_MANAGER_RESOURCE_SHARED/VIR_LOCK_MANAGER_RESOURCE_READONLY parameter in the future anyway), it's also possible to fix the issue performing the early checks only if driver->autoDiskLease is true:

  if (type != VIR_LOCK_MANAGER_RESOURCE_TYPE_DISK ||
      driver->autoDiskLease) {
      if (flags & VIR_LOCK_MANAGER_RESOURCE_READONLY) {
          virLockError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                       _("Readonly leases are not supported"));
          return -1;
      }
      if (flags & VIR_LOCK_MANAGER_RESOURCE_SHARED) {
          virLockError(VIR_ERR_CONFIG_UNSUPPORTED, "%s",
                       _("Shareable leases are not supported"));
          return -1;
      }
  }
Comment 31 Cole Robinson 2012-06-26 08:53:02 EDT
Dan posted a fix upstream:

https://www.redhat.com/archives/libvir-list/2012-June/msg00916.html

We will need to drop the backported sanlock.conf patch, since Dan's patch reverts it.
Comment 32 Peter Krempa 2012-06-28 05:22:54 EDT
The fix is now pushed upstream:

commit acbd4965c44c4dbc676dfe89aff970052e376073
Author: Daniel P. Berrange <berrange@redhat.com>
Date:   Thu Jun 21 15:34:46 2012 +0100

    Add support for shared sanlock leases
    
    A sanlock lease can be marked as shared (rather
    than exclusive)  using SANLK_RES_SHARED flag. This
    adds support for that flag and ensures that in auto
    disk mode, any shared disks use shared leases. This
    also makes any read-only disks be completely
    ignored.
    
    These changes remove the need for the option
    
      ignore_readonly_and_shared_disks
    
    so that is removed
    
    Signed-off-by: Daniel P. Berrange <berrange@redhat.com>
Comment 34 Cole Robinson 2012-06-28 09:06:04 EDT
Sorry Peter, I intended to do a build yesterday but didn't get around to it. I'll do one now.
Comment 35 Fedora Update System 2012-06-28 10:04:58 EDT
libvirt-0.9.11.4-3.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/libvirt-0.9.11.4-3.fc17
Comment 36 Fedora Update System 2012-06-30 18:04:31 EDT
libvirt-0.9.11.4-3.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.