Bug 723967

Summary: virsh command hangs when lock_manager is enabled in qemu.conf
Product: Red Hat Enterprise Linux 6
Reporter: Alex Jia <ajia>
Component: libvirt
Assignee: Daniel Berrangé <berrange>
Status: CLOSED NOTABUG
QA Contact: Virtualization Bugs <virt-bugs>
Severity: high
Priority: high
Version: 6.2
CC: dallan, rwu
Target Milestone: rc
Hardware: x86_64
OS: Linux
Doc Type: Bug Fix
Last Closed: 2011-10-17 07:55:27 UTC
Bug Blocks: 572343, 578121

Description Alex Jia 2011-07-21 16:22:08 UTC
Description of problem:
The virsh command, and other libvirt clients such as the Python bindings, hang when lock_manager=fcntl or lock_manager=sanlock is enabled in qemu.conf.

Version-Release number of selected component (if applicable):

# uname -r
2.6.32-160.el6.x86_64

# rpm -qa|grep libvirt
libvirt-lock-sanlock-0.9.3-7.el6.x86_64
libvirt-devel-0.9.3-7.el6.x86_64
libvirt-debuginfo-0.9.3-7.el6.x86_64
libvirt-client-0.9.3-7.el6.x86_64
libvirt-0.9.3-7.el6.x86_64
libvirt-python-0.9.3-7.el6.x86_64

# rpm -q qemu-kvm
qemu-kvm-0.12.1.2-2.169.el6.x86_64

How reproducible:
always

Steps to Reproduce:
1. Uncomment lock_manager = "fcntl" in /etc/libvirt/qemu.conf
2. Restart the libvirtd service
3. Run virsh list

Alternatively, set lock_manager = "sanlock" in qemu.conf and repeat steps 2-3 (see the sketch below).
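For reference, the edit in step 1 amounts to a single line in qemu.conf (a sketch; the rest of the file keeps its shipped defaults):

# /etc/libvirt/qemu.conf  (sketch)
lock_manager = "fcntl"
# or, for the sanlock scenario:
# lock_manager = "sanlock"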
  
Actual results:
The virsh command hangs. Likewise, in an interactive Python session, importing the libvirt module and calling con = libvirt.open('') hangs the Python client as well.

Expected results:
The virsh command works normally.

Additional info:
It may be that libvirt is missing a 'return -1' when loading the lock manager fails in ./src/qemu/qemu_conf.c:
448         if (!(driver->lockManager =
449               virLockManagerPluginNew(p->str, lockConf, 0)))
450             VIR_ERROR(_("Failed to load lock manager %s"), p->str);
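A minimal sketch of the change being suggested (hypothetical; the real function may also need to free locally allocated config data before bailing out):

    if (!(driver->lockManager =
          virLockManagerPluginNew(p->str, lockConf, 0))) {
        VIR_ERROR(_("Failed to load lock manager %s"), p->str);
        return -1;    /* treat a failed lock-manager load as fatal */
    }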

As danpb said, this is also double-checked elsewhere:

    if (qemudLoadDriverConfig(qemu_driver, driverConf) < 0) {
        goto error;
    }
    VIR_FREE(driverConf);
    /* We should always at least have the 'nop' manager, so
     * NULLs here are a fatal error
     */
    if (!qemu_driver->lockManager) {
        VIR_ERROR(_("Missing lock manager implementation"));
        goto error;
    }

libvirtd log:

1. fcntl lock manager:

18:43:23.554: 4829: error : virLockManagerPluginNew:147 : Plugin /usr/lib64/libvirt/lock-driver/fcntl.so not accessible: No such file or directory
18:43:23.554: 4829: error : qemudLoadDriverConfig:450 : Failed to load lock manager fcntl
18:43:23.554: 4829: error : qemudStartup:535 : Missing lock manager implementation
18:43:23.554: 4829: error : virStateInitialize:846 : Initialization of QEMU state driver failed
18:43:23.576: 4829: error : daemonRunStateInit:1172 : Driver state initialization failed

Note: libvirt has not implemented the fcntl lock manager yet.

2. sanlock manager:

23:51:54.432: 8015: error : virLockManagerSanlockSetupLockspace:242 : Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation not permitted
23:51:54.432: 8015: error : qemudLoadDriverConfig:450 : Failed to load lock manager sanlock
23:51:54.432: 8015: error : qemudStartup:535 : Missing lock manager implementation
23:51:54.432: 8015: error : virStateInitialize:846 : Initialization of QEMU state driver failed
23:51:54.457: 8015: error : daemonRunStateInit:1172 : Driver state initialization failed

Note: libvirt has not created the /var/lib/libvirt/sanlock directory. As danpb suggested, I manually created this directory and restarted the libvirtd service; after that I can see __LIBVIRT__DISKS__ under /var/lib/libvirt/sanlock. However, when I run 'virsh list' again the result is the same as above: 'virsh list' hangs.
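For reference, the manual workaround looks roughly like this (a sketch; the SELinux relabel step is an assumption and may not be needed on every host):

# mkdir -p /var/lib/libvirt/sanlock
# restorecon -Rv /var/lib/libvirt/sanlock
# service libvirtd restart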

Comment 2 Alex Jia 2011-08-12 10:50:44 UTC
In fact, there are three issues with the sanlock lock manager in this bug:
1. libvirt has not created the /var/lib/libvirt/sanlock directory.

This is now okay for me; I can see __LIBVIRT__DISKS__ under the above directory:
# ll -Z /var/lib/libvirt/sanlock
-rw-------. root root unconfined_u:object_r:virt_var_lib_t:s0 __LIBVIRT__DISKS__


2. Enabling the lock manager in qemu.conf and then restarting the libvirtd service hangs the virsh command.

This is now okay for me; libvirtd shuts down when initialization fails, and I can catch the SIGTERM signal with gdb:
......02:12:30.404: 28854: error : daemonRunStateInit:1162 : Driver state initialization failed

Program received signal SIGTERM, Terminated.
......
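For reference, a rough sketch of how that was observed (assuming libvirtd is started directly under gdb rather than via the init script):

# gdb /usr/sbin/libvirtd
(gdb) run
...
Program received signal SIGTERM, Terminated.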

3. "Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Operation not permitted"

This issue has not been resolved, and it leaves libvirtd dead; please see the "How to reproduce?" section below:
# service libvirtd status
libvirtd dead but subsys locked


How to reproduce?
1. enable sanlock in qemu.conf
2. enable auto_disk_leases, host_id and disk_lease_dir in /etc/libvirt/qemu-sanlock.conf
3. restart libvirtd service
4. service libvirtd status
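A sketch of the configuration used in steps 1-2 (the values shown are illustrative; in particular host_id must be unique for each host sharing the lease directory):

# /etc/libvirt/qemu.conf
lock_manager = "sanlock"

# /etc/libvirt/qemu-sanlock.conf
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"
host_id = 1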


BTW, I have not started the wdmd and sanlock daemons, and have not done any configuration for shared storage or the guest; I only changed the qemu.conf and qemu-sanlock.conf configuration and then restarted the libvirtd service.


Alex

Comment 3 Alex Jia 2011-08-12 11:02:26 UTC
In addition, the 'sanlock_add_lockspace' function fails; IMHO this is related to issue 3 above.

src/locking/lock_driver_sanlock.c:

144 static int virLockManagerSanlockSetupLockspace(void)
......
 233     if ((rv = sanlock_add_lockspace(&ls, 0)) < 0) {
 234         if (-rv != EEXIST) {
 235             if (rv <= -200)
 236                 virLockError(VIR_ERR_INTERNAL_ERROR,
 237                              _("Unable to add lockspace %s: error %d"),
 238                              path, rv);
 239             else
 240                 virReportSystemError(-rv,
 241                                      _("Unable to add lockspace %s"),
 242                                      path);
 243             return -1;
.....
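For context, a small negative return such as -EPERM falls into the virReportSystemError() branch above (only values <= -200 are sanlock-specific error codes), which is why the log shows "Operation not permitted". A standalone sketch of that errno-to-message mapping (not libvirt code; the rv value is an assumption):

#include <errno.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    int rv = -EPERM;   /* assumed return value from sanlock_add_lockspace() */
    if (-rv != EEXIST && rv > -200)
        printf("Unable to add lockspace %s: %s\n",
               "/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__", strerror(-rv));
    return 0;
}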


Alex

Comment 5 Alex Jia 2011-09-28 10:07:52 UTC
This issue still exists on RHEL 6.2 beta with libvirt-0.9.4-12.el6.x86_64.

Comment 6 Daniel Berrangé 2011-10-14 11:32:03 UTC
I don't see what the actual bug is here. If you have enabled sanlock in libvirtd and not started sanlock, then the expected behaviour is to see something like the following log messages:

2011-09-14 12:30:09.850: 11987: error : virLockManagerSanlockSetupLockspace:241 : Unable to add lockspace /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: Connection refused
2011-09-14 12:30:09.872: 11987: error : qemudLoadDriverConfig:457 : Failed to load lock manager sanlock
2011-09-14 12:30:09.872: 11987: error : qemudStartup:570 : Missing lock manager implementation
2011-09-14 12:30:09.873: 11987: error : virStateInitialize:862 : Initialization of QEMU state driver failed
2011-09-14 12:30:10.352: 11987: error : daemonRunStateInit:1149 : Driver state initialization failed

and then libvirtd will exit.

Comment 8 Alex Jia 2011-10-17 03:20:22 UTC
(In reply to comment #6)
> 2011-09-14 12:30:09.873: 11987: error : virStateInitialize:862 : Initialization
> of QEMU state driver failed
> 2011-09-14 12:30:10.352: 11987: error : daemonRunStateInit:1149 : Driver state
> initialization failed
Hi Daniel,
Yeah, I can see the above errors when I configure qemu.conf and qemu-sanlock.conf and then restart the libvirtd service with the sanlock daemon stopped.
> 
> and then libvirtd will exit.

And when I then run a client command such as virsh list, libvirtd has already exited:
# virsh list
error: Failed to reconnect to the hypervisor
error: no valid connection
error: Failed to connect socket to '/var/run/libvirt/libvirt-sock': No such file or directory

# service libvirtd status
libvirtd dead but subsys locked

Note: so is libvirtd being dead the expected behaviour?

This testing is based on libvirt-0.9.4-14.el6.x86_64.

Thanks,
Alex

Comment 9 Daniel Berrangé 2011-10-17 07:55:27 UTC
> Notes: so is libvirtd dead a expected behaviour?

Yes, you have told libvirt to use sanlock for protecting disks and sanlock could not be configured, so the only safe thing for libvirtd to do is to refuse to start.

Comment 10 Alex Jia 2011-10-17 10:12:43 UTC
Hi Daniel,
IMHO, this is a bug, and you have already applied a patch in Comment 1. Apart from that, I followed the test steps from http://libvirt.org/locking.html, which says we need the following steps:

# chkconfig sanlock on
# service sanlock start
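For completeness, a sketch of the full service setup (the wdmd daemon is mentioned in Comment 2; whether both services are strictly required here is an assumption based on the sanlock packaging on RHEL 6):

# chkconfig wdmd on
# chkconfig sanlock on
# service wdmd start
# service sanlock start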

If not, we should document this again; otherwise users will probably make the same mistake.

In addition, please set a Fixed In Version for this bug.

Thanks,
Alex