Bug 850470 - sanlock masks useful error codes
sanlock masks useful error codes
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sanlock (Show other bugs)
6.3
Unspecified Unspecified
medium Severity medium
: rc
: ---
Assigned To: David Teigland
yeylon@redhat.com
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2012-08-21 11:55 EDT by Jiri Denemark
Modified: 2016-04-26 11:53 EDT (History)
9 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: 820173
Environment:
Last Closed: 2015-09-30 10:06:48 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)

  None (edit)
Description Jiri Denemark 2012-08-21 11:55:09 EDT
+++ This bug was initially created as a clone of Bug #820173 +++

Description of problem:
After config qemu.conf and qemu-sanlock.conf for use sanlocak , the libvirtd will die

Version-Release number of selected component (if applicable):
libvirt-lock-sanlock-0.9.10-16.el6.x86_64
libvirt-0.9.10-16.el6.x86_64
sanlock-1.8-2.el6.x86_64
qemu-kvm--0.12.1.2-2.285.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. enable sanlock in qemu.conf
# tail -1 /etc/libvirt/qemu.conf
lock_manager = "sanlock"

2. enable host_id, auto_disk_leases and disk_lease_dir in qemu-sanlock.conf
# tail -3 /etc/libvirt/qemu-sanlock.conf
host_id = 1
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"

3.
#libvirtd /*you will see something like below*/


2012-05-09 07:45:32.689+0000: 13547: info : libvirt version: 0.9.10, package: 16.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2012-05-02-04:50:10, hs20-bc2-5.build.redhat.com)
2012-05-09 07:45:32.689+0000: 13547: error : virLockManagerSanlockSetupLockspace:191 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device
2012-05-09 07:45:32.689+0000: 13547: error : qemudLoadDriverConfig:479 : Failed to load lock manager sanlock
2012-05-09 07:45:32.689+0000: 13547: error : qemudStartup:615 : Missing lock manager implementation
2012-05-09 07:45:32.689+0000: 13547: error : virStateInitialize:854 : Initialization of QEMU state driver failed
2012-05-09 07:45:32.756+0000: 13547: error : daemonRunStateInit:1179 : Driver state initialization failed

4.
#ls /var/lib/libvirt/sanlock/
total 0

5.
#service libvirtd restart
#service libvirtd status

Note : sometimes restart libvirtd will success , but it will die quickly.


Actual results:
Libvirtd can't work , and the file __LIBVIRT__DISKS__ does not be generated

Expected results:
the file should appear , and libvirtd work well.


Additional info:

Libvirt calls sanlock_align and gets ENODEV:

2012-05-09 07:45:32.689+0000: 13547: error :
virLockManagerSanlockSetupLockspace:191 : Unable to query sector size
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device

However, sanlock logs the following in /var/log/message:

May  9 15:58:38 intel-8500-4-2 sanlock[27830]: 175679 open error -13
/var/lib/libvirt/sanlock/__LIBVIRT__DISKS__

The reason is this incorrect piece of code in cmd_align from src/cmd.c:

	rv = open_disk(&sd);
	if (rv < 0) {
		result = -ENODEV;
		goto reply;
	}

where possibly useful value in rv is replaced with -ENODEV. Of course, the fix
is not just straightforward change to result = rv because open_disk sometime
reports bogus -1 instead of -errno.
Comment 1 David Teigland 2012-08-21 13:01:05 EDT
Let me get this straight, you want sanlock_align() to return EACCES in this case instead of ENODEV?  Will it help libvirt in any way?  I'm intentionally returning ENODEV for any problems related to the device, so I'll change this only if it will have a practical effect for the caller.
Comment 2 Jiri Denemark 2012-08-22 05:09:15 EDT
Not really, the best would be if it just returned the real error. What if it fails because libvirt passes incorrect file name and the real error would be EISDIR, ENOENT, ENOSPC or anything else? Masking them all behind ENODEV makes it harder to spot the real problem.
Comment 3 David Teigland 2012-09-20 17:19:41 EDT
kicking this down the road
Comment 5 David Teigland 2012-09-20 17:41:07 EDT
I'd think that the sanlock log message is just what we'd want for troubleshooting:

> sanlock[27830]: 175679 open error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__

That's what people actually see; they don't see the value returned in the api.  The only reason to change the value in the api would be if libvirt uses it, but it doesn't.  (In fact, this auto lease feature is not even used/enabled/supported in the first place.)

The reason this is not trivial to change is that changing the error value at the lowest open_disk level means we'd have to audit and probably adjust a lot of code that this return value propagates through, some of which depends on the current value returned.
Comment 6 Jiri Denemark 2012-09-21 14:15:18 EDT
People actually see the value returned in the API because libvirt uses the API
and if it fails, the returned value is taken, translated into a string and
reported as a libvirt error. Writing the exact error into a log file make
sense for the sanlock daemon but sanlock client library that is supposed to be
called by other apps should be able to report back why its API failed so that
the calling app can present that to end users.

The fix might not be trivial but the request for it is still valid.
Comment 8 David Teigland 2012-09-21 15:58:41 EDT
This cannot effect supportability because we simply don't use or support the code in question:

- we do not support auto disk leases at all (which is what this bz is using)

- in 6.4 leases in libvirt are not used at all (auto or otherwise)

- eventually when vdsm starts passing leases through libvirt, even then libvirt will not be used to create the lockspaces (which is the api in question above)

So, while the request is valid, and I intend to address it at some point, there is no time in the foreseeable future when this code will be used by a customer, or effect support in any way.
Comment 10 RHEL Product and Program Management 2013-10-14 00:50:21 EDT
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
Comment 11 David Teigland 2015-09-30 10:06:48 EDT
This is not a bug.  If there's a specific issue that's a problem we will address that.

Note You need to log in before you can comment on or make changes to this bug.