Bug 850470 - sanlock masks useful error codes
Summary: sanlock masks useful error codes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: sanlock
Version: 6.3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: David Teigland
QA Contact: yeylon@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2012-08-21 15:55 UTC by Jiri Denemark
Modified: 2016-04-26 15:53 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of: 820173
Environment:
Last Closed: 2015-09-30 14:06:48 UTC
Target Upstream Version:
Embargoed:



Description Jiri Denemark 2012-08-21 15:55:09 UTC
+++ This bug was initially created as a clone of Bug #820173 +++

Description of problem:
After configuring qemu.conf and qemu-sanlock.conf to use sanlock, libvirtd dies.

Version-Release number of selected component (if applicable):
libvirt-lock-sanlock-0.9.10-16.el6.x86_64
libvirt-0.9.10-16.el6.x86_64
sanlock-1.8-2.el6.x86_64
qemu-kvm-0.12.1.2-2.285.el6.x86_64


How reproducible:
100%

Steps to Reproduce:
1. Enable sanlock in qemu.conf:
# tail -1 /etc/libvirt/qemu.conf
lock_manager = "sanlock"

2. Enable host_id, auto_disk_leases and disk_lease_dir in qemu-sanlock.conf:
# tail -3 /etc/libvirt/qemu-sanlock.conf
host_id = 1
auto_disk_leases = 1
disk_lease_dir = "/var/lib/libvirt/sanlock"

3. Start libvirtd; it fails with errors like the following:
# libvirtd


2012-05-09 07:45:32.689+0000: 13547: info : libvirt version: 0.9.10, package: 16.el6 (Red Hat, Inc. <http://bugzilla.redhat.com/bugzilla>, 2012-05-02-04:50:10, hs20-bc2-5.build.redhat.com)
2012-05-09 07:45:32.689+0000: 13547: error : virLockManagerSanlockSetupLockspace:191 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device
2012-05-09 07:45:32.689+0000: 13547: error : qemudLoadDriverConfig:479 : Failed to load lock manager sanlock
2012-05-09 07:45:32.689+0000: 13547: error : qemudStartup:615 : Missing lock manager implementation
2012-05-09 07:45:32.689+0000: 13547: error : virStateInitialize:854 : Initialization of QEMU state driver failed
2012-05-09 07:45:32.756+0000: 13547: error : daemonRunStateInit:1179 : Driver state initialization failed

4. Check the lease directory; __LIBVIRT__DISKS__ has not been created:
# ls /var/lib/libvirt/sanlock/
total 0

5. Restart libvirtd and check its status:
# service libvirtd restart
# service libvirtd status

Note: restarting libvirtd sometimes succeeds, but the daemon dies shortly afterwards.


Actual results:
libvirtd does not work, and the file __LIBVIRT__DISKS__ is not created.

Expected results:
The file is created and libvirtd works normally.


Additional info:

Libvirt calls sanlock_align and gets ENODEV:

2012-05-09 07:45:32.689+0000: 13547: error : virLockManagerSanlockSetupLockspace:191 : Unable to query sector size /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__: No such device

However, sanlock logs the real cause (-13, i.e. -EACCES) in /var/log/messages:

May  9 15:58:38 intel-8500-4-2 sanlock[27830]: 175679 open error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__

The cause is this incorrect code in cmd_align() in src/cmd.c:

	rv = open_disk(&sd);
	if (rv < 0) {
		result = -ENODEV;
		goto reply;
	}

Here the potentially useful value in rv is replaced with -ENODEV. The fix is not simply changing this to result = rv, though, because open_disk sometimes returns a bogus -1 instead of -errno.
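A minimal sketch of one way cmd_align() could keep useful codes while still guarding against the bogus -1, assuming open_disk() otherwise returns -errno. The helper name disk_open_error() is hypothetical and not part of sanlock. On Linux EPERM is 1, so a bare -1 is indistinguishable from -EPERM; the sketch therefore falls back to the current -ENODEV only for that one value.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical helper (not in sanlock): pick the code a handler such as
 * cmd_align() reports after open_disk() fails. Assumes open_disk() returns
 * -errno on most failures but may return a bare -1 with no specific cause.
 *
 * Intended use:
 *     rv = open_disk(&sd);
 *     if (rv < 0) {
 *         result = disk_open_error(rv);
 *         goto reply;
 *     }
 */
static int disk_open_error(int rv)
{
	assert(rv < 0); /* only meaningful on the failure path */

	/* A bare -1 carries no real cause, so keep the current generic code. */
	if (rv == -1)
		return -ENODEV;

	/* Otherwise pass the real cause (e.g. -EACCES, -ENOENT) to the client. */
	return rv;
}
```

Assuming open_disk() returned the -13 that sanlock logged, the failure in this report would then reach libvirt as -EACCES rather than -ENODEV.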

Comment 1 David Teigland 2012-08-21 17:01:05 UTC
Let me get this straight: you want sanlock_align() to return EACCES in this case instead of ENODEV?  Will it help libvirt in any way?  I'm intentionally returning ENODEV for any problems related to the device, so I'll change this only if it will have a practical effect for the caller.

Comment 2 Jiri Denemark 2012-08-22 09:09:15 UTC
Not really; ideally it would just return the real error. What if it fails because libvirt passes an incorrect file name and the real error is EISDIR, ENOENT, ENOSPC or something else? Masking them all behind ENODEV makes it harder to spot the real problem.

Comment 3 David Teigland 2012-09-20 21:19:41 UTC
kicking this down the road

Comment 5 David Teigland 2012-09-20 21:41:07 UTC
I'd think that the sanlock log message is just what we'd want for troubleshooting:

> sanlock[27830]: 175679 open error -13 /var/lib/libvirt/sanlock/__LIBVIRT__DISKS__

That's what people actually see; they don't see the value returned in the api.  The only reason to change the value in the api would be if libvirt uses it, but it doesn't.  (In fact, this auto lease feature is not even used/enabled/supported in the first place.)

The reason this is not trivial to change is that changing the error value at the lowest open_disk level means we'd have to audit and probably adjust a lot of code that this return value propagates through, some of which depends on the current value returned.

Comment 6 Jiri Denemark 2012-09-21 18:15:18 UTC
People do see the value returned by the API, because libvirt uses the API and, if a call fails, takes the returned value, translates it into a string and reports it as a libvirt error. Writing the exact error to a log file makes sense for the sanlock daemon, but the sanlock client library, which is meant to be called by other applications, should be able to report why an API call failed so that the calling application can present that to end users.
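A hypothetical caller-side sketch of how the returned value becomes the text users see. The function format_align_error() is an assumption, not libvirt's code (libvirt has its own error-reporting helpers); it only shows that the message is built from the API's return value, so -ENODEV yields "No such device" as in the log in the description, whereas the cause sanlock actually logged (-13, EACCES) would read "Permission denied".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical caller-side helper: build a user-facing message from the
 * negative errno returned by a sanlock client API call.
 */
static void format_align_error(char *buf, size_t len, const char *path, int rv)
{
	assert(rv < 0); /* sanlock-style APIs report failure as -errno */

	/* strerror() expects a positive errno value. */
	snprintf(buf, len, "Unable to query sector size %s: %s", path, strerror(-rv));
}
```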

The fix might not be trivial but the request for it is still valid.

Comment 8 David Teigland 2012-09-21 19:58:41 UTC
This cannot affect supportability because we simply don't use or support the code in question:

- we do not support auto disk leases at all (which is what this bz is using)

- in 6.4 leases in libvirt are not used at all (auto or otherwise)

- eventually when vdsm starts passing leases through libvirt, even then libvirt will not be used to create the lockspaces (which is the api in question above)

So, while the request is valid and I intend to address it at some point, there is no time in the foreseeable future when this code will be used by a customer or affect support in any way.

Comment 10 RHEL Program Management 2013-10-14 04:50:21 UTC
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.

Comment 11 David Teigland 2015-09-30 14:06:48 UTC
This is not a bug.  If there's a specific issue that's a problem we will address that.

