Bug 1409511 - Better error on failure to acquire lease on run VM
Summary: Better error on failure to acquire lease on run VM
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: libvirt
Version: 7.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Jiri Denemark
QA Contact: Han Han
URL:
Whiteboard:
Duplicates: 1443153
Depends On: 1443140
Blocks:
Reported: 2017-01-02 10:29 UTC by Arik
Modified: 2017-09-26 06:44 UTC

Fixed In Version: libvirt-3.2.0-4.el7
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-08-01 17:21:45 UTC
Target Upstream Version:
Embargoed:


Links
System: Red Hat Product Errata
ID: RHEA-2017:1846
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: libvirt bug fix and enhancement update
Last Updated: 2017-08-01 18:02:50 UTC

Description Arik 2017-01-02 10:29:40 UTC
Description of problem:
In RHV we use lease devices to restart VMs automatically when a host becomes non-responsive. When the host becomes non-responsive, after some initial delay, we try to run the VMs that are configured with a lease on other hosts. However, the error returned when a VM fails to start because its lease could not be acquired is not clear. We need a better error so that users understand that the VM cannot be started specifically because of the lease.

Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
1. Run a VM with a lease device
2. Run the same VM on another host

Actual results:
The error that is returned is:
resource busy: Failed to acquire lock: error -243.

Expected results:
The error should indicate that the resource is a lease.

Additional info:

Comment 2 Nir Soffer 2017-01-03 18:04:53 UTC
The -243 string seems to come from sanlock:

src/sanlock_rv.h:#define SANLK_ACQUIRE_IDLIVE   -243

I think what is missing is:
- make it clear that the issue is failing to acquire a lease
- provide the lease id (target path:offset, or lockspace:key) so a management
  application can report back a good error.
- provide a reason why the acquire failed - here sanlock provided an error code
  that can be used to provide a pretty message.

David, can sanlock provide a good error message with the error code in this case?
Compare this with errno and strerror(): you would expect a system to provide both
an error code and a clear message for each error code.
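
As an illustration of the errno/strerror() pairing mentioned here, a minimal Python sketch (this is not sanlock code; `describe` is a hypothetical helper, and the exact message text depends on the platform's C library):

```python
import errno
import os


def describe(code: int) -> str:
    """Pair a numeric errno value with its symbolic name and its message."""
    # errno.errorcode maps numeric values to names such as "EBUSY".
    name = errno.errorcode.get(code, "UNKNOWN")
    # os.strerror() returns the human-readable text for the same number.
    return f"{name} ({code}): {os.strerror(code)}"


print(describe(errno.EBUSY))
```

A sanlock equivalent would give each SANLK_* value, such as SANLK_ACQUIRE_IDLIVE, a message in the same way.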

Comment 3 Nir Soffer 2017-01-03 18:10:43 UTC
Can Vdsm assume that getting a VIR_ERR_RESOURCE_BUSY error for a VM with a lease is
always a failure to acquire the lease? Or are there other, unrelated failures
reported using the same error code?

Comment 4 David Teigland 2017-01-03 18:51:09 UTC
I'll add sanlock_strerror() to the lib.

Comment 5 David Teigland 2017-01-03 18:52:36 UTC
from comment 3

Comment 6 Jaroslav Suchanek 2017-01-06 14:15:27 UTC
As for comment 3, I would say yes, it should be reliable. At least that was
the subject of bug 1165119. Jirka Denemark can comment.

Comment 7 Jiri Denemark 2017-01-09 13:38:23 UTC
(In reply to Nir Soffer from comment #3)
> Can Vdsm assume that getting VIR_ERR_RESOURCE_BUSY error for a vm with a
> lease is always a failure to acquire the lease?

Ideally you should check the error domain too. That is, if you see a VIR_ERR_RESOURCE_BUSY error and its domain is either VIR_FROM_LOCKING or VIR_FROM_LOCKSPACE, libvirt was unable to acquire the lease.
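
A minimal sketch of that check with the libvirt Python bindings, assuming the caller catches libvirt.libvirtError. The helper name `is_lease_acquire_failure` is hypothetical, and the libvirt module is passed in as `lv` only so that the sketch can be exercised without libvirt installed:

```python
def is_lease_acquire_failure(exc, lv) -> bool:
    """Return True if exc reports that libvirt could not acquire a lease.

    exc: a libvirt.libvirtError (any object offering get_error_code()
         and get_error_domain()).
    lv:  the imported libvirt module, which supplies the constants.
    """
    lock_domains = (lv.VIR_FROM_LOCKING, lv.VIR_FROM_LOCKSPACE)
    # Both conditions must hold: the error code alone is not enough.
    return (exc.get_error_code() == lv.VIR_ERR_RESOURCE_BUSY
            and exc.get_error_domain() in lock_domains)


# Intended use (requires libvirt-python and an existing domain object):
#
#   import libvirt
#   try:
#       dom.create()
#   except libvirt.libvirtError as e:
#       if is_lease_acquire_failure(e, libvirt):
#           print("VM could not start: lease not acquired:", e)
#       else:
#           raise
```

A VIR_ERR_RESOURCE_BUSY error from any other domain is not treated as a lease failure.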

Comment 9 Jiri Denemark 2017-04-18 15:26:33 UTC
*** Bug 1443153 has been marked as a duplicate of this bug. ***

Comment 11 Jiri Denemark 2017-04-19 08:33:52 UTC
Patch sent upstream for review: https://www.redhat.com/archives/libvir-list/2017-April/msg00842.html

Comment 12 Jiri Denemark 2017-04-27 09:53:33 UTC
Support for sanlock_strerror is implemented upstream by

commit 23377c539b72a7fc4e2749a068711fe1f626998d
Refs: v3.2.0-271-g23377c539
Author:     Jiri Denemark <jdenemar>
AuthorDate: Fri Mar 31 21:42:22 2017 +0200
Commit:     Jiri Denemark <jdenemar>
CommitDate: Thu Apr 27 11:44:11 2017 +0200

    locking: Add support for sanlock_strerror

    The recently added sanlock_strerror function can be used to translate
    sanlock's numeric errors into human readable strings.

    https://bugzilla.redhat.com/show_bug.cgi?id=1409511

    Signed-off-by: Jiri Denemark <jdenemar>

Comment 16 Han Han 2017-06-15 00:31:30 UTC
Since triggering every error type is complex and the code paths are structurally similar, I will use one error to verify the bug. I will try to trigger more errors later.
Verified on libvirt-3.2.0-9.el7.x86_64 and sanlock-3.5.0-1.el7.x86_64.
1. Set following configurations:
In /etc/libvirt/qemu.conf:
lock_manager = "sanlock"
In /etc/libvirt/qemu-sanlock.conf:
user = "sanlock"
group = "sanlock"
host_id = 1
auto_disk_leases = 0
disk_lease_dir = "/var/lib/libvirt/sanlock"
require_lease_for_disks = 1

2. Run this script to manually create lockspace and resource files.
#!/bin/bash -x
DOM=avocado-vt-vm1
lockspace_name=libvirt-sanlock
lockspace_resource_path=/var/lib/libvirt/sanlock/libvirt-sanlock
resource_name=test-disk-resource-lock
resource_offset=1048576
truncate -s 2M $lockspace_resource_path
chown sanlock:sanlock $lockspace_resource_path
sanlock direct init -s $lockspace_name:0:$lockspace_resource_path:0
sanlock add_lockspace -s $lockspace_name:1:$lockspace_resource_path:0
sanlock direct init -r $lockspace_name:$resource_name:$lockspace_resource_path:$resource_offset
restorecon -R -v /var/lib/libvirt/sanlock

3. Add a lease to the VM XML, using a wrong key ("fault") that does not match the resource name created above:
<lease>
  <lockspace>libvirt-sanlock</lockspace>
  <key>fault</key>
  <target path='/var/lib/libvirt/sanlock/libvirt-sanlock' offset='1048576'/>
</lease>

4. Try to start VM
In RHEL7.3, I got:
# virsh start avocado-vt-vm1 
error: Failed to start domain avocado-vt-vm1
error: resource busy: Failed to acquire lock: error -227

In RHEL7.4, I got:
# virsh start V           
error: Failed to start domain V
error: resource busy: Failed to acquire lock: Lease resource name is incorrect
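
The difference between the two outputs can be sketched as a code-to-message lookup with a numeric fallback. This is only an illustration, not libvirt's or sanlock's implementation: `KNOWN_MESSAGES` and `format_lock_error` are hypothetical names, and the only message text used is the one shown in the RHEL 7.4 output above.

```python
# Illustrative table: only the -227 text is taken from the output above.
KNOWN_MESSAGES = {
    -227: "Lease resource name is incorrect",
}


def format_lock_error(rv: int) -> str:
    """Build the error string, preferring a readable reason over a bare code."""
    # Fall back to the old "error <code>" form when no message is known.
    reason = KNOWN_MESSAGES.get(rv, f"error {rv}")
    return f"resource busy: Failed to acquire lock: {reason}"


print(format_lock_error(-227))
print(format_lock_error(-243))
```

The first call prints the readable RHEL 7.4 style message. The second falls back to the numeric form because this sketch has no text for -243.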

Comment 17 errata-xmlrpc 2017-08-01 17:21:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1846

