Bug 968891 - Attaching NFS storage domain failed with 'Sanlock lockspace add failure'
Status: CLOSED NOTABUG
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: 3.2.0
Assigned To: Nobody's working on this, feel free to take it
Whiteboard: storage
Reported: 2013-05-30 04:33 EDT by Katarzyna Jachim
Modified: 2016-02-10 15:21 EST
Doc Type: Bug Fix
Last Closed: 2013-05-30 08:32:02 EDT
Type: Bug
oVirt Team: Storage

Attachments:
logs (vdsm, sanlock, engine, server, /var/log/messages...) (608.86 KB, application/x-bzip2)
2013-05-30 04:33 EDT, Katarzyna Jachim

Description Katarzyna Jachim 2013-05-30 04:33:02 EDT
Created attachment 754716 [details]
logs (vdsm, sanlock, engine, server, /var/log/messages...)

Description of problem:

Attaching an NFS storage domain failed with the following error in vdsm.log:

Thread-111::ERROR::2013-05-30 01:20:18,469::task::850::TaskManager.Task::(_setError) Task=`98ba799f-078e-418b-832c-cb03181f328e`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1098, in attachStorageDomain
    pool.attachSD(sdUUID)
  File "/usr/share/vdsm/storage/securable.py", line 68, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 932, in attachSD
    dom.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 445, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 188, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('795c691b-0aaf-4068-bb29-8ad969bc936a', SanlockException(-202, 'Sanlock lockspace add failure', 'Sanlock exception'))
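
The failing frame (clusterlock.py, acquireHostId) boils down to joining the domain's sanlock lockspace through the sanlock Python binding and wrapping any failure in AcquireHostIdFailure. A minimal paraphrase for orientation only (the exception class below is a stand-in, not vdsm's actual storage exception module):

import sanlock

class AcquireHostIdFailure(Exception):
    """Stand-in for vdsm's storage exception of the same name."""

def acquire_host_id(sd_uuid, host_id, ids_path):
    # ids_path is the domain's dom_md/ids file on the NFS mount.
    try:
        sanlock.add_lockspace(sd_uuid, host_id, ids_path)
    except sanlock.SanlockException as e:
        # e.errno is -202 in this bug: sanlock timed out on I/O to the
        # ids file, and the error surfaces to the engine as the traceback above.
        raise AcquireHostIdFailure(sd_uuid, e)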


Version-Release number of selected component (if applicable):
sf17.3
vdsm-4.10.2-22.0.el6ev.x86_64
libvirt-0.10.2-18.el6_4.5.x86_64

How reproducible:
happened once in automated tests: http://jenkins.qa.lab.tlv.redhat.com:8080/view/Storage/view/3.2/job/3.2-storage_nfs_options_sanity-nfs-rest/6/


Steps to Reproduce:
1. add, attach & activate master NFS storage domain 
2. add another NFS storage domain (NFS v3) 
3. try to attach the storage domain

I have no idea whether it matters that it is a non-master domain or that it uses NFSv3. The failure is definitely non-deterministic; a re-run of the test passed. (The steps above are sketched in Python SDK form below.)
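
For orientation, the steps above roughly correspond to the following RHEV 3.x Python SDK (ovirtsdk) calls. All names and credentials are placeholders, and this sketch only approximates the flow of the automated test; it is not taken from it:

from ovirtsdk.api import API

api = API(url='https://engine.example.com/api',
          username='admin@internal', password='password', insecure=True)

dc = api.datacenters.get(name='nfs_dc')            # data center with the active master NFS domain
sd = api.storagedomains.get(name='nfs_domain_2')   # the second (non-master) NFSv3 domain, already added

# Step 3: attach the second domain to the data center. On the host this
# runs attachStorageDomain -> acquireHostId, which is where the sanlock
# exception above was raised.
dc.storagedomains.add(sd)

# Activate the domain once the attach succeeds.
dc.storagedomains.get(name='nfs_domain_2').activate()

api.disconnect()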

Actual results:
Attach failed with sanlock exception

Expected results:
Storage domain should be attached

Additional info:
Comment 1 Dafna Ron 2013-05-30 08:32:02 EDT
It's not a bug.
Sanlock is doing what it's supposed to do.
We have a race here.
The create and connect of the pool was still holding the lock on the master domain.
When a second domain is added, sanlock cannot acquire the lock, and adding the domain fails.
Closing as not a bug.

Below is the thread from the sanlock log showing the race on the locks, where the first domain is 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc and the second domain is 795c691b-0aaf-4068-bb29-8ad969bc936a.
 

2013-05-30 01:16:50+0300 22 [2592]: sanlock daemon started 2.6 host 104b6509-03c1-43c3-8d80-488158a1809e.puma19.scl
2013-05-30 01:18:42+0300 133 [2597]: s1 lockspace 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:250:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/ids:0
2013-05-30 01:19:03+0300 154 [2598]: s1:r1 resource 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:SDM:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/leases:1048576 for 3,12,3729
2013-05-30 01:19:10+0300 161 [2598]: s2 lockspace 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:1:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/ids:0
2013-05-30 01:19:31+0300 182 [2597]: s2:r2 resource 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:SDM:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/leases:1048576 for 3,12,3729
2013-05-30 01:19:43+0300 194 [2597]: s3 lockspace 795c691b-0aaf-4068-bb29-8ad969bc936a:1:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__37__925681/795c691b-0aaf-4068-bb29-8ad969bc936a/dom_md/ids:0
2013-05-30 01:20:13+0300 224 [6557]: 795c691b aio timeout 0 0x7f33e80008c0:0x7f33e80008d0:0x7f33e8101000 ioto 10 to_count 1
2013-05-30 01:20:13+0300 224 [6557]: read_sectors delta_leader offset 0 rv -202 /rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__37__925681/795c691b-0aaf-4068-bb29-8ad969bc936a/dom_md/ids
2013-05-30 01:20:18+0300 229 [6557]: 795c691b aio collect 0x7f33e80008c0:0x7f33e80008d0:0x7f33e8101000 result 512:0 close free
2013-05-30 01:20:18+0300 229 [2597]: s3 add_lockspace fail result -202
2013-05-30 01:20:43+0300 254 [2598]: s4 lockspace 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:1:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/ids:0
2013-05-30 01:21:05+0300 276 [2597]: s4:r3 resource 28a3f1a9-7e8d-4b4b-8c69-1410f769aedc:SDM:/rhev/data-center/mnt/10.35.64.106:_fastpass_jenkins-vm-01__nfs__2013__05__30__1__12__36__484978/28a3f1a9-7e8d-4b4b-8c69-1410f769aedc/dom_md/leases:1048576 for 3,12,3729
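
For illustration only: a caller that wanted to tolerate this transient race could retry the lockspace join when sanlock reports the -202 I/O timeout seen above. This is a hypothetical sketch, not how vdsm behaves, and the exact sanlock keyword arguments vary between versions:

import time
import sanlock

SANLK_AIO_TIMEOUT = -202  # matches "rv -202" / "add_lockspace fail result -202" in the log above

def add_lockspace_with_retry(sd_uuid, host_id, ids_path, retries=3, delay=10):
    for attempt in range(1, retries + 1):
        try:
            sanlock.add_lockspace(sd_uuid, host_id, ids_path)
            return
        except sanlock.SanlockException as e:
            if e.errno != SANLK_AIO_TIMEOUT or attempt == retries:
                raise
            time.sleep(delay)  # let the in-flight lease I/O settle before retrying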
