Bug 977926 - VDSM: Failed in CreateStoragePoolVDS when sanlock service is already running
Summary: VDSM: Failed in CreateStoragePoolVDS when sanlock service is already running
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 3.3.0
Hardware: Unspecified
OS: Unspecified
urgent
unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Federico Simoncelli
QA Contact: Leonid Natapov
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-06-25 15:19 UTC by Eyal Edri
Modified: 2016-02-10 19:58 UTC
CC List: 14 users

Fixed In Version: is6
Doc Type: Bug Fix
Doc Text:
Previously, if sanlock was already running before vdsm was installed, the daemon kept running with the privileges it had when it started and never picked up the group memberships added during installation. This prevented some operations on file domains from being executed (sanlock lockspace add failure). Now, when vdsm is installed, sanlock is restarted so that the daemon picks up the required privileges.
Clone Of:
Environment:
Last Closed: 2014-01-21 16:09:34 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:


Attachments
The Error Page! (146.21 KB, image/png), 2013-08-27 10:04 UTC, higkoo
engine.log on server console (160.29 KB, application/zip), 2013-08-27 12:15 UTC, higkoo
vdsm.log on node (126.14 KB, application/zip), 2013-08-27 12:17 UTC, higkoo


Links
Red Hat Product Errata RHBA-2014:0040: SHIPPED_LIVE, vdsm bug fix and enhancement update (2014-01-21 20:26:21 UTC)
oVirt gerrit 16742: MERGED, init: restart sanlock groups are missing (2020-11-21 14:40:57 UTC)

Description Eyal Edri 2013-06-25 15:19:56 UTC
Description of problem:

When adding a host where the sanlock service is already running, CreateStoragePool fails with:

ENGINE ERROR:

2013-06-25 16:54:40,165 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp-/127.0.0.1:8702-5) [10392123] Failed in CreateStoragePoolVDS method
2013-06-25 16:54:40,166 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp-/127.0.0.1:8702-5) [10392123] Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('db3b7409-97cc-4ce0-9bf2-1d7cd207afba', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
2013-06-25 16:54:40,166 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp-/127.0.0.1:8702-5) [10392123] Command org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand return value 


vdsm:

Thread-48::ERROR::2013-06-25 16:54:40,179::task::850::TaskManager.Task::(_setError) Task=`a1d42150-0f88-45c5-bb46-4a6e812e9e03`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 909, in createStoragePool
    masterVersion, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 568, in create
    self._acquireTemporaryClusterLock(msdUUID, leaseParams)
  File "/usr/share/vdsm/storage/sp.py", line 510, in _acquireTemporaryClusterLock
    msd.acquireHostId(self.id)
  File "/usr/share/vdsm/storage/sd.py", line 445, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/clusterlock.py", line 188, in acquireHostId
    raise se.AcquireHostIdFailure(self._sdUUID, e)
AcquireHostIdFailure: Cannot acquire host id: ('db3b7409-97cc-4ce0-9bf2-1d7cd207afba', SanlockException(19, 'Sanlock lockspace add failure', 'No such device'))
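
The failure is raised from storage/clusterlock.py (the last frame above), which wraps the sanlock Python bindings. As a rough illustration only, not the actual vdsm code, the exception class and helper below are simplified stand-ins for what that call path amounts to:

import sanlock  # Python bindings shipped with the sanlock package


class AcquireHostIdFailure(Exception):
    """Simplified stand-in for vdsm's se.AcquireHostIdFailure."""


def acquire_host_id(sd_uuid, host_id, ids_path):
    try:
        # Ask the sanlock daemon to join the domain lockspace. The daemon,
        # not this process, opens ids_path, so the call fails when sanlock
        # was started without the group memberships needed to read
        # .../dom_md/ids, and the error surfaces here as SanlockException.
        sanlock.add_lockspace(sd_uuid, host_id, ids_path)
    except sanlock.SanlockException as e:
        raise AcquireHostIdFailure(sd_uuid, e)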

sanlock:

2013-06-25 16:31:40+0300 44 [5535]: sanlock daemon started 2.6 host e976d34b-f81a-4ac6-8fdd-1b93a63ea8ef.cinteg07.c
2013-06-25 16:54:39+0300 1423 [5542]: s1 lockspace db3b7409-97cc-4ce0-9bf2-1d7cd207afba:250:/rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm48__nfs__2013__06__25__16__53__26__467293/db3b7409-97cc-4ce0-9bf2-1d7cd207afba/dom_md/ids:0
2013-06-25 16:54:39+0300 1423 [11080]: open error -13 /rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm48__nfs__2013__06__25__16__53__26__467293/db3b7409-97cc-4ce0-9bf2-1d7cd207afba/dom_md/ids
2013-06-25 16:54:39+0300 1423 [11080]: s1 open_disk /rhev/data-center/mnt/10.35.148.12:_vol_vfilers_nfs01_jenkins-automation-rpm-vm48__nfs__2013__06__25__16__53__26__467293/db3b7409-97cc-4ce0-9bf2-1d7cd207afba/dom_md/ids error -13
2013-06-25 16:54:40+0300 1424 [5542]: s1 add_lockspace fail result -19
2013-06-25 17:42:34+0300 4299 [5541]: s2 lockspace 84cae5c7-408b-4a10-9ae9-8dcfc826286c:250:/dev/84cae5c7-408b-4a10-9ae9-8dcfc826286c/ids:0
2013-06-25 17:42:34+0300 4299 [13061]: open error -13 /dev/84cae5c7-408b-4a10-9ae9-8dcfc826286c/ids
2013-06-25 17:42:34+0300 4299 [13061]: s2 open_disk /dev/84cae5c7-408b-4a10-9ae9-8dcfc826286c/ids error -13
2013-06-25 17:42:35+0300 4300 [5541]: s2 add_lockspace fail result -19
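
For reference, the numeric codes in the sanlock log are plain errno values; decoding them shows that the underlying failure is a permission error on the ids file, while the add_lockspace result is what gets reported back to VDSM:

import errno
import os

# Decode the raw errno values appearing in the sanlock log:
for code in (13, 19, 22):
    print(code, errno.errorcode[code], os.strerror(code))
# 13 EACCES Permission denied   ("open error -13" on dom_md/ids)
# 19 ENODEV No such device      ("add_lockspace fail result -19", what VDSM reports)
# 22 EINVAL Invalid argument    (the variant reported later in comment 6)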


Version-Release number of selected component (if applicable):


How reproducible:
always


Steps to Reproduce:
1. Install the latest engine 3.3.
2. Start the sanlock service.
3. Add a host running the latest 3.2 vdsm (vdsm-4.10.2-23.0.el6ev.x86_64).

Actual results:


Expected results:


Additional info:

Comment 4 Leonid Natapov 2013-07-31 13:58:30 UTC
Verified on is7: added a 3.3.2 host with vdsm-4.10.2-24.0.el6ev.x86_64 to an is7 setup; the storage pool was created successfully.

Comment 5 higkoo 2013-08-27 06:11:22 UTC
I still get this error:
oVirt Engine - 3.3.0-0.3.beta1.el6
RHEL - 6 - 4.el6.10
kernel - 2.6.32 - 358.14.2.el6.x86_64
KVM - 0.12.1.2 - 2.355.el6
libvirt - 1.1.1-1.el6
vdsm - 4.12.0-0.1.rc3.el6
SPICE - 0.12.0 - 12.el6
CPU - Intel SandyBridge Family - Intel(R) Xeon(R) CPU E5-2620 0 @ 2.00GHz

Comment 6 higkoo 2013-08-27 10:04:47 UTC
Created attachment 790874 [details]
The Error Page!

General information is shown in the attached screenshot.
The error message is still:
2013-08-27 18:01:22,929 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp--127.0.0.1-8702-2) Failed in CreateStoragePoolVDS method
2013-08-27 18:01:22,929 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp--127.0.0.1-8702-2) Error code AcquireHostIdFailure and error message VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('551319ca-ea70-4cd3-a1df-3bd2bac9c7fa', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
2013-08-27 18:01:22,932 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStoragePoolVDSCommand] (ajp--127.0.0.1-8702-2) Command CreateStoragePoolVDS execution failed. Exception: VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('551319ca-ea70-4cd3-a1df-3bd2bac9c7fa', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
2013-08-27 18:01:22,933 ERROR [org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand] (ajp--127.0.0.1-8702-2) Command org.ovirt.engine.core.bll.storage.AddStoragePoolWithStoragesCommand throw Vdc Bll exception. With error message VdcBLLException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to CreateStoragePoolVDS, error = Cannot acquire host id: ('551319ca-ea70-4cd3-a1df-3bd2bac9c7fa', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument')) (Failed with VDSM error AcquireHostIdFailure and code 661)

Comment 7 Allon Mureinik 2013-08-27 11:40:59 UTC
higkoo, please attach VDSM + engine logs.

Comment 8 higkoo 2013-08-27 12:15:50 UTC
Created attachment 790941 [details]
engine.log on server console

Last 10000 lines of engine.log on the server.

Comment 9 higkoo 2013-08-27 12:17:31 UTC
Created attachment 790943 [details]
vdsm.log on node

Last 10000 lines of vdsm.log from the node. Thanks!

Comment 10 anil 2013-09-30 07:02:49 UTC
I restarted sanlock on both oVirt nodes in the cluster and attached the storage domain again, and it worked fine:
[root@node1-3-3 ~]# service sanlock status
sanlock (pid 11908 11906) is running...
[root@node1-3-3 ~]# service sanlock restart
Sending stop signal sanlock (11906):                       [  OK  ]
Waiting for sanlock (11906) to stop:                       [  OK  ]
Starting sanlock:                                          [  OK  ]
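
The restart helps because the new daemon picks up the supplementary groups that the vdsm installation configured for the sanlock user, which is also what the fix automates at install time. A rough sketch of how one could check for the mismatch before restarting; the specific group names involved (e.g. kvm, qemu) are an assumption about a typical vdsm setup, not taken from this bug:

import grp

def configured_supplementary_gids(user="sanlock"):
    # Groups the sanlock user is a member of according to /etc/group.
    return {g.gr_gid for g in grp.getgrall() if user in g.gr_mem}

def running_supplementary_gids(pid):
    # Groups actually applied to the running daemon.
    with open("/proc/%d/status" % pid) as status:
        for line in status:
            if line.startswith("Groups:"):
                return {int(gid) for gid in line.split()[1:]}
    return set()

# Usage with the daemon PID from "service sanlock status":
# if not configured_supplementary_gids() <= running_supplementary_gids(11906):
#     print("sanlock is missing groups; restart the service")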

Comment 11 Charlie 2013-11-28 00:34:53 UTC
This bug is currently attached to errata RHBA-2013:15291. If this change is not to be documented in the text for this errata please either remove it from the errata, set the requires_doc_text flag to 
minus (-), or leave a "Doc Text" value of "--no tech note required" if you do not have permission to alter the flag.

Otherwise to aid in the development of relevant and accurate release documentation, please fill out the "Doc Text" field above with these four (4) pieces of information:

* Cause: What actions or circumstances cause this bug to present.
* Consequence: What happens when the bug presents.
* Fix: What was done to fix the bug.
* Result: What now happens when the actions or circumstances above occur. (NB: this is not the same as 'the bug doesn't present anymore')

Once filled out, please set the "Doc Type" field to the appropriate value for the type of change made and submit your edits to the bug.

For further details on the Cause, Consequence, Fix, Result format please refer to:

https://bugzilla.redhat.com/page.cgi?id=fields.html#cf_release_notes 

Thanks in advance.

Comment 12 errata-xmlrpc 2014-01-21 16:09:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2014-0040.html

