This was also reproduced on vdsm-4.9.6-9.el6.x86_64 on build si3.
A patch already exists and has been tested.
+++ This bug was initially created as a clone of Bug #818577 +++
Created attachment 581846[details]
log
Description of problem:
While verifying a patch, I put the master domain into maintenance and got a wrong master domain ERROR on the SPM.
Looking at the logs, we see the following:
Exception: Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release
Version-Release number of selected component (if applicable):
How reproducible:
100%
Steps to Reproduce:
1. Migrate the master domain to a second domain
Actual results:
We get a wrong master domain error on the SPM.
Expected results:
We should not get a wrong master domain error.
Additional info:
Thread-31::ERROR::2012-05-03 14:44:37,147::task::853::TaskManager.Task::(_setError) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::Unexpected error
Traceback (most recent call last):
File "/usr/share/vdsm/storage/task.py", line 861, in _run
return fn(*args, **kargs)
File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
res = f(*args, **kwargs)
File "/usr/share/vdsm/storage/hsm.py", line 1471, in reconstructMaster
return pool.reconstructMaster(poolName, masterDom, domDict, masterVersion, safeLease)
File "/usr/share/vdsm/storage/sp.py", line 772, in reconstructMaster
self._releaseTemporaryClusterLock(msdUUID)
File "/usr/share/vdsm/storage/sp.py", line 523, in _releaseTemporaryClusterLock
msd.releaseClusterLock()
File "/usr/share/vdsm/storage/sd.py", line 418, in releaseClusterLock
self._clusterLock.release()
File "/usr/share/vdsm/storage/safelease.py", line 111, in release
raise Exception("Cluster lock not locked for domain `%s`, cannot release" % self._sdUUID)
Exception: Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release
Thread-31::DEBUG::2012-05-03 14:44:37,148::task::872::TaskManager.Task::(_run) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::Task._run: 7c9afe3e-b3fe-4f61-9d22-374a6977d3ba ('63dbb4ef-10e8-43ce-98bd-acfb46fccfbf', 'rhevm-iscsi', '79d5c099-dac2-44a6-9fb5-9a2f70da5388', {'10641fc4-3ad2-4538-b25b-6aa6e5c431b7': 'active', '79d5c099-dac2-44a6-9fb5-9a2f70da5388': 'active'}, 5, None, 5, 60, 10, 3) {} failed - stopping task
Thread-31::DEBUG::2012-05-03 14:44:37,149::task::1199::TaskManager.Task::(stop) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::stopping in state preparing (force False)
Thread-31::DEBUG::2012-05-03 14:44:37,149::task::978::TaskManager.Task::(_decref) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::ref 1 aborting True
Thread-31::INFO::2012-05-03 14:44:37,150::task::1157::TaskManager.Task::(prepare) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::aborting: Task is aborted: u'Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release' - code 100
--- Additional comment from dron on 2012-05-03 08:29:21 EDT ---
I was testing a custom vdsm for Eduardo's patch:
http://gerrit.ovirt.org/#change,4085
--- Additional comment from fsimonce on 2012-05-03 08:34:51 EDT ---
The change introduced with patch 4085 is, in practice, refreshing the domain objects more frequently (now also within the same method, even if we're holding a weakref). Therefore we strictly need a fix that makes the clusterlock (safelease) stateless:
http://gerrit.ovirt.org/3497
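To illustrate the failure mode described above, here is a minimal, hypothetical sketch (not the actual vdsm code): a cluster lock that tracks "locked" in an in-memory flag loses that flag whenever a refresh replaces the domain object with a fresh wrapper, so releasing a lease that is genuinely held raises exactly the exception in the traceback. A stateless release, which treats the lease protocol itself as the source of truth, avoids the problem. Class and method names here are illustrative only.

```python
class StatefulClusterLock:
    """Hypothetical sketch of a lock whose release() depends on
    in-memory state, which goes stale when objects are refreshed."""

    def __init__(self, sdUUID):
        self._sdUUID = sdUUID
        self._locked = False          # in-memory flag, lost on refresh

    def acquire(self):
        self._locked = True           # real lease acquisition elided

    def release(self):
        if not self._locked:
            raise Exception(
                "Cluster lock not locked for domain `%s`, cannot release"
                % self._sdUUID)
        self._locked = False


class StatelessClusterLock:
    """Hypothetical stateless variant: release() is unconditional and
    idempotent, so a refreshed wrapper can still release the lease."""

    def __init__(self, sdUUID):
        self._sdUUID = sdUUID

    def acquire(self):
        pass                          # real lease acquisition elided

    def release(self):
        pass                          # always safe to call


# A refresh replaces the old wrapper with a fresh one whose flag is
# False, so releasing the still-held lease raises:
old = StatefulClusterLock("79d5c099-dac2-44a6-9fb5-9a2f70da5388")
old.acquire()
fresh = StatefulClusterLock(old._sdUUID)   # refreshed object, flag lost
try:
    fresh.release()                        # reproduces the reported error
    release_failed = False
except Exception:
    release_failed = True
```

With the stateless variant, the same refresh-then-release sequence succeeds, which is the behavior the fix in gerrit change 3497 aims for.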