Bug 818632 - [vdsm-rhevm] vdsm: Cluster lock not locked for domain when running reconstructMaster
Status: CLOSED DUPLICATE of bug 818577
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: vdsm
Version: 6.3
Hardware: x86_64 Linux
Priority: urgent Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: Federico Simoncelli
QA Contact: Haim
Whiteboard: storage
Keywords: Regression, TestBlocker
Depends On: 818577
Reported: 2012-05-03 11:25 EDT by Dafna Ron
Modified: 2014-01-12 19:51 EST (History)
Doc Type: Bug Fix
Clone Of: 818577
Last Closed: 2012-05-09 05:52:48 EDT
Type: Bug
Description Dafna Ron 2012-05-03 11:25:00 EDT
This was also reproduced on vdsm-4.9.6-9.el6.x86_64, build si3.

A patch already exists and has been tested.



+++ This bug was initially created as a clone of Bug #818577 +++

Created attachment 581846: log

Description of problem:

While testing a patch, I put the master domain into maintenance and got a "wrong master domain" error on the SPM.
The logs show the following:

Exception: Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release


Version-Release number of selected component (if applicable):

How reproducible:

100%

Steps to Reproduce:
1. Migrate the master domain to a second domain.
  
Actual results:

We get a "wrong master domain" error.

Expected results:

We should not get a "wrong master domain" error.

Additional info:

Thread-31::ERROR::2012-05-03 14:44:37,147::task::853::TaskManager.Task::(_setError) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 1471, in reconstructMaster
    return pool.reconstructMaster(poolName, masterDom, domDict, masterVersion, safeLease)
  File "/usr/share/vdsm/storage/sp.py", line 772, in reconstructMaster
    self._releaseTemporaryClusterLock(msdUUID)
  File "/usr/share/vdsm/storage/sp.py", line 523, in _releaseTemporaryClusterLock
    msd.releaseClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 418, in releaseClusterLock
    self._clusterLock.release()
  File "/usr/share/vdsm/storage/safelease.py", line 111, in release
    raise Exception("Cluster lock not locked for domain `%s`, cannot release" % self._sdUUID)
Exception: Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release
Thread-31::DEBUG::2012-05-03 14:44:37,148::task::872::TaskManager.Task::(_run) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::Task._run: 7c9afe3e-b3fe-4f61-9d22-374a6977d3ba ('63dbb4ef-10e8-43ce-98bd-acfb46fccfbf', 'rhevm-iscsi', '79d5c099-dac2-44a6-9fb5-9a2f70da5388', {'10641fc4-3ad2-4538-b25b-6aa6e5c431b7': 'active', '79d5c099-dac2-44a6-9fb5-9a2f70da5388': 'active'}, 5, None, 5, 60, 10, 3) {} failed - stopping task
Thread-31::DEBUG::2012-05-03 14:44:37,149::task::1199::TaskManager.Task::(stop) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::stopping in state preparing (force False)
Thread-31::DEBUG::2012-05-03 14:44:37,149::task::978::TaskManager.Task::(_decref) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::ref 1 aborting True
Thread-31::INFO::2012-05-03 14:44:37,150::task::1157::TaskManager.Task::(prepare) Task=`7c9afe3e-b3fe-4f61-9d22-374a6977d3ba`::aborting: Task is aborted: u'Cluster lock not locked for domain `79d5c099-dac2-44a6-9fb5-9a2f70da5388`, cannot release' - code 100

--- Additional comment from dron@redhat.com on 2012-05-03 08:29:21 EDT ---

I was testing a custom vdsm for Eduardo's patch:
http://gerrit.ovirt.org/#change,4085

--- Additional comment from fsimonce@redhat.com on 2012-05-03 08:34:51 EDT ---

In practice, the change introduced by patch 4085 refreshes the domain objects more frequently (now even within the same method, and even when we are holding a weakref). We therefore need a fix that makes the cluster lock (safelease) stateless:

http://gerrit.ovirt.org/3497
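
To illustrate the point above, here is a rough sketch of why a lock that remembers its state on the Python object breaks once domain objects are refreshed between acquire and release, and why a stateless lock does not. This is not the vdsm code and not the content of either patch: the class names, `produce_domain`, and the dict standing in for the on-storage lease are all hypothetical, chosen only to keep the example runnable.

```python
# Illustrative model only; every name here is hypothetical, not vdsm's API.

class StatefulLock:
    """Remembers 'locked' on the Python object itself."""

    def __init__(self, sd_uuid):
        self._sd_uuid = sd_uuid
        self._locked = False

    def acquire(self):
        self._locked = True

    def release(self):
        if not self._locked:
            raise Exception(
                "Cluster lock not locked for domain `%s`, cannot release"
                % self._sd_uuid)
        self._locked = False


class StatelessLock:
    """Keeps no per-object state; consults a shared store instead.

    The class-level dict stands in for the lease held on storage
    (an assumption made purely for this sketch).
    """

    _leases = {}

    def __init__(self, sd_uuid):
        self._sd_uuid = sd_uuid

    def acquire(self):
        self._leases[self._sd_uuid] = True

    def release(self):
        # Safe to call from any object that refers to the same domain.
        self._leases.pop(self._sd_uuid, None)


def produce_domain(lock_cls, sd_uuid):
    """Stand-in for a domain lookup that returns a fresh object after a refresh."""
    return lock_cls(sd_uuid)


def acquire_then_release_after_refresh(lock_cls, sd_uuid="example-domain"):
    produce_domain(lock_cls, sd_uuid).acquire()
    # A refresh between the two calls yields a different object.
    try:
        produce_domain(lock_cls, sd_uuid).release()
    except Exception as exc:
        return "failed: %s" % exc
    return "released"
```

With the stateful variant, the second (refreshed) object never saw the acquire, so release raises the same kind of error as in the traceback. With the stateless variant, release succeeds regardless of which object instance it is called on.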
Comment 2 Dan Kenigsberg 2012-05-09 05:52:48 EDT

*** This bug has been marked as a duplicate of bug 818577 ***
