Created attachment 917078 [details]
engine and vdsm logs

Description of problem:
We have an automated test that attempts to create, attach, and activate the first storage domain in a DC (an iSCSI storage domain) and fails on getSpmStatus with:

Thread-13::INFO::2014-07-10 10:37:56,625::logUtils::47::dispatcher::(wrapper) Run and protect: connectStoragePool, Return response: True
Thread-13::DEBUG::2014-07-10 10:37:56,625::task::1191::Storage.TaskManager.Task::(prepare) Task=`638d35cf-31d9-4436-8ba2-c99f93ad9fcd`::finished: True
Thread-13::DEBUG::2014-07-10 10:37:56,625::task::595::Storage.TaskManager.Task::(_updateState) Task=`638d35cf-31d9-4436-8ba2-c99f93ad9fcd`::moving from state preparing -> state finished
Thread-13::DEBUG::2014-07-10 10:37:56,626::resourceManager::940::Storage.ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-13::DEBUG::2014-07-10 10:37:56,626::resourceManager::977::Storage.ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-13::DEBUG::2014-07-10 10:37:56,626::task::993::Storage.TaskManager.Task::(_decref) Task=`638d35cf-31d9-4436-8ba2-c99f93ad9fcd`::ref 0 aborting False
Thread-13::DEBUG::2014-07-10 10:37:56,672::BindingXMLRPC::298::vds::(wrapper) client [10.35.161.69] flowID [564b5dd0]
Thread-13::DEBUG::2014-07-10 10:37:56,672::task::595::Storage.TaskManager.Task::(_updateState) Task=`0b81ee10-7f13-4667-a14a-345b3c059903`::moving from state init -> state preparing
Thread-13::INFO::2014-07-10 10:37:56,672::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='31510a2b-e15b-4476-b120-8b8fb1db0400', options=None)
Thread-13::ERROR::2014-07-10 10:37:56,673::task::866::Storage.TaskManager.Task::(_setError) Task=`0b81ee10-7f13-4667-a14a-345b3c059903`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 611, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 605, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 126, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 416, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 511, in inquireClusterLock
    return self._clusterLock.inquire()
  File "/usr/share/vdsm/storage/clusterlock.py", line 119, in inquire
    raise InquireNotSupportedError()
InquireNotSupportedError

Version-Release number of selected component (if applicable):

How reproducible:
Seems to reproduce 100% on this specific automated test (but so far *only* on this test).

Steps to Reproduce:
1. Create a new DC and cluster, and add a new host
2. Add a new iSCSI storage domain to the DC

Actual results:
Fails with the error listed above.

Expected results:
Should succeed.

Additional info:
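For context on the traceback: getSpmStatus asks the master domain's cluster lock who currently holds the SPM lease (inquireClusterLock), and the legacy lock backend used by old-format domains cannot answer that query. Below is a minimal illustrative sketch of that distinction; the class names mirror those in vdsm's storage/clusterlock.py, but the method bodies are simplified assumptions, not the actual source:

    # Illustrative sketch only; simplified from what the traceback shows,
    # not the actual vdsm source.

    class InquireNotSupportedError(Exception):
        """The cluster lock backend cannot report the current lease owner."""


    class SafeLease(object):
        """Legacy lease manager used by old-format storage domains."""

        def inquire(self):
            # The legacy protocol has no query operation, so any caller
            # that needs the current (lockVersion, spmId) pair fails here,
            # exactly as in the traceback above.
            raise InquireNotSupportedError()


    class SANLock(object):
        """sanlock-based lease manager used by V3 storage domains."""

        def inquire(self):
            # sanlock can read the lease resource and report its owner;
            # a real implementation returns a (lockVersion, spmId) tuple.
            return (1, 1)  # placeholder values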
The issue happens when you try to create a data center 3.5 using a master domain V1 (domVersion='0' in vdsm).

It shouldn't impact more than just a single test (create data center 3.5 with master domain V1). Under normal circumstances, any data center >= 3.1 should be created using a V3 master domain.

Relevant logs:

Thread-13::INFO::2014-07-15 09:25:19,376::logUtils::44::dispatcher::(wrapper) Run and protect: createStorageDomain(storageType=3, sdUUID='1c482ce1-fd64-4601-8687-a5ba4dcbf3c4', domainName='iscsi_0', typeSpecificArg='FuDRBH-I4ME-BLI0-55cy-lV7k-OyB0-bdSSb4', domClass=1, domVersion='0', options=None)
Thread-14::INFO::2014-07-15 09:25:26,872::logUtils::44::dispatcher::(wrapper) Run and protect: createStoragePool(poolType=None, spUUID='50840a07-b2eb-4486-bde3-f8e6e3592676', poolName='datacenter_async_tasks', masterDom='1c482ce1-fd64-4601-8687-a5ba4dcbf3c4', domList=['1c482ce1-fd64-4601-8687-a5ba4dcbf3c4'], masterVersion=1, lockPolicy=None, lockRenewalIntervalSec=5, leaseTimeSec=60, ioOpTimeoutSec=10, leaseRetries=3, options=None)
Thread-14::INFO::2014-07-15 09:25:49,139::logUtils::44::dispatcher::(wrapper) Run and protect: connectStoragePool(spUUID='50840a07-b2eb-4486-bde3-f8e6e3592676', hostID=1, msdUUID='1c482ce1-fd64-4601-8687-a5ba4dcbf3c4', masterVersion=1, domainsMap={'1c482ce1-fd64-4601-8687-a5ba4dcbf3c4': 'active'}, options=None)
Thread-14::INFO::2014-07-15 09:25:49,624::logUtils::44::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='50840a07-b2eb-4486-bde3-f8e6e3592676', options=None)
Thread-14::ERROR::2014-07-15 09:25:49,624::task::866::Storage.TaskManager.Task::(_setError) Task=`7c9416c5-81f5-4f93-97d6-30fd003fe869`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 873, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 45, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 611, in getSpmStatus
    status = self._getSpmStatusInfo(pool)
  File "/usr/share/vdsm/storage/hsm.py", line 605, in _getSpmStatusInfo
    (pool.spmRole,) + pool.getSpmStatus()))
  File "/usr/share/vdsm/storage/sp.py", line 126, in getSpmStatus
    return self._backend.getSpmStatus()
  File "/usr/share/vdsm/storage/spbackends.py", line 416, in getSpmStatus
    lVer, spmId = self.masterDomain.inquireClusterLock()
  File "/usr/share/vdsm/storage/sd.py", line 511, in inquireClusterLock
    return self._clusterLock.inquire()
  File "/usr/share/vdsm/storage/clusterlock.py", line 119, in inquire
    raise InquireNotSupportedError()
InquireNotSupportedError
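To make the domVersion connection concrete: the lock backend is chosen per domain format version, so a master domain created with domVersion='0' (as in the createStorageDomain call above) ends up with a lock whose inquire() always raises, and getSpmStatus fails. A hypothetical selection helper follows, building on the SafeLease/SANLock sketch above; the version threshold and helper name are assumptions inferred from this report, not vdsm's actual code:

    # Hypothetical helper; the version threshold is an assumption
    # inferred from this report (V3 domains use sanlock).

    def make_cluster_lock(dom_version):
        """Pick a lease manager for a storage domain of the given version."""
        if int(dom_version) < 3:
            # V0/V1/V2 domains: legacy lease; inquire() is unsupported.
            return SafeLease()
        # V3 and later: sanlock-backed lease; inquire() works.
        return SANLock()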
(In reply to Federico Simoncelli from comment #1)
> It shouldn't impact more than just a single test (create data center 3.5
> with master domain V1).

Removing AutomationBlocker based on this.
Shouldn't this test be removed if this feature is no longer important to test?
Creating a domain and then creating a pool on top of it is a valid flow, and it should be tested. What should be removed is the V1 test (which we'll handle in bug 1120712).
Moved to ON_QA as bug 1120712 is already merged.
Verified.
Bug 1120712 is not a complete fix for this.
Impact of this is: you cannot create a new 3.5 Data Center using a Storage Domain < V3 as its first master domain.

Bug 1120712 mitigated the issue, but the problem is still present.

Considering that the fix would involve some complex logic that we'll throw away soon, I am not sure we want to fix this (ever).
(In reply to Federico Simoncelli from comment #1)
> The issue happens when you try to create a data center 3.5 using a master
> domain V1 (domVersion='0' in vdsm).
>
> It shouldn't impact more than just a single test (create data center 3.5
> with master domain V1).

Gil, as I mentioned, we should have had an automated test failing on this for 3.5; the bug couldn't have been VERIFIED. This scenario must be tested for data centers < 3.5. Do you want to review the matrix for the automated tests together?
(In reply to Federico Simoncelli from comment #9)
> Impact of this is: you cannot create a new 3.5 Data Center using a Storage
> Domain < V3 as its first master domain.
>
> Bug 1120712 mitigated the issue, but the problem is still present.
>
> Considering that the fix would involve some complex logic that we'll throw
> away soon, I am not sure we want to fix this (ever).

Reducing priority, since you'd have to explicitly create a V1 domain to hit this, and pushing out to 3.5.1 while we rethink whether this is even worth the effort to fix.
Federico, could you please email me your suggestion for the testing matrix? I'll pull in the relevant people from QE, and see if/when we could cover this in automation.
*** Bug 1139401 has been marked as a duplicate of this bug. ***
*** Bug 1166066 has been marked as a duplicate of this bug. ***
Re-targeting to 3.5.3, since this bug has not been marked as a blocker for 3.5.2 and we have already released the 3.5.2 Release Candidate.
This bug report has Keywords: Regression or TestBlocker. Since no regressions or test blockers are allowed between releases, it is also being identified as a blocker for this release. Please resolve ASAP.
Can you please check whether this is fixed, and how to test it?
Ala, what patch solves this? How come it's on MODIFIED?
The bug is missing a patch in the external tracker. Can you please add the relevant fix? Otherwise it's impossible to match / verify whether the fix is really in.
Verified on vdsm-4.17.10.1-0.el7ev.noarch. Followed the steps to reproduce.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2016-0362.html