Bug 853710
Summary: | 3.1 - [vdsm] deactivateStorageDomain fails due to "storage domain does not exist" (Problem with handler, treating as timeout) | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | Gadi Ickowicz <gickowic> | ||||
Component: | vdsm | Assignee: | Federico Simoncelli <fsimonce> | ||||
Status: | CLOSED ERRATA | QA Contact: | Dafna Ron <dron> | ||||
Severity: | urgent | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.3 | CC: | abaron, amureini, bazulay, chetan, fsimonce, iheim, ilvovsky, jbiddle, jlibosva, lpeer, nlevinki, smizrahi, ykaul | ||||
Target Milestone: | rc | Keywords: | Regression, TestBlocker, ZStream | ||||
Target Release: | 6.4 | ||||||
Hardware: | All | ||||||
OS: | All | ||||||
Whiteboard: | storage | ||||||
Fixed In Version: | vdsm-4.9.6-39.0 | Doc Type: | Bug Fix | ||||
Doc Text: |
When an ISO domain was blocked and deactivateStorageDomain was sent, it failed to deactivate the storage domain and showed a "storage domain does not exist" error. Due to this, the domain never switched to inactive, and instead cycled between locked and active in the engine. This patch removes a now redundant cache that caused the error, allowing domains to switch to inactive successfully.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-12-04 19:09:17 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
(In reply to comment #0)
> Created attachment 609044 [details]
> vdsm + engine logs
>
> Description of problem:
> When an iso domain is blocked and deactivateStorageDomain is sent, it fails
> to deactivate the storage domain with "storage domain does not exist" error.
>
> Due to this, the domain never switches to inactive, and cycles between
> locked and active in engine.
>
> Thread-776::ERROR::2012-08-30
> 13:43:16,192::task::853::TaskManager.Task::(_setError)
> Task=`88337d82-334f-4623-86d0-fb589b2e33a9`::Unexpected error
> Traceback (most recent call last):
>   File "/usr/share/vdsm/storage/task.py", line 861, in _run
>     return fn(*args, **kargs)
>   File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
>     res = f(*args, **kwargs)
>   File "/usr/share/vdsm/storage/hsm.py", line 988, in deactivateStorageDomain
>     pool.deactivateSD(sdUUID, msdUUID, masterVersion)
>   File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
>     return f(self, *args, **kwargs)
>   File "/usr/share/vdsm/storage/sp.py", line 1103, in deactivateSD
>     masterDir = os.path.join(dom.domaindir, sd.MASTER_FS_DIR)
>   File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__
>     dom = self.getRealDomain()
>   File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
>     return self._cache._realProduce(self._sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
>     dom = self._findDomain(sdUUID)
>   File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
>     raise se.StorageDomainDoesNotExist(sdUUID)
> StorageDomainDoesNotExist: Storage domain does not exist:
> ('4e16961e-b86d-403b-8304-d3b5e8e409ff',)
>
>
> during this time, repoStats continues to report the domain as valid:False,
> causing the continuous cycle back to locked:
>
> Thread-785::INFO::2012-08-30
> 13:43:26,877::logUtils::39::dispatcher::(wrapper) Run and protect:
> repoStats, Return response: {'4e16961e-b86d-403b-8304-d3b5e8e409ff':
> {'delay': '0', 'lastCheck': 1346322697.6894541, 'code': 200, 'valid':
> False}, '81c9e011-46ac-4b1a-bb72-b22d0de3e6bd': {'delay': '0.015328168869',
> 'lastCheck': 1346323396.2141991, 'code': 0, 'valid': True}}

This exception is causing a timeout error in CrabRPC. Relevant log lines:

Thread-459::WARNING::2012-08-30 13:36:49,390::remoteFileHandler::185::Storage.CrabRPCProxy::(callCrabRPCFunction) Problem with handler, treating as timeout
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 177, in callCrabRPCFunction
    rawLength = self._recvAll(LENGTH_STRUCT_LENGTH, timeout)
  File "/usr/share/vdsm/storage/remoteFileHandler.py", line 143, in _recvAll
    raise Timeout()
Timeout
Thread-458::ERROR::2012-08-30 13:36:49,391::task::853::TaskManager.Task::(_setError) Task=`2b78c9e2-bec9-4e87-8b74-d842225bc0f4`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 988, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1103, in deactivateSD
    masterDir = os.path.join(dom.domaindir, sd.MASTER_FS_DIR)
  File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__
    dom = self.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('4e16961e-b86d-403b-8304-d3b5e8e409ff',)

>
> Version-Release number of selected component (if applicable):
> 4.9-31.0
>
> How reproducible:
>
>
> Steps to Reproduce:
> 1. Block an nfs iso storage domain
> 2. try to deactivate the storage domain
> 3. check logs / query engine for storage domain status
>
> Actual results:
> storage domain status cycles between active and locked
>
> Expected results:
> storage domain should be inactive
>
> Additional info:

The code assumes that os.path.join cannot fail without considering that dom.domaindir could:

    else:
        masterDir = os.path.join(dom.domaindir, sd.MASTER_FS_DIR)

this is probably due to the move to domain proxies. Fede?
If so, we could have this problem in multiple places in the code.

Tested with http://gerrit.ovirt.org/#/c/7511/

Domain now deactivates successfully

*** Bug 854295 has been marked as a duplicate of this bug. ***

(In reply to comment #5)
> Tested with http://gerrit.ovirt.org/#/c/7511/
>
> Domain now deactivates successfully

Moving to POST according to this.

clearing need info per C8.

verified on si22.1

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-1508.html
|
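Note on the domain proxy remark in the comments above: the traceback shows the exception surfacing from `__getattr__` in sdc.py, i.e. from reading `dom.domaindir`, not from `os.path.join` itself. The following is a minimal, self-contained sketch of that failure mode, not vdsm code. `DomainProxy`, `unreachable_lookup`, `master_dir`, the local `StorageDomainDoesNotExist` class and the `"master"` directory name are all hypothetical stand-ins, and returning `None` is only an illustrative way of handling the error. According to the Doc Text, the actual fix removed a redundant cache.

```python
import os


class StorageDomainDoesNotExist(Exception):
    """Hypothetical stand-in for vdsm's se.StorageDomainDoesNotExist."""


class DomainProxy(object):
    """Hypothetical lazy proxy: the real domain is looked up on attribute access."""

    def __init__(self, sd_uuid, lookup):
        self._sd_uuid = sd_uuid
        self._lookup = lookup  # callable returning the real domain, or raising

    def __getattr__(self, name):
        # Invoked only for attributes not found on the proxy itself, so
        # _sd_uuid and _lookup never reach this method.
        real = self._lookup(self._sd_uuid)
        return getattr(real, name)


class RealDomain(object):
    """Hypothetical resolved domain exposing a domaindir attribute."""

    def __init__(self, domaindir):
        self.domaindir = domaindir


def unreachable_lookup(sd_uuid):
    # Simulates a blocked domain: the lookup cannot find it.
    raise StorageDomainDoesNotExist(sd_uuid)


def master_dir(dom, master_fs_dir="master"):
    """Build the master directory path, tolerating an unresolvable proxy.

    The attribute read dom.domaindir is the step that can raise; the
    "master" default is an assumed value for illustration only.
    """
    try:
        return os.path.join(dom.domaindir, master_fs_dir)
    except StorageDomainDoesNotExist:
        return None


blocked = DomainProxy("4e16961e-b86d-403b-8304-d3b5e8e409ff", unreachable_lookup)
print(master_dir(blocked))
```

Any other caller that reads an attribute from such a proxy would be exposed to the same exception, which is the concern raised in the comment about "multiple places in the code".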
Created attachment 609044 [details]
vdsm + engine logs

Description of problem:
When an iso domain is blocked and deactivateStorageDomain is sent, it fails to deactivate the storage domain with "storage domain does not exist" error.

Due to this, the domain never switches to inactive, and cycles between locked and active in engine.

Thread-776::ERROR::2012-08-30 13:43:16,192::task::853::TaskManager.Task::(_setError) Task=`88337d82-334f-4623-86d0-fb589b2e33a9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 988, in deactivateStorageDomain
    pool.deactivateSD(sdUUID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/securable.py", line 63, in wrapper
    return f(self, *args, **kwargs)
  File "/usr/share/vdsm/storage/sp.py", line 1103, in deactivateSD
    masterDir = os.path.join(dom.domaindir, sd.MASTER_FS_DIR)
  File "/usr/share/vdsm/storage/sdc.py", line 47, in __getattr__
    dom = self.getRealDomain()
  File "/usr/share/vdsm/storage/sdc.py", line 51, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 123, in _realProduce
    dom = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 147, in _findDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('4e16961e-b86d-403b-8304-d3b5e8e409ff',)

during this time, repoStats continues to report the domain as valid:False, causing the continuous cycle back to locked:

Thread-785::INFO::2012-08-30 13:43:26,877::logUtils::39::dispatcher::(wrapper) Run and protect: repoStats, Return response: {'4e16961e-b86d-403b-8304-d3b5e8e409ff': {'delay': '0', 'lastCheck': 1346322697.6894541, 'code': 200, 'valid': False}, '81c9e011-46ac-4b1a-bb72-b22d0de3e6bd': {'delay': '0.015328168869', 'lastCheck': 1346323396.2141991, 'code': 0, 'valid': True}}

Version-Release number of selected component (if applicable):
4.9-31.0

How reproducible:

Steps to Reproduce:
1. Block an nfs iso storage domain
2. try to deactivate the storage domain
3. check logs / query engine for storage domain status

Actual results:
storage domain status cycles between active and locked

Expected results:
storage domain should be inactive

Additional info:
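
Note on reading the repoStats response quoted in the description: the sketch below picks out the domains reported with 'valid': False from a response of that shape. The dictionary is copied from the log line above. The helper name `invalid_domains` is hypothetical and is not part of vdsm or the engine, and treating a missing 'valid' key as invalid is an assumption made for this example.

```python
def invalid_domains(repo_stats):
    """Return the sorted UUIDs of domains whose 'valid' flag is not true.

    A missing 'valid' key counts as invalid here; that is an assumption
    of this sketch, not documented vdsm behaviour.
    """
    return sorted(
        sd_uuid
        for sd_uuid, stats in repo_stats.items()
        if not stats.get("valid", False)
    )


# Response copied from the Thread-785 log line in the description.
response = {
    "4e16961e-b86d-403b-8304-d3b5e8e409ff": {
        "delay": "0",
        "lastCheck": 1346322697.6894541,
        "code": 200,
        "valid": False,
    },
    "81c9e011-46ac-4b1a-bb72-b22d0de3e6bd": {
        "delay": "0.015328168869",
        "lastCheck": 1346323396.2141991,
        "code": 0,
        "valid": True,
    },
}

print(invalid_domains(response))
# prints ['4e16961e-b86d-403b-8304-d3b5e8e409ff']
```

Here the blocked iso domain is the only one flagged, which matches the description: it keeps being reported as invalid while the deactivate request fails.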