| Summary: | vdsm: hsm becomes non-operational after activation if changes were made to master domain or its version while host was in maintenance | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Dafna Ron <dron> | ||||
| Component: | vdsm | Assignee: | Eduardo Warszawski <ewarszaw> | ||||
| Status: | CLOSED ERRATA | QA Contact: | Jakub Libosvar <jlibosva> | ||||
| Severity: | urgent | Docs Contact: | |||||
| Priority: | urgent | ||||||
| Version: | 6.3 | CC: | abaron, aburden, bazulay, cpelland, danken, hateya, iheim, ilvovsky, jbiddle, jlibosva, ykaul | ||||
| Target Milestone: | rc | ||||||
| Target Release: | --- | ||||||
| Hardware: | Unspecified | ||||||
| OS: | Unspecified | ||||||
| Whiteboard: | storage | ||||||
| Fixed In Version: | vdsm-4.9.6-10 | Doc Type: | Bug Fix | ||||
| Doc Text: |
Previously, due to an issue with pool metadata not refreshing correctly, attempting to put the HSM host into maintenance mode while reconstructing the master domain would result in the changes not being updated in the HSM. The pool metadata issue has been solved so that any changes and updates applied to the HSM in maintenance mode will be retained when it is reactivated.
|
Story Points: | --- | ||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2012-12-04 18:56:56 UTC | Type: | --- | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Attachments: |
|
||||||
Created attachment 573357 [details]
logs
The pool metadata is not refreshed due to an sdc issue. (HSM) In addition: The host reach this situation after a lot of misleading operations of the engine, like disconnect the storage and try to connect the pool. The 2nd SD in the pool, 78627e5c-87f7-4492-bc41-a832c5955492 was unreacheable over the whole log. In spite of that was choosed as master and was attempted to connect this HSM to a this unreacheable master. The flow should be revised. http://gerrit.ovirt.org/#change,4085 Verified using vdsm-4.9.6-10.el6.x86_64 Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHSA-2012-1508.html |
Description of problem: if we put the hsm host in maintenance while we reconstruct master then the changes are not updated in hsm this is caused by simply putting the master domain in maintenance while hsm is also in maintenance if master domain is the same domain as before (which means version changes) the version is also not updated and the hsm will get wrong master domain or version. backend is sending disconnectStorageServer so domain should be disconnected Version-Release number of selected component (if applicable): vdsm-4.9.6-4.5.x86_64 How reproducible: 100% Steps to Reproduce: 1. in two hosts cluster add two storage domains 2. put hsm in maintenance and put the master domain in maintenance as well (so that the second domain will become master) 3. activate the hsm -> host will become non-operational with can't find master domain error 4. put the hsm in maintenance again 5. put the new master domain in maintenance so that the old domain will become master again 6. activate the hsm -> host will become non-operational with wrong master version error Actual results: hsm is not updated with changes made to master domain while its disconnected from pool. when we activate the hsm and the master has changed to different location we get can't find master and when the version has changed (if we put master in maintenance) we will get wrong version Expected results: hsm should be updated with changes when activated. Additional info: will attach full logs from both hosts and backend Thread-350::INFO::2012-03-28 15:11:09,899::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStoragePool(spUUID='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', hostID=2, scsiKey='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', remove=False, options=None) Thread-351::INFO::2012-03-28 15:11:09,954::logUtils::37::dispatcher::(wrapper) Run and protect: disconnectStorageServer(domType=3, spUUID='8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', conList=[{'connection': '10.35.64.106', 'iqn': 'iqn.1986-03.com.sun:02:dafna112713222714816', 'portal': '1', 'user': '', 'password': '******', 'id': '7c2518d4-f9b6-493f-b32e-fcf1668264b3', 'port': '3260'}, {'connection': '10.35.64.10', 'iqn': 'Dafna-big', 'portal': '1', 'user': '', 'password': '******', 'id': 'b1f1a8b2-f42d-4a33-8bf5-80f39f3d04a7', 'port': '3260'}], options=None) Thread-347::ERROR::2012-03-28 15:07:51,354::sp::1456::Storage.StoragePool::(getMasterDomain) Requested master domain 83a46a9e-dac2-4513-bb21-a33ff76a495a does not have expected version 3 it is version 1 Thread-347::DEBUG::2012-03-28 15:07:51,355::resourceManager::538::ResourceManager::(releaseResource) Trying to release resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' Thread-347::DEBUG::2012-03-28 15:07:51,355::resourceManager::553::ResourceManager::(releaseResource) Released resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' (0 active u sers) Thread-347::DEBUG::2012-03-28 15:07:51,356::resourceManager::558::ResourceManager::(releaseResource) Resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' is free, finding out if anyone is waiting for it. Thread-347::DEBUG::2012-03-28 15:07:51,356::resourceManager::565::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.8ed78e50-b61e-4b84-a5b7-7c17f76f16a5 ', Clearing records. Thread-347::ERROR::2012-03-28 15:07:51,357::task::853::TaskManager.Task::(_setError) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 861, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 813, in connectStoragePool return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options) File "/usr/share/vdsm/storage/hsm.py", line 855, in _connectStoragePool res = pool.connect(hostID, scsiKey, msdUUID, masterVersion) File "/usr/share/vdsm/storage/sp.py", line 641, in connect self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1107, in __rebuild self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion) File "/usr/share/vdsm/storage/sp.py", line 1457, in getMasterDomain raise se.StoragePoolWrongMaster(self.spUUID, msdUUID) StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=83a46a9e-dac2-4513-bb21-a33ff76a495a, pool=8ed78e50-b61e-4b84-a5b7-7c17f76f16a5' Thread-347::DEBUG::2012-03-28 15:07:51,358::task::872::TaskManager.Task::(_run) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::Task._run: 8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13 ('8e d78e50-b61e-4b84-a5b7-7c17f76f16a5', 2, '8ed78e50-b61e-4b84-a5b7-7c17f76f16a5', '83a46a9e-dac2-4513-bb21-a33ff76a495a', 3) {} failed - stopping task Thread-347::DEBUG::2012-03-28 15:07:51,358::task::1199::TaskManager.Task::(stop) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::stopping in state preparing (force False) Thread-347::DEBUG::2012-03-28 15:07:51,359::task::978::TaskManager.Task::(_decref) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::ref 1 aborting True Thread-347::INFO::2012-03-28 15:07:51,359::task::1157::TaskManager.Task::(prepare) Task=`8c7b0f3f-4a37-403b-8d67-4ab0f0b44b13`::aborting: Task is aborted: 'Wrong Master domain o r its version' - code 324