Bug 806952 - ovirt-engine-backend: we allow to remove DC when there is more than one domain attached
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
x86_64 Linux
Priority: high, Severity: high
: ---
: 3.1.0
Assigned To: Laszlo Hornyak
Dafna Ron
Keywords: Regression, Reopened
Depends On:
Reported: 2012-03-26 11:14 EDT by Dafna Ron
Modified: 2016-02-10 11:57 EST
10 users

See Also:
Fixed In Version: si20
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2012-12-04 15:02:49 EST
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Attachments
logs (1.65 MB, application/x-gzip)
2012-03-26 11:39 EDT, Dafna Ron
reproduced again (63.67 KB, application/x-xz)
2012-08-02 08:34 EDT, Dafna Ron

Description Dafna Ron 2012-03-26 11:14:23 EDT
Description of problem:

Remove DC should only be allowed when the master domain is the only attached domain (since the master cannot be detached without removing the pool).
We can currently remove the DC while more than one domain is attached.
I hit several races in which the remove fails in the backend but not in vds, which causes a "domain not in pool" error in vds.

I am setting this bug to urgent because once I created a new DC and attached the removed domains, HSM gets a wrong master or version error and becomes non-operational (yes, HSM, not SPM).

The SPM remains active and connected to the pool and all domains.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Create and attach several iSCSI domains to a two-host cluster.
2. Put all domains in maintenance and remove the DC.
3. Create a second DC and reattach all the domains to the new DC.
Actual results:

We get a wrong master domain error from the HSM only, and the host becomes non-operational.

Expected results:

We should not be allowed to remove the DC while any domain other than the master is attached.

Additional info: full logs attached. 
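
The missing guard in the expected behavior can be sketched as a pre-condition on the remove-DC action. This is a hypothetical illustration; the function name and the domain-list shape below are assumptions, not the actual engine canDoAction code:

```python
# Sketch of the backend validation this bug asks for: removing a data
# center should be refused while any non-master storage domain is still
# attached (the master alone cannot be detached without removing the pool).

def can_remove_data_center(attached_domains):
    """attached_domains: list of (domain_id, is_master) tuples.

    Returns (allowed, reason). 'reason' is None when the remove is allowed.
    """
    non_master = [d for d, is_master in attached_domains if not is_master]
    if non_master:
        # Refuse: every non-master domain must be detached first.
        return False, "Detach non-master storage domains before removing the DC"
    return True, None
```

With only the master attached the remove is allowed; as soon as a second domain is attached, the check fails instead of letting the remove race against vds.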

2012-03-26 16:55:02,486 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-22) [4c105e7c] Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         324
mMessage                      Wrong Master domain or its version: 'SD=4b1984e6-feb6-4629-9e13-affda9c9ca41, pool=e4269e26-f1ac-4dfb-8407-8e191d736a05'

Thread-942::ERROR::2012-03-26 16:47:50,783::task::853::TaskManager.Task::(_setError) Task=`7345bbab-5a07-4072-8069-c5df986ed3d9`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 861, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 813, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 855, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 641, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1107, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1451, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=4b1984e6-feb6-4629-9e13-affda9c9ca41, pool=e4269e26-f1ac-4dfb-8407-8e191d736a05'
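
For context, the check behind this error compares the master domain UUID and version that the engine passes on connectStoragePool against what the pool metadata on storage actually records. A simplified sketch follows; the metadata dict shape and function name are assumptions for illustration, not the real sp.py code:

```python
# Minimal sketch of the wrong-master check, assuming pool metadata is
# available as a dict with the recorded master UUID and master version.

class StoragePoolWrongMaster(Exception):
    def __init__(self, spUUID, msdUUID):
        super().__init__(
            "Wrong Master domain or its version: 'SD=%s, pool=%s'"
            % (msdUUID, spUUID))

def get_master_domain(pool_metadata, spUUID, msdUUID, masterVersion):
    # Reject the connect if the caller's idea of the master domain or its
    # version disagrees with the metadata recorded for the pool.
    if (pool_metadata["master"] != msdUUID
            or pool_metadata["version"] != masterVersion):
        raise StoragePoolWrongMaster(spUUID, msdUUID)
    return msdUUID
```

In this bug, the stale pool metadata left behind by the incomplete remove is what makes the HSM's connect disagree with the engine's parameters.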
Comment 1 Dafna Ron 2012-03-26 11:39:04 EDT
Created attachment 572782 [details]
Comment 4 Laszlo Hornyak 2012-07-12 07:07:57 EDT
"works for me"

I could not reproduce the issue with the latest git versions of both vdsm and engine. I used two iSCSI storage domains. (How many is "several", exactly?)
Comment 5 Laszlo Hornyak 2012-07-17 07:23:07 EDT
Dafna, I could not reproduce this issue so far, can you help? Thx.
Comment 6 Dafna Ron 2012-08-02 08:34:12 EDT
Created attachment 601939 [details]
reproduced again

I reproduced this on si12 again.
* Make sure you are working with the correct packages and with RHEL hosts only.
* Make sure that you have iSCSI domains.
* Make sure that you have no objects under any of the domains (lean setup).
* Make sure that you have only one host in the Up state.
Comment 7 Laszlo Hornyak 2012-08-22 09:29:31 EDT
Trying to reproduce: after attempting to remove the DC, it failed to detach the master domain and the DC went back to the Up state.
Comment 8 Laszlo Hornyak 2012-08-24 10:34:21 EDT
- The DC could not be removed with a simple 'remove': the master domain got stuck in the Locked state and the DC went back to the 'Up' state, with all domains deactivated and only the master Locked. This might be a bug.
- With force remove, removing the DC worked fine. After that, attaching the storage generated some errors, but in the end it managed to start up.

2012-08-24 16:22:23,671 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (http-- [38c457c] starting spm on vds dev-164, storage pool iscsi02, prevId -1, LVER -1
2012-08-24 16:22:23,673 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] START, SpmStartVDSCommand(vdsId = 73261da6-edf4-11e1-955d-b3bda0fd89ee, storagePoolId = 8fa4ef1a-c514-47be-bfe1-004e09cdec23, prevId=-1, prevLVER=-1, storagePoolFormatType=V3, recoveryMode=Manual, SCSIFencing=false), log id: 6fa7ed3
2012-08-24 16:22:23,681 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling started: taskId = 064081b6-0603-44c6-b4dd-44b23eda8ba9
2012-08-24 16:22:24,693 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-- [38c457c] Failed in HSMGetTaskStatusVDS method
2012-08-24 16:22:24,695 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (http-- [38c457c] Error code GeneralException and error message VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = 'wait' is an invalid keyword argument for this function
2012-08-24 16:22:24,697 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling ended: taskId = 064081b6-0603-44c6-b4dd-44b23eda8ba9 task status = finished
2012-08-24 16:22:24,698 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] Start SPM Task failed - result: cleanSuccess, message: VDSGenericException: VDSErrorException: Failed to HSMGetTaskStatusVDS, error = 'wait' is an invalid keyword argument for this function
2012-08-24 16:22:24,707 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.SpmStartVDSCommand] (http-- [38c457c] spmStart polling ended. spm status: Free
2012-08-24 16:22:24,710 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMClearTaskVDSCommand] (http-- [38c457c] START, HSMClearTaskVDSCommand(vdsId = 73261da6-edf4-11e1-955d-b3bda0fd89ee, taskId=064081b6-0603-44c6-b4dd-44b23eda8ba9), log id: 5eb97f68
Comment 9 Laszlo Hornyak 2012-08-24 10:59:48 EDT
Related vdsm log:
Thread-2101::DEBUG::2012-08-24 16:52:03,767::taskManager::96::TaskManager::(getTaskStatus) Return. Response: {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}
Thread-2101::INFO::2012-08-24 16:52:03,767::logUtils::39::dispatcher::(wrapper) Run and protect: getTaskStatus, Return response: {'taskStatus': {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::task::1151::TaskManager.Task::(prepare) Task=`de9dbdf4-8105-418c-a094-b1446e1f4508`::finished: {'taskStatus': {'code': 100, 'message': u"'wait' is an invalid keyword argument for this function", 'taskState': 'finished', 'taskResult': 'cleanSuccess', 'taskID': 'e481fe99-0a11-4281-b316-a65ccae29528'}}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::task::568::TaskManager.Task::(_updateState) Task=`de9dbdf4-8105-418c-a094-b1446e1f4508`::moving from state preparing -> state finished
Thread-2101::DEBUG::2012-08-24 16:52:03,767::resourceManager::809::ResourceManager.Owner::(releaseAll) Owner.releaseAll requests {} resources {}
Thread-2101::DEBUG::2012-08-24 16:52:03,767::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Comment 10 Laszlo Hornyak 2012-08-24 11:41:20 EDT
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/task.py", line 307, in run
    return self.cmd(*self.argslist, **self.argsdict)
  File "/usr/share/vdsm/storage/sp.py", line 250, in startSpm
  File "/usr/share/vdsm/storage/sd.py", line 427, in acquireHostId
    self._clusterLock.acquireHostId(hostId, async)
  File "/usr/share/vdsm/storage/safelease.py", line 170, in acquireHostId
    self._sdUUID, hostId, self._idsPath, wait=True):
TypeError: 'wait' is an invalid keyword argument for this function
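
The traceback above is a plain keyword-argument mismatch: safelease.py calls acquireHostId with wait=True, but the callee's signature does not accept that keyword. The failure mode is easy to reproduce in isolation; the function below is illustrative, not the actual vdsm code:

```python
# Sketch of the signature mismatch behind Comment 10's traceback: the
# caller passes wait=True, but the target function has no 'wait' parameter.

def acquire_host_id(sd_uuid, host_id, ids_path):
    # Older-style signature without a 'wait' keyword.
    return True

def caller():
    try:
        # Mirrors the safelease.py call site that adds wait=True.
        acquire_host_id("sd-uuid", 1, "/rhev/ids", wait=True)
    except TypeError as e:
        # Python reports the unexpected keyword, as seen in the vdsm log.
        return str(e)
    return None
```

So the SPM-start failure in Comment 8 is a version skew between the caller and callee, independent of the original remove-DC validation bug.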
Comment 11 Laszlo Hornyak 2012-08-27 11:34:16 EDT
Comment 15 Dafna Ron 2012-10-14 08:52:16 EDT
Verified on si20.
