Created attachment 557007
engine, vdsm log

Description of problem:
I have two hosts in an iSCSI DC with two storage domains, each on a different server. I put the HSM host into maintenance and, on the SPM, dropped the connection to the master storage. After a new master was selected and the former master went inactive, I unblocked the network connection between the SPM and the storage. Then I activated the storage domain that had had the network trouble. So now there is one active SPM host, one host in maintenance, and two active storage domains. I activated the host in maintenance, but it fails to connect to the storage pool due to a master version mismatch:

On SPM:
[root@srh-03 ~]# vdsClient -s 0 getStorageDomainInfo cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	vguuid = ZcADor-FO4W-e15W-3eXs-kIbJ-YlWi-kc3rN8
	lver = 0
	state = OK
	version = 2
	role = Master
	pool = ['29eedc70-5b61-4f20-8b64-ea4a1fb4b48c']
	spm_id = 2
	type = ISCSI
	class = Data
	master_ver = 13
	name = str01-jlibosva2

Backend:
2012-01-23 17:09:18,966 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-31) START, ConnectStoragePoolVDSCommand(vdsId = 29291ea0-41ac-11e1-886e-001a4a013f0a, storagePoolId = 29eedc70-5b61-4f20-8b64-ea4a1fb4b48c, vds_spm_id = 1, masterDomainId = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, masterVersion = 13), log id: 1fc7b8bc

vdsm:
Thread-363::ERROR::2012-01-23 17:07:37,872::sp::1449::Storage.StoragePool::(getMasterDomain) Requested master domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 does not have expected version 13 it is version 10
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::535::ResourceManager::(releaseResource) Trying to release resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::550::ResourceManager::(releaseResource) Released resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' (0 active users)
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::555::ResourceManager::(releaseResource) Resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' is free, finding out if anyone is waiting for it.
Thread-363::DEBUG::2012-01-23 17:07:37,874::resourceManager::562::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c', Clearing records.
Thread-363::ERROR::2012-01-23 17:07:37,874::task::855::TaskManager.Task::(_setError) Task=`8771aa62-2bc6-4f8a-be4a-55dfd808c5bd`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 863, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 721, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 763, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 624, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1097, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1450, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, pool=29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'

Version-Release number of selected component (if applicable):
vdsm-4.9.3.1-0.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
Please see description

Actual results:
Host goes to Non Operational

Expected results:
Host is connected and operational

Additional info:
Backend and vdsm log attached
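
For triage, the mismatch described above (the engine requests master version 13 and the SPM reports master_ver = 13, while vdsm on the reconnecting host sees version 10) can be checked with a small script. This is only a rough sketch: the helper names parse_domain_info and master_version_mismatch are hypothetical and not part of vdsm, and it assumes the "key = value" output format shown in the vdsClient output above.

```python
def parse_domain_info(text):
    """Parse 'key = value' lines (the format printed by
    vdsClient getStorageDomainInfo above) into a dict of strings."""
    info = {}
    for line in text.splitlines():
        # Split on the first '=' only; lines without '=' are skipped.
        key, sep, value = line.partition("=")
        if sep:
            info[key.strip()] = value.strip()
    return info


def master_version_mismatch(info, expected_version):
    """Return (found, expected) if master_ver differs from the version
    the engine asked for, otherwise None. Hypothetical helper."""
    found = int(info["master_ver"])
    if found != expected_version:
        return (found, expected_version)
    return None


# Sample taken from the SPM output in this report.
SPM_OUTPUT = """\
uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
state = OK
role = Master
pool = ['29eedc70-5b61-4f20-8b64-ea4a1fb4b48c']
master_ver = 13
name = str01-jlibosva2
"""

if __name__ == "__main__":
    spm_info = parse_domain_info(SPM_OUTPUT)
    # The engine sent masterVersion = 13 in ConnectStoragePoolVDSCommand.
    print(master_version_mismatch(spm_info, 13))
    # The failing host reported version 10 for the same domain.
    print(master_version_mismatch({"master_ver": "10"}, 13))
```

Running the same check against getStorageDomainInfo output collected on each host would show which side holds the stale master version.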
Restarting vdsm fixes the problem.
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.