Created attachment 761996: logs

Description of problem:
After a failed extend (cannot initialize physical device), getSpmStatus triggers an MD read error, which triggers an spmStop, and the SPM role is moved.

Version-Release number of selected component (if applicable):
vdsm-4.10.2-23.0.el6ev.x86_64
sf18

How reproducible:
100%

Steps to Reproduce:
1. Create two iSCSI domains on a two-host cluster from LUNs located on storage server #1.
2. Storage tab -> select the non-master domain -> Edit.
3. Log in to a LUN from storage server #2.
4. Block connectivity to storage server #2 from both hosts.
5. Press OK in the Edit Domain dialog.

Actual results:
The engine sends the extend domain command, which fails with "cannot initialize physical device". After the extend fails, the DC reinitializes. In the log, getSpmStatus fails with "MD read error: Storage domain does not exist", and the engine sends spmStop to the current SPM.

Expected results:
If the extend fails, it should not affect the pool unless there is a real issue with the current LUNs.

Additional info:

Thread-27696::INFO::2013-06-17 13:07:52,046::logUtils::40::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', options=None)
Thread-27696::ERROR::2013-06-17 13:07:52,684::sdc::168::Storage.StorageDomainCache::(_findUnfetchedDomain) Error while looking for domain `7414f930-bbdb-4ec6-8132-4640cbb3c722`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 163, in _findUnfetchedDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1269, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 394, in __init__
    lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
  File "/usr/share/vdsm/storage/lvm.py", line 973, in checkVGBlockSizes
    pvs = listPVNames(vgUUID)
  File "/usr/share/vdsm/storage/lvm.py", line 1262, in listPVNames
    vgPVs = [pv.name for pv in pvs if pv.vg_name == vgName]
  File "/usr/share/vdsm/storage/lvm.py", line 74, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: /dev/mapper/1Dafna-mixed1371385
Thread-27696::ERROR::2013-06-17 13:07:52,685::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 170, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Thread-27696::ERROR::2013-06-17 13:07:52,685::hsm::614::Storage.HSM::(getSpmStatus) MD read error: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 607, in getSpmStatus
    status = {'spmStatus': pool.spmRole, 'spmLver': pool.getSpmLver(),
  File "/usr/share/vdsm/storage/sp.py", line 136, in getSpmLver
    return self.getMetaParam(PMDK_LVER)
  File "/usr/share/vdsm/storage/sp.py", line 1515, in getMetaParam
    return self._metadata[key]
  File "/usr/share/vdsm/storage/sp.py", line 1335, in _metadata
    return self._getPoolMD(self.masterDomain)
  File "/usr/share/vdsm/storage/sp.py", line 1331, in _getPoolMD
    return DictValidator(domain._metadata._dict, SP_MD_FIELDS)
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 170, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Thread-27696::ERROR::2013-06-17 13:07:52,686::task::850::TaskManager.Task::(_setError) Task=`22cefb06-d7c9-450d-95db-a02a0c22a15b`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 615, in getSpmStatus
    raise se.StorageDomainMasterError("MD read error")
StorageDomainMasterError: Error validating master storage domain: ('MD read error',)
Thread-27696::DEBUG::2013-06-17 13:07:52,686::task::869::TaskManager.Task::(_run) Task=`22cefb06-d7c9-450d-95db-a02a0c22a15b`::Task._run: 22cefb06-d7c9-450d-95db-a02a0c22a15b ('7fd33b43-a9f4-4eb7-a885-e9583a929ceb',) {} failed - stopping task

engine:

2013-06-17 13:07:48,492 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-57) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-06-17 13:07:49,205 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-57) [b5088a8] Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-06-17 13:07:52,418 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-53) [3e7b6b31] Command ListVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-17 13:07:55,431 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-69) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-17 13:08:00,723 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-74) SPM Init: could not find reported vds or not up - pool:iS

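
One possible reading of the vdsm traceback above is that a single PV whose reload failed raises AttributeError as soon as any of its lazily loaded attributes is touched, so a PV listing that filters by vg_name fails even for a VG whose own PVs are healthy. The failure is then re-raised as StorageDomainDoesNotExist and finally as StorageDomainMasterError. The sketch below only illustrates that chain. It is not the vdsm code: the classes and functions are hypothetical stand-ins that borrow names from the log, and whether the real PV cache behaves this way is an assumption.

```python
# Hypothetical stand-ins; names mirror the log, the logic is a simplified guess.

class StorageDomainDoesNotExist(Exception):
    pass


class StorageDomainMasterError(Exception):
    pass


class PV:
    """A PV whose attributes were loaded successfully."""

    def __init__(self, name, vg_name):
        self.name = name
        self.vg_name = vg_name


class UnreloadedPV:
    """A PV stub whose reload failed: only `name` is known."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Invoked only for attributes that are not set, e.g. vg_name.
        raise AttributeError("Failed reload: %s" % self.name)


def list_pv_names(pvs, vg_name):
    # Touches pv.vg_name on every PV, including unrelated ones.
    return [pv.name for pv in pvs if pv.vg_name == vg_name]


def find_domain(sd_uuid, pvs):
    try:
        return list_pv_names(pvs, sd_uuid)
    except Exception:
        # Any lookup failure is reported as "domain does not exist".
        raise StorageDomainDoesNotExist(sd_uuid)


def get_spm_status(master_sd_uuid, pvs):
    try:
        find_domain(master_sd_uuid, pvs)
    except StorageDomainDoesNotExist:
        raise StorageDomainMasterError("MD read error")
    return {"spmStatus": "SPM"}
```

In this sketch, adding an `UnreloadedPV` for the unreachable LUN to an otherwise healthy PV list is enough to make `get_spm_status` raise for the master domain, which matches the sequence of errors in the log but not necessarily the exact mechanism in vdsm.
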
After discussing with Haim, the current behaviour is correct. Closing.