Bug 974991

Summary: vdsm: after failed domain extend getSpmStatus triggers MD read error and we change spm
Product: Red Hat Enterprise Virtualization Manager
Reporter: Dafna Ron <dron>
Component: vdsm
Assignee: Nobody's working on this, feel free to take it <nobody>
Status: CLOSED NOTABUG
Severity: medium
Priority: unspecified
Version: 3.2.0
CC: abaron, acanan, bazulay, hateya, iheim, jkt, lpeer
Target Milestone: ---
Keywords: Triaged
Target Release: 3.3.0
Hardware: x86_64
OS: Linux
Whiteboard: storage
Doc Type: Bug Fix
Last Closed: 2013-07-10 08:50:34 UTC
Type: Bug
oVirt Team: Storage
Attachments: logs (flags: none)

Description Dafna Ron 2013-06-17 10:26:48 UTC
Created attachment 761996: logs

Description of problem:

After a failed domain extend ("cannot initialize physical device"), getSpmStatus hits an MD read error. The engine then sends spmStop and the SPM role is moved.

Version-Release number of selected component (if applicable):

vdsm-4.10.2-23.0.el6ev.x86_64
sf18

How reproducible:

100%

Steps to Reproduce:
1. Create two iSCSI domains on a two-host cluster, using LUNs located on storage server #1.
2. Storage tab -> select the non-master domain -> Edit.
3. Log in to a LUN from storage server #2.
4. Block connectivity to storage server #2 from both hosts.
5. Press OK in the Edit Domain dialog.

Actual results:

The engine sends the extend domain command, which fails with "cannot initialize physical device".
After the extend fails, the DC reinitializes.
According to the log, getSpmStatus fails with "MD read error: Storage domain does not exist", and the engine sends spmStop to the current SPM.

Expected results:

If the extend fails, it should not affect the pool unless there is a real problem with the LUNs currently in use.


Additional info:

Thread-27696::INFO::2013-06-17 13:07:52,046::logUtils::40::dispatcher::(wrapper) Run and protect: getSpmStatus(spUUID='7fd33b43-a9f4-4eb7-a885-e9583a929ceb', options=None)

Thread-27696::ERROR::2013-06-17 13:07:52,684::sdc::168::Storage.StorageDomainCache::(_findUnfetchedDomain) Error while looking for domain `7414f930-bbdb-4ec6-8132-4640cbb3c722`
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 163, in _findUnfetchedDomain
    return mod.findDomain(sdUUID)
  File "/usr/share/vdsm/storage/blockSD.py", line 1269, in findDomain
    return BlockStorageDomain(BlockStorageDomain.findDomainPath(sdUUID))
  File "/usr/share/vdsm/storage/blockSD.py", line 394, in __init__
    lvm.checkVGBlockSizes(sdUUID, (self.logBlkSize, self.phyBlkSize))
  File "/usr/share/vdsm/storage/lvm.py", line 973, in checkVGBlockSizes
    pvs = listPVNames(vgUUID)
  File "/usr/share/vdsm/storage/lvm.py", line 1262, in listPVNames
    vgPVs = [pv.name for pv in pvs if pv.vg_name == vgName]
  File "/usr/share/vdsm/storage/lvm.py", line 74, in __getattr__
    raise AttributeError("Failed reload: %s" % self.name)
AttributeError: Failed reload: /dev/mapper/1Dafna-mixed1371385
Thread-27696::ERROR::2013-06-17 13:07:52,685::sdc::143::Storage.StorageDomainCache::(_findDomain) domain 7414f930-bbdb-4ec6-8132-4640cbb3c722 not found
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 170, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Thread-27696::ERROR::2013-06-17 13:07:52,685::hsm::614::Storage.HSM::(getSpmStatus) MD read error: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 607, in getSpmStatus
    status = {'spmStatus': pool.spmRole, 'spmLver': pool.getSpmLver(),
  File "/usr/share/vdsm/storage/sp.py", line 136, in getSpmLver
    return self.getMetaParam(PMDK_LVER)
  File "/usr/share/vdsm/storage/sp.py", line 1515, in getMetaParam
    return self._metadata[key]
  File "/usr/share/vdsm/storage/sp.py", line 1335, in _metadata
    return self._getPoolMD(self.masterDomain)
  File "/usr/share/vdsm/storage/sp.py", line 1331, in _getPoolMD
    return DictValidator(domain._metadata._dict, SP_MD_FIELDS)
  File "/usr/share/vdsm/storage/sdc.py", line 49, in __getattr__
    return getattr(self.getRealDomain(), attrName)
  File "/usr/share/vdsm/storage/sdc.py", line 52, in getRealDomain
    return self._cache._realProduce(self._sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 122, in _realProduce
    domain = self._findDomain(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 141, in _findDomain
    dom = findMethod(sdUUID)
  File "/usr/share/vdsm/storage/sdc.py", line 170, in _findUnfetchedDomain
    raise se.StorageDomainDoesNotExist(sdUUID)
StorageDomainDoesNotExist: Storage domain does not exist: ('7414f930-bbdb-4ec6-8132-4640cbb3c722',)
Thread-27696::ERROR::2013-06-17 13:07:52,686::task::850::TaskManager.Task::(_setError) Task=`22cefb06-d7c9-450d-95db-a02a0c22a15b`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 857, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 41, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 615, in getSpmStatus
    raise se.StorageDomainMasterError("MD read error")
StorageDomainMasterError: Error validating master storage domain: ('MD read error',)
Thread-27696::DEBUG::2013-06-17 13:07:52,686::task::869::TaskManager.Task::(_run) Task=`22cefb06-d7c9-450d-95db-a02a0c22a15b`::Task._run: 22cefb06-d7c9-450d-95db-a02a0c22a15b ('7fd33b43-a9f4-4eb7-a885-e9583a929ceb',) {} failed - stopping task
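
One reading of the tracebacks above, as a minimal Python sketch rather than the actual vdsm code: a single PV whose reload failed raises AttributeError while the PV list is filtered by VG name, the domain cache reports that as "Storage domain does not exist", and getSpmStatus reports that in turn as "MD read error". All names below (StalePV, find_domain, get_spm_status, the two exception classes) are hypothetical stand-ins, and the catch-and-convert behaviour is an assumption inferred from the log lines, not copied from sdc.py or hsm.py.

```python
class StorageDomainDoesNotExist(Exception):
    """Hypothetical stand-in for se.StorageDomainDoesNotExist."""


class StorageDomainMasterError(Exception):
    """Hypothetical stand-in for se.StorageDomainMasterError."""


class StalePV:
    """Hypothetical PV cache entry whose reload failed."""

    def __init__(self, name):
        self.name = name

    def __getattr__(self, attr):
        # Called only for attributes not set in __init__, e.g. vg_name.
        raise AttributeError("Failed reload: %s" % self.name)


def find_domain(sd_uuid, pvs):
    """Assumed behaviour: any lookup failure becomes 'does not exist'."""
    try:
        # Filtering touches vg_name on every PV, including PVs that
        # belong to other VGs, so one stale PV breaks every lookup.
        return [pv.name for pv in pvs if pv.vg_name == sd_uuid]
    except Exception:
        raise StorageDomainDoesNotExist(sd_uuid)


def get_spm_status(master_sd_uuid, pvs):
    """Assumed behaviour: a missing master domain becomes 'MD read error'."""
    try:
        find_domain(master_sd_uuid, pvs)
    except StorageDomainDoesNotExist:
        raise StorageDomainMasterError("MD read error")
    return {"spmStatus": "SPM"}
```

If this reading is right, the master domain's own LUNs can be healthy and the status call still fails, because the error comes from an unrelated PV in the same listing.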


engine: 

2013-06-17 13:07:48,492 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-57) Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-06-17 13:07:49,205 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-57) [b5088a8] Command SpmStatusVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Error validating master storage domain: ('MD read error',)
2013-06-17 13:07:52,418 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-53) [3e7b6b31] Command ListVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-17 13:07:55,431 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-69) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-17 13:08:00,723 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-74) SPM Init: could not find reported vds or not up - pool:iS

Comment 1 Ayal Baron 2013-07-10 08:50:34 UTC
After discussing with Haim: the current behaviour is correct. Closing.