Bug 1397189

Summary: [HE] hosted_engine storage domain fails to take the master domain role
Product: [oVirt] ovirt-engine
Reporter: Raz Tamir <ratamir>
Component: BLL.HostedEngine
Assignee: Doron Fediuck <dfediuck>
Status: CLOSED NOTABUG
QA Contact: meital avital <mavital>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.0.6
CC: alukiano, bugs, dfediuck, mkalinin, nsednev
Target Milestone: ---
Flags: ratamir: planning_ack?, ratamir: devel_ack?, ratamir: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-11-22 07:31:57 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1400127
Attachments: engine and vdsm logs (flags: none)

Description Raz Tamir 2016-11-21 20:33:47 UTC
Created attachment 1222475
engine and vdsm logs

Description of problem:
In an environment with one data domain (the master) and one hosted_storage domain, deactivating the master domain should make hosted_storage take over the master domain role, but this fails.
First, the ovirt-engine service restarts. Once the service is running again, a ReconstructMasterDomain action also fails (engine.log):
2016-11-21 14:45:40,382 ERROR [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (DefaultQuartzScheduler10) [] IrsBroker::Failed::GetStoragePoolInfoVDS: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Wrong Master domain or its version: u'SD=12f05e4f-382b-4319-bcca-b88703bb79ca, pool=c1390294-e3f4-45ba-84cc-a03a2ef561ff'
2016-11-21 14:45:40,535 WARN  [org.ovirt.engine.core.bll.storage.pool.ReconstructMasterDomainCommand] (org.ovirt.thread.pool-6-thread-50) [e6c81c5] Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status PreparingForMaintenance
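The WARN line above packs the failure reason into a comma-separated list. The minimal sketch below splits it apart to show that the validation fails on the domain status (PreparingForMaintenance). It is a hypothetical diagnostic helper, not part of ovirt-engine; the line format and the reading of `$`-prefixed entries as substitution variables are assumptions inferred from this single log line.

```python
import re

# Assumed format, inferred only from the engine.log line quoted in this bug:
#   ... Validation of action '<Action>' failed for user <User>. Reasons: <k1>,<k2>,...,$var value
VALIDATION_RE = re.compile(
    r"Validation of action '(?P<action>\w+)' failed for user (?P<user>\S+)\. "
    r"Reasons: (?P<reasons>.*)$"
)


def parse_validation_failure(line):
    """Split a validation warning into action, user, message keys and $variables."""
    match = VALIDATION_RE.search(line)
    if match is None:
        return None
    keys = []
    variables = {}
    for item in match.group("reasons").split(","):
        item = item.strip()
        if item.startswith("$"):
            # Entries such as "$status PreparingForMaintenance" carry a variable value.
            name, _, value = item[1:].partition(" ")
            variables[name] = value
        else:
            keys.append(item)
    return {
        "action": match.group("action"),
        "user": match.group("user"),
        "keys": keys,
        "variables": variables,
    }


line = ("2016-11-21 14:45:40,535 WARN  [org.ovirt.engine.core.bll.storage.pool."
        "ReconstructMasterDomainCommand] (org.ovirt.thread.pool-6-thread-50) [e6c81c5] "
        "Validation of action 'ReconstructMasterDomain' failed for user SYSTEM. "
        "Reasons: VAR__ACTION__RECONSTRUCT_MASTER,VAR__TYPE__STORAGE__DOMAIN,"
        "ACTION_TYPE_FAILED_STORAGE_DOMAIN_STATUS_ILLEGAL2,$status PreparingForMaintenance")
result = parse_validation_failure(line)
print(result["variables"])  # {'status': 'PreparingForMaintenance'}
```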

In vdsm.log:
jsonrpc.Executor/7::ERROR::2016-11-21 21:44:49,659::task::868::Storage.TaskManager.Task::(_setError) Task=`39bba449-8ef3-405c-b3ac-5a7db84ff3e3`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 875, in _run
    return fn(*args, **kargs)
  File "/usr/lib/python2.7/site-packages/vdsm/logUtils.py", line 50, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 988, in connectStoragePool
    spUUID, hostID, msdUUID, masterVersion, domainsMap)
  File "/usr/share/vdsm/storage/hsm.py", line 1053, in _connectStoragePool
    res = pool.connect(hostID, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 646, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1237, in __rebuild
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: u'SD=12f05e4f-382b-4319-bcca-b88703bb79ca, pool=c1390294-e3f4-45ba-84cc-a03a2ef561ff'
jsonrpc.Executor/7::DEBUG::2016-11-21 21:44:49,659::task::887::Storage.TaskManager.Task::(_run) Task=`39bba449-8ef3-405c-b3ac-5a7db84ff3e3`::Task._run: 39bba449-8ef3-405c-b3ac-5a7db84ff3e3 (u'c1390294-e3f4-45ba-84cc-a03a2ef561ff', 4, u'12f05e4f-382b-4319-bcca-b88703bb79ca', 1, {u'9e1ce814-e23f-427b-ab41-b675ccd15e28': u'attached', u'129310da-7b83-4067-b35a-f377e6468310': u'attached', u'd24e3477-6799-48e2-aa8e-bf4776ec8463': u'attached', u'01bc79af-a55b-48e4-b451-cc7ff59ae8e6': u'active', u'7376201f-0c83-4fe7-a2ca-24893dd1de8c': u'attached', u'40d08016-cb96-4771-bd65-3910157ecefa': u'attached', u'6d03d0e7-4758-46c2-9cef-80f156851710': u'active', u'2a82ecbd-e7bd-473f-a713-0143ec06170a': u'attached', u'12f05e4f-382b-4319-bcca-b88703bb79ca': u'attached', u'b44a3e15-2ee0-4330-9d45-380109bffd54': u'attached', u'c9311021-8765-4058-b0c6-c02228574117': u'attached'}) {} failed - stopping task
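The DEBUG line above logs the arguments of the failed connectStoragePool call as a Python tuple literal, in the order shown by the traceback (spUUID, hostID, msdUUID, masterVersion, domainsMap). The sketch below is a hypothetical helper, not a VDSM API. It decodes that tuple so the requested master domain can be looked up in the domains map. The regular expression assumes the exact line layout seen in this bug.

```python
import ast
import re

# Assumed layout, taken from the single example in this bug:
#   "Task._run: <task id> (<args tuple>) {} failed - stopping task"
TASK_ARGS_RE = re.compile(r"Task\._run: \S+ (\(.*\)) \{\} failed")

# Argument order taken from the connectStoragePool frame in the traceback.
ARG_NAMES = ("spUUID", "hostID", "msdUUID", "masterVersion", "domainsMap")


def parse_connect_args(line):
    """Return the connectStoragePool arguments from a failed-task DEBUG line, or None."""
    match = TASK_ARGS_RE.search(line)
    if match is None:
        return None
    # The tuple contains only literals (strings, ints, a dict), so literal_eval is safe.
    values = ast.literal_eval(match.group(1))
    return dict(zip(ARG_NAMES, values))


line = (
    "jsonrpc.Executor/7::DEBUG::2016-11-21 21:44:49,659::task::887::Storage.TaskManager.Task::"
    "(_run) Task=`39bba449-8ef3-405c-b3ac-5a7db84ff3e3`::Task._run: "
    "39bba449-8ef3-405c-b3ac-5a7db84ff3e3 (u'c1390294-e3f4-45ba-84cc-a03a2ef561ff', 4, "
    "u'12f05e4f-382b-4319-bcca-b88703bb79ca', 1, {"
    "u'9e1ce814-e23f-427b-ab41-b675ccd15e28': u'attached', "
    "u'129310da-7b83-4067-b35a-f377e6468310': u'attached', "
    "u'd24e3477-6799-48e2-aa8e-bf4776ec8463': u'attached', "
    "u'01bc79af-a55b-48e4-b451-cc7ff59ae8e6': u'active', "
    "u'7376201f-0c83-4fe7-a2ca-24893dd1de8c': u'attached', "
    "u'40d08016-cb96-4771-bd65-3910157ecefa': u'attached', "
    "u'6d03d0e7-4758-46c2-9cef-80f156851710': u'active', "
    "u'2a82ecbd-e7bd-473f-a713-0143ec06170a': u'attached', "
    "u'12f05e4f-382b-4319-bcca-b88703bb79ca': u'attached', "
    "u'b44a3e15-2ee0-4330-9d45-380109bffd54': u'attached', "
    "u'c9311021-8765-4058-b0c6-c02228574117': u'attached'"
    "}) {} failed - stopping task"
)

args = parse_connect_args(line)
# Status of the domain the engine asked VDSM to treat as master.
print(args["domainsMap"][args["msdUUID"]])  # attached
active = sorted(sd for sd, status in args["domainsMap"].items() if status == "active")
print(active)  # ['01bc79af-a55b-48e4-b451-cc7ff59ae8e6', '6d03d0e7-4758-46c2-9cef-80f156851710']
```

In this call, the domain passed as msdUUID (12f05e4f-...) is itself listed as `attached`, and only two domains in the map are `active`.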



Version-Release number of selected component (if applicable):


How reproducible:
100%

Steps to Reproduce:
Setup: 1 data SD (master), 1 hosted_engine SD
1. Deactivate the master storage domain

Actual results:
The ovirt-engine service restarts. 5 minutes later, once the environment is accessible again, a ReconstructMasterDomain action also fails.


Expected results:
The hosted_engine storage domain should become the master domain.

Additional info:

Comment 1 Doron Fediuck 2016-11-22 07:31:57 UTC
The hosted storage domain cannot become the master domain because it is controlled externally (it was connected by the HA agent through VDSM).
This is by design: we need to decide who is in control, the engine or the HA agent, and the current implementation keeps the agent in control.
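The rule described in Comment 1 can be illustrated with a small selection function: when the master is deactivated, any replacement must be another active data domain, and the hosted-engine domain is never eligible. This is a hypothetical sketch, not the ovirt-engine implementation. The `StorageDomain` class, `pick_new_master`, the domain names, and the `"Active"` status string are assumptions made for illustration; only `PreparingForMaintenance` comes from the log above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageDomain:
    # Hypothetical model for illustration only; not an ovirt-engine class.
    name: str
    status: str                  # assumed values: "Active", "PreparingForMaintenance", ...
    is_data: bool = True
    hosted_engine: bool = False  # True for the domain connected by the HA agent


def pick_new_master(domains: List[StorageDomain], current_master: str) -> Optional[StorageDomain]:
    """Return a candidate for the master role, or None if no domain qualifies."""
    for sd in domains:
        if sd.name == current_master:
            continue  # the domain being deactivated cannot keep the role
        if not sd.is_data or sd.status != "Active":
            continue  # only active data domains are considered
        if sd.hosted_engine:
            continue  # controlled externally by the HA agent, so never eligible
        return sd
    return None


# The reported setup: one data domain (master) and the hosted_storage domain.
reported = [
    StorageDomain("data1", "PreparingForMaintenance"),
    StorageDomain("hosted_storage", "Active", hosted_engine=True),
]
print(pick_new_master(reported, "data1"))  # None

# With a second ordinary data domain, that domain would be chosen instead.
with_spare = reported + [StorageDomain("data2", "Active")]
print(pick_new_master(with_spare, "data1").name)  # data2
```

Under this rule, the reported setup leaves no eligible candidate, which matches the NOTABUG resolution: hosted_storage is deliberately excluded rather than failing by accident.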