Bug 784038

Summary: Can't connect host to storage pool after new master was selected due to storage troubles
Product: [Retired] oVirt
Reporter: Jakub Libosvar <jlibosva>
Component: vdsm
Assignee: Dan Kenigsberg <danken>
Status: CLOSED WONTFIX
Severity: medium
Priority: unspecified
Version: unspecified
CC: abaron, acathrow, amureini, bazulay, iheim, ykaul
Target Milestone: ---
Target Release: 3.3.4
Hardware: x86_64
OS: Linux
Whiteboard: storage
Doc Type: Bug Fix
Story Points: ---
Last Closed: 2013-03-12 09:37:40 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
Category: ---
oVirt Team: Storage
Cloudforms Team: ---
Attachments: engine, vdsm log (flags: none)

Description Jakub Libosvar 2012-01-23 16:16:56 UTC
Created attachment 557007
engine, vdsm log

Description of problem:
I have two hosts in an iSCSI data center with two storage domains, each on a different server. I put the HSM host into maintenance and, on the SPM host, dropped the connection to the master storage domain. After a new master was selected and the former master went inactive, I unblocked the network connection between the SPM and the storage. Then I activated the storage domain that had had the network trouble. So now there is one active SPM host, one host in maintenance, and two active storage domains. I activated the host in maintenance, but it fails to connect to the storage pool because of a master version mismatch:

On SPM:
[root@srh-03 ~]# vdsClient -s 0 getStorageDomainInfo cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	vguuid = ZcADor-FO4W-e15W-3eXs-kIbJ-YlWi-kc3rN8
	lver = 0
	state = OK
	version = 2
	role = Master
	pool = ['29eedc70-5b61-4f20-8b64-ea4a1fb4b48c']
	spm_id = 2
	type = ISCSI
	class = Data
	master_ver = 13
	name = str01-jlibosva2
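
For anyone comparing this output across hosts, here is a minimal Python sketch that reads the `key = value` lines into a dict so that `master_ver` can be compared programmatically. It assumes only the output format shown above; `parse_domain_info` is a hypothetical helper, not part of vdsm or vdsClient.

```python
def parse_domain_info(text):
    """Parse 'key = value' lines into a dict of strings (hypothetical helper)."""
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep:  # skip blank lines and lines without '='
            info[key.strip()] = value.strip()
    return info


# Excerpt of the SPM output quoted above.
sample = """
    uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
    role = Master
    master_ver = 13
"""

info = parse_domain_info(sample)
print(info["role"], info["master_ver"])
```

Values are kept as strings; convert `master_ver` with `int()` before comparing it with the version the engine sends.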

Backend: 
2012-01-23 17:09:18,966 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-31) START, ConnectStoragePoolVDSCommand(vdsId = 29291ea0-41ac-11e1-886e-001a4a013f0a, storagePoolId = 29eedc70-5b61-4f20-8b64-ea4a1fb4b48c, vds_spm_id = 1, masterDomainId = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, masterVersion = 13), log id: 1fc7b8bc

vdsm:
Thread-363::ERROR::2012-01-23 17:07:37,872::sp::1449::Storage.StoragePool::(getMasterDomain) Requested master domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 does not have expected version 13 it is version 10
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::535::ResourceManager::(releaseResource) Trying to release resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::550::ResourceManager::(releaseResource) Released resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' (0 active users)
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::555::ResourceManager::(releaseResource) Resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' is free, finding out if anyone is waiting for it.
Thread-363::DEBUG::2012-01-23 17:07:37,874::resourceManager::562::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c', Clearing records.
Thread-363::ERROR::2012-01-23 17:07:37,874::task::855::TaskManager.Task::(_setError) Task=`8771aa62-2bc6-4f8a-be4a-55dfd808c5bd`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 863, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 721, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 763, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 624, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1097, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1450, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, pool=29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'
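
The mismatch in the ERROR line above (engine requests version 13, the host sees version 10) can be pulled out of a vdsm log with a small script. The following is a sketch only: the regular expression is modelled on the single log line quoted in this report, and `find_master_mismatch` is a hypothetical helper, not vdsm code. Other vdsm versions may word the message differently.

```python
import re

# Pattern modelled on the ERROR line quoted in this report (an assumption;
# it is not taken from the vdsm source).
MISMATCH_RE = re.compile(
    r"Requested master domain (?P<sd>\S+) does not have expected version "
    r"(?P<expected>\d+) it is version (?P<found>\d+)"
)


def find_master_mismatch(line):
    """Return (domain_uuid, expected, found) or None if the line does not match."""
    m = MISMATCH_RE.search(line)
    if m is None:
        return None
    return m.group("sd"), int(m.group("expected")), int(m.group("found"))


log_line = (
    "Thread-363::ERROR::2012-01-23 17:07:37,872::sp::1449::Storage.StoragePool::"
    "(getMasterDomain) Requested master domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 "
    "does not have expected version 13 it is version 10"
)

result = find_master_mismatch(log_line)
print(result)
```

Running this over the vdsm log of the host being activated shows which version that host sees, which can then be compared with `master_ver` as reported on the SPM.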



Version-Release number of selected component (if applicable):
vdsm-4.9.3.1-0.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
Please see description
  
Actual results:
Host goes to Non Operational

Expected results:
Host is connected and operational

Additional info:
Backend and vdsm log attached

Comment 1 Jakub Libosvar 2012-01-23 16:17:41 UTC
Restarting vdsm fixes the problem.

Comment 2 Itamar Heim 2013-03-12 09:37:40 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.