Bug 784038 - Can't connect host to storage pool after new master was selected due to storage troubles
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: oVirt
Classification: Retired
Component: vdsm
Version: unspecified
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.3.4
Assignee: Dan Kenigsberg
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
Reported: 2012-01-23 16:16 UTC by Jakub Libosvar
Modified: 2016-02-10 16:40 UTC (History)

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2013-03-12 09:37:40 UTC
oVirt Team: Storage
Embargoed:


Attachments
engine, vdsm log (10.62 KB, application/x-gzip)
2012-01-23 16:16 UTC, Jakub Libosvar

Description Jakub Libosvar 2012-01-23 16:16:56 UTC
Created attachment 557007
engine, vdsm log

Description of problem:
I have two hosts in an iSCSI data center with two storage domains, each on a different storage server. I put the HSM host into maintenance and, on the SPM host, dropped the connection to the master storage domain. After a new master was selected and the former master became inactive, I unblocked the network connection between the SPM host and the storage. Then I activated the storage domain that had had the network trouble. At this point there is one active SPM host, one host in maintenance, and two active storage domains. I activated the host in maintenance, but it fails to connect to the storage pool because of a master version mismatch:

On SPM:
[root@srh-03 ~]# vdsClient -s 0 getStorageDomainInfo cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
	vguuid = ZcADor-FO4W-e15W-3eXs-kIbJ-YlWi-kc3rN8
	lver = 0
	state = OK
	version = 2
	role = Master
	pool = ['29eedc70-5b61-4f20-8b64-ea4a1fb4b48c']
	spm_id = 2
	type = ISCSI
	class = Data
	master_ver = 13
	name = str01-jlibosva2
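
For comparing the SPM's view with what the engine requests, the "key = value" output above can be read into a dictionary. This is only a minimal sketch: parse_domain_info is a hypothetical helper, not part of vdsm or vdsClient, and it assumes the output format shown in this report.

```python
def parse_domain_info(text):
    """Parse 'key = value' lines (as printed above) into a dict of strings.

    Hypothetical helper for illustration; not part of vdsm or vdsClient.
    Lines without ' = ' (for example the shell prompt) are skipped.
    """
    info = {}
    for line in text.splitlines():
        key, sep, value = line.partition(" = ")
        if sep:
            info[key.strip()] = value.strip()
    return info


# A few lines copied from the getStorageDomainInfo output in this report.
sample = """\
    uuid = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23
    role = Master
    master_ver = 13
"""

info = parse_domain_info(sample)
print(info["role"], info["master_ver"])  # prints "Master 13"
```

All values are kept as strings, so master_ver has to be converted with int() before comparing it numerically with masterVersion from the engine log.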

Backend: 
2012-01-23 17:09:18,966 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-31) START, ConnectStoragePoolVDSCommand(vdsId = 29291ea0-41ac-11e1-886e-001a4a013f0a, storagePoolId = 29eedc70-5b61-4f20-8b64-ea4a1fb4b48c, vds_spm_id = 1, masterDomainId = cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, masterVersion = 13), log id: 1fc7b8bc

vdsm:
Thread-363::ERROR::2012-01-23 17:07:37,872::sp::1449::Storage.StoragePool::(getMasterDomain) Requested master domain cc1ff6aa-3320-4f18-a5e9-a68b6db70f23 does not have expected version 13 it is version 10
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::535::ResourceManager::(releaseResource) Trying to release resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::550::ResourceManager::(releaseResource) Released resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' (0 active users)
Thread-363::DEBUG::2012-01-23 17:07:37,873::resourceManager::555::ResourceManager::(releaseResource) Resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c' is free, finding out if anyone is waiting for it.
Thread-363::DEBUG::2012-01-23 17:07:37,874::resourceManager::562::ResourceManager::(releaseResource) No one is waiting for resource 'Storage.29eedc70-5b61-4f20-8b64-ea4a1fb4b48c', Clearing records.
Thread-363::ERROR::2012-01-23 17:07:37,874::task::855::TaskManager.Task::(_setError) Task=`8771aa62-2bc6-4f8a-be4a-55dfd808c5bd`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 863, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 721, in connectStoragePool
    return self._connectStoragePool(spUUID, hostID, scsiKey, msdUUID, masterVersion, options)
  File "/usr/share/vdsm/storage/hsm.py", line 763, in _connectStoragePool
    res = pool.connect(hostID, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 624, in connect
    self.__rebuild(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1097, in __rebuild
    self.masterDomain = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1450, in getMasterDomain
    raise se.StoragePoolWrongMaster(self.spUUID, msdUUID)
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=cc1ff6aa-3320-4f18-a5e9-a68b6db70f23, pool=29eedc70-5b61-4f20-8b64-ea4a1fb4b48c'
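
The failure above comes down to a comparison between the master version the engine passes in (13) and the version the connecting host finds for that domain (10). The following is a simplified sketch of that kind of check, not the actual code in sp.py: check_master and its parameters are hypothetical, and the exception class is only a stand-in for se.StoragePoolWrongMaster.

```python
class StoragePoolWrongMaster(Exception):
    """Illustrative stand-in for vdsm's se.StoragePoolWrongMaster."""

    def __init__(self, sp_uuid, sd_uuid):
        super().__init__(
            "Wrong Master domain or its version: 'SD=%s, pool=%s'"
            % (sd_uuid, sp_uuid)
        )


def check_master(sp_uuid, sd_uuid, found_version, expected_version):
    """Return sd_uuid if the versions agree, otherwise raise.

    Hypothetical function: found_version is what the host sees for the
    domain, expected_version is what the engine requested.
    """
    if found_version != expected_version:
        raise StoragePoolWrongMaster(sp_uuid, sd_uuid)
    return sd_uuid


POOL = "29eedc70-5b61-4f20-8b64-ea4a1fb4b48c"
DOMAIN = "cc1ff6aa-3320-4f18-a5e9-a68b6db70f23"

try:
    # Values from this report: the host sees 10, the engine expects 13.
    check_master(POOL, DOMAIN, found_version=10, expected_version=13)
except StoragePoolWrongMaster as exc:
    error_text = str(exc)
    print(error_text)
```

With matching versions (13 and 13) the same call returns the domain UUID, which corresponds to the expected result of the host connecting to the pool.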



Version-Release number of selected component (if applicable):
vdsm-4.9.3.1-0.fc16.x86_64

How reproducible:
Always

Steps to Reproduce:
Please see description
  
Actual results:
Host goes to Non Operational

Expected results:
Host is connected and operational

Additional info:
Backend and vdsm log attached

Comment 1 Jakub Libosvar 2012-01-23 16:17:41 UTC
Restarting vdsm fixes the problem.
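
The cause is not identified in this report. One hypothesis that would be consistent with a restart clearing the problem is that the host keeps domain metadata in memory and does not re-read it after the master version changes on storage. The sketch below only illustrates that general pattern; DomainCache and everything in it is hypothetical and does not describe how vdsm actually caches domains.

```python
class DomainCache:
    """Hypothetical in-memory cache of domain metadata (illustration only)."""

    def __init__(self, read_metadata):
        self._read_metadata = read_metadata  # callable: sd_uuid -> dict
        self._cache = {}

    def get(self, sd_uuid):
        # Read from storage only on the first request, then serve the copy.
        if sd_uuid not in self._cache:
            self._cache[sd_uuid] = self._read_metadata(sd_uuid)
        return self._cache[sd_uuid]

    def invalidate(self, sd_uuid):
        # Dropping the entry forces a fresh read, much as a restart would.
        self._cache.pop(sd_uuid, None)


DOMAIN = "cc1ff6aa-3320-4f18-a5e9-a68b6db70f23"

# Simulated on-storage metadata; the numbers are the versions from this report.
storage = {DOMAIN: {"master_ver": 10}}
cache = DomainCache(lambda sd_uuid: dict(storage[sd_uuid]))

cache.get(DOMAIN)                          # first read caches version 10
storage[DOMAIN]["master_ver"] = 13         # version is bumped on storage

stale = cache.get(DOMAIN)["master_ver"]    # still the cached value
cache.invalidate(DOMAIN)
fresh = cache.get(DOMAIN)["master_ver"]    # re-read from storage

print(stale, fresh)  # prints "10 13"
```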

Comment 2 Itamar Heim 2013-03-12 09:37:40 UTC
Closing old bugs. If this issue is still relevant/important in current version, please re-open the bug.

