Bug 670432

Summary: [vdsm] [storage] migrate master scenario - vdsm uses old values if a restart takes place during the operation
Product: Red Hat Enterprise Linux 6
Reporter: Haim <hateya>
Component: vdsm
Assignee: Saggi Mizrahi <smizrahi>
Status: CLOSED ERRATA
QA Contact: Haim <hateya>
Severity: high
Priority: low
Version: 6.1
CC: abaron, bazulay, danken, dnaori, ewarszaw, iheim, mgoldboi, smizrahi, yeylon
Target Milestone: rc
Hardware: x86_64
OS: Linux
Fixed In Version: vdsm-4.9-61
Doc Type: Bug Fix
Last Closed: 2011-12-06 07:04:20 UTC
Attachments: vdsm log. (flags: none)

Description Haim 2011-01-18 09:47:02 UTC
Description of problem:

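In the migrate master scenario (deactivateStorageDomain on the master domain), if the vdsm service is restarted during the operation, vdsm uses the old master values from before the migration and therefore fails to reconnect to the storage pool (pool not connected). In the log below, the restore path raises StoragePoolWrongMaster for SD=ae0b976c-83b0-458c-be2a-265637529d78, the domain being deactivated, rather than for the new master msdUUID=29b93fd7-1a68-406e-bfcf-3e85828575b7.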

Thread-2904::INFO::2011-01-17 15:15:44,269::dispatcher::95::irs::Run and protect: deactivateStorageDomain, args: ( sdUUID=ae0b976c-83b0-458c-be2a-265637529d78 spUUID=04422aa0-39e6-475c-adac-ffb2ddf1e40c msdUUID=29b93fd7-1a68-406e-bfcf-3e85828575b7 masterVersion=2)


MainThread::ERROR::2011-01-17 16:15:17,891::misc::65::irs::Wrong Master domain or its version: 'SD=ae0b976c-83b0-458c-be2a-265637529d78, pool=04422aa0-39e6-475c-adac-ffb2ddf1e40c'
MainThread::ERROR::2011-01-17 16:15:17,892::misc::66::irs::Traceback (most recent call last):
  File "/usr/share/vdsm/storage/hsm.py", line 223, in __init__
    self._restorePool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 417, in _restorePool
    pool.reconnect()
  File "/usr/share/vdsm/storage/sp.py", line 514, in reconnect
    return self.connect(hostId, scsiKey, msdUUID, masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 411, in connect
    mDom = self.getMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1223, in getMasterDomain
    self.masterDomain = self.findMasterDomain(msdUUID=msdUUID, masterVersion=masterVersion)
  File "/usr/share/vdsm/storage/sp.py", line 1293, in findMasterDomain
    raise e
StoragePoolWrongMaster: Wrong Master domain or its version: 'SD=ae0b976c-83b0-458c-be2a-265637529d78, pool=04422aa0-39e6-475c-adac-ffb2ddf1e40c'

MainThread::INFO::2011-01-17 16:15:17,910::dispatcher::139::irs::Starting StorageDispatcher...
Thread-17::INFO::2011-01-17 16:15:18,411::dispatcher::95::irs::Run and protect: getSpmStatus, args: ( spUUID=04422aa0-39e6-475c-adac-ffb2ddf1e40c)
Thread-17::DEBUG::2011-01-17 16:15:18,411::task::577::irs::Task 459b58b1-297d-493a-9963-eb170ad729bb: moving from state init -> state preparing
Thread-17::ERROR::2011-01-17 16:15:18,412::misc::65::irs::Unknown pool id, pool not connected: ('04422aa0-39e6-475c-adac-ffb2ddf1e40c',)
Thread-17::ERROR::2011-01-17 16:15:18,414::misc::66::irs::Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 978, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/storage/spm.py", line 578, in public_getSpmStatus
    hsm.HSM.validateConnectedPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 86, in validateConnectedPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('04422aa0-39e6-475c-adac-ffb2ddf1e40c',)

The backend then sends connectStorageServer and connectStoragePool again, and the host connects to the pool.
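The failure can be reduced to a small model. This is a hypothetical Python sketch, not vdsm code: it assumes, as the traceback suggests, that on restart `_restorePool()` calls `reconnect()`, which calls `connect()` with the master domain UUID and master version saved from the last connection, and that `findMasterDomain()` rejects them once another domain holds the master role. The `Pool`/`Domain` classes, the `update_saved` flag, the shortened domain IDs, and the starting master version of 1 are all illustrative assumptions.

```python
from dataclasses import dataclass


class StoragePoolWrongMaster(Exception):
    """Stand-in for the vdsm exception of the same name."""


@dataclass
class Domain:
    uuid: str
    is_master: bool = False
    master_version: int = 0


class Pool:
    """Hypothetical model of a storage pool plus the connection
    parameters (msdUUID, masterVersion) kept for a restore after restart."""

    def __init__(self, domains):
        self.domains = {d.uuid: d for d in domains}
        self.saved_params = None

    def find_master_domain(self, msd_uuid, master_version):
        # Accept only the domain that currently holds the master role
        # with exactly the expected master version.
        dom = self.domains.get(msd_uuid)
        if dom is None or not dom.is_master or dom.master_version != master_version:
            raise StoragePoolWrongMaster(f"SD={msd_uuid}")
        return dom

    def connect(self, msd_uuid, master_version):
        self.find_master_domain(msd_uuid, master_version)
        self.saved_params = (msd_uuid, master_version)

    def migrate_master(self, old_uuid, new_uuid, new_version, update_saved):
        self.domains[old_uuid].is_master = False
        new = self.domains[new_uuid]
        new.is_master = True
        new.master_version = new_version
        if update_saved:
            # Refresh the restore parameters so a restart sees the new master.
            self.saved_params = (new_uuid, new_version)

    def reconnect(self):
        # Restart path: reuse whatever parameters were saved last.
        msd_uuid, master_version = self.saved_params
        return self.find_master_domain(msd_uuid, master_version)


def demo(update_saved):
    pool = Pool([Domain("ae0b976c", is_master=True, master_version=1),
                 Domain("29b93fd7")])
    pool.connect("ae0b976c", 1)
    # Mirrors deactivateStorageDomain(sdUUID=ae0b976c, msdUUID=29b93fd7, masterVersion=2)
    pool.migrate_master("ae0b976c", "29b93fd7", 2, update_saved)
    try:
        return pool.reconnect().uuid
    except StoragePoolWrongMaster as e:
        return f"restore failed: {e}"


print(demo(update_saved=False))  # restore failed: SD=ae0b976c
print(demo(update_saved=True))   # 29b93fd7
```

With stale saved parameters the restore fails on the old master domain, matching the log; once the saved parameters are refreshed during the migration, the restart finds the new master.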

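Repro steps:

1) Work with several storage domains.
2) Put the master domain into maintenance (this migrates the master role to another domain).
3) Restart the vdsm service while the operation is in progress.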

Notes:

1) Ayal reviewed this bug on RHEL 5.5.6 and asked to open a bug for RHEL 6 (2.3).
2) See the attached log.

Comment 1 Haim 2011-01-18 09:49:41 UTC
Note: "restart" means a restart of the vdsm service.

Comment 3 Haim 2011-01-18 12:06:09 UTC
Created attachment 474036
vdsm log.

Comment 5 Saggi Mizrahi 2011-04-07 16:44:56 UTC
Patches in gerrit:
http://gerrit.usersys.redhat.com/247

Comment 6 Haim 2011-05-01 16:16:28 UTC
Verified on vdsm-4.9-62: migrated the master domain several times, restarted the service, and the operation passed as expected.
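The verification procedure (migrate the master several times, restart, check that the pool is restored) can be written as a small regression loop. This is a hypothetical sketch, not the patch in gerrit change 247, whose contents are not described here. It assumes the pool's connection parameters are saved to a file that survives a restart (the `pool.json` name, JSON format and helper names are invented) and that the fix refreshes that file whenever the master migrates. Writing through a temporary file plus `os.replace` is an added design choice, so a restart in the middle of a write never reads a partial file.

```python
import json
import os
import tempfile


class StoragePoolWrongMaster(Exception):
    """Stand-in for the vdsm exception of the same name."""


def save_params(path, msd_uuid, master_version):
    # Hypothetical on-disk format; write to a temp file, then rename over
    # the old one so readers see either the old or the new contents.
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"msdUUID": msd_uuid, "masterVersion": master_version}, f)
    os.replace(tmp, path)


def restore_pool(path, current_master):
    """Simulate the restore path after a restart: only the file survives."""
    with open(path) as f:
        saved = json.load(f)
    if (saved["msdUUID"], saved["masterVersion"]) != current_master:
        raise StoragePoolWrongMaster(f"SD={saved['msdUUID']}")
    return saved["msdUUID"]


def verify_migrations(domains, rounds):
    """Move the master role around `domains`, restarting after each move."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "pool.json")
        master = (domains[0], 1)
        save_params(path, *master)
        for i in range(1, rounds + 1):
            # Migrate master: next domain becomes master, version is bumped,
            # and the saved parameters are refreshed in the same step.
            master = (domains[i % len(domains)], master[1] + 1)
            save_params(path, *master)
            # "Restart": the restore must find the current master.
            assert restore_pool(path, master) == master[0]
        return master


print(verify_migrations(["sd-a", "sd-b", "sd-c"], rounds=5))  # ('sd-c', 6)
```

Dropping the `save_params` call inside the loop reproduces the original bug: the first restore after a migration raises `StoragePoolWrongMaster` for the previous master.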

Comment 7 errata-xmlrpc 2011-12-06 07:04:20 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHEA-2011-1782.html