Created attachment 684375 [details] ## Logs vdsm, rhevm Description of problem: Failed to Reconstruct Master Domain for Data Center Version-Release number of selected component (if applicable): RHEVM 3.2 - SF03 environment RHEVM: rhevm-3.2.0-4.el6ev.noarch VDSM: vdsm-4.10.2-3.0.el6ev.x86_64 LIBVIRT: libvirt-0.10.2-13.el6.x86_64 QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64 SANLOCK: sanlock-2.6-2.el6.x86_64 How reproducible: 100% Fail on iSCSI and FC environments Build on scenario BZ880180 Steps to Reproduce: 1. Create (iSCSI or FC) DC environment with two hosts and two SD 2. Create a new SD that not connected to any DC (DC == none) 3. Enter DC to Maintenance (by Maintenance all SD in DC) 4. Perform Re-Initialize Data Center with a new SD (that created in step 2) Actual results: 1. Failed Re-Initialize Data Center 2. Failed to Reconstruct Master Domain for Data Center 3. DC stay in Maintenance status, new a SD became in Active status, and Domain type == Master Workaround: 1. Enter a new SD in Maintenance 2. Activate a new SD 3. Re-Initialize Data Center & Reconstruct Master Domain succeeds Expected results: 1. Succeed Re-Initialize Data Center 2. Succeed to Reconstruct Master Domain for Data Center Additional info: /var/log/ovirt-engine/engine.log 2013-01-21 18:27:31,577 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) START, ReconstructMasterVDSCommand(HostName = green-vdsa , HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, vdsSpmId = 1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, storagePoolName = DC-FC-01, masterDomainId = 7cb486b6-2d76-4 769-8556-ec4a49bbaee5, masterVersion = 6, domainsList = [{ domainId: f03bed3f-bcc1-4eee-8f1b-229965569b8a, status: Maintenance };{ domainId: dcd81ca8-e425-4d67-86ca-e9b717734eba , status: Maintenance };{ domainId: 6c194925-3a8c-4685-bd5d-79d9ca372a26, status: Maintenance };{ domainId: 7cb486b6-2d76-4769-8556-ec4a49bbaee5, status: Active };]), log id: 2b 3fd37e 2013-01-21 18:28:02,545 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) FINISH, ReconstructMasterVDSCommand, log id: 2b3fd37e 2013-01-21 18:28:02,552 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-49) START, RefreshStoragePoolVDSCommand(HostName = green-vd sb, HostId = d4186009-ad6b-453e-96a4-846b4fefb2d1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, masterStorageDomainId=7cb486b6-2d76-4769-8556-ec4a49bbaee5, msterVersion =6), log id: 1e9ea35e 2013-01-21 18:28:02,554 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-48) START, RefreshStoragePoolVDSCommand(HostName = green-vd sa, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, masterStorageDomainId=7cb486b6-2d76-4769-8556-ec4a49bbaee5, msterVersion =6), log id: 513066ae 2013-01-21 18:28:02,604 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Failed in RefreshStoragePoolVDS method 2013-01-21 18:28:02,604 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Error code StoragePoolUnknown and error message VDSGenericExceptio n: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',) 2013-01-21 18:28:02,605 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePo olVDSCommand return value Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc mCode 309 mMessage Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',) 2013-01-21 18:28:02,605 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) HostName = green-vdsb 2013-01-21 18:28:02,605 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command RefreshStoragePoolVDS execution failed. Exception: VDSErrorException: V DSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',) 2013-01-21 18:28:02,605 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-49) FINISH, RefreshStoragePoolVDSCommand, log id: 1e9ea35e 2013-01-21 18:28:02,607 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-49) START, ConnectStoragePoolVDSCommand(HostName = green-vd sb, HostId = d4186009-ad6b-453e-96a4-846b4fefb2d1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, vds_spm_id = 2, masterDomainId = 7cb486b6-2d76-4769-8556-ec4a49bbaee5, m asterVersion = 6), log id: 271b5370 /var/log/vdsm/vdsm.log
Created attachment 684376 [details] ## Logs screen-shots
/var/log/vdsm/vdsm.log (Host: green-vdsa) Thread-2605::DEBUG::2013-01-21 18:28:01,870::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state init -> state preparing Thread-2605::INFO::2013-01-21 18:28:01,870::logUtils::37::dispatcher::(wrapper) Run and protect: refreshStoragePool(spUUID='d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', msdUUID='7cb486b6-2d76-4769-8556-ec4a49bbaee5', masterVersion=6, options=None) Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__' Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' for lock type 'shared' Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' is free. Now locking as 'shared' (1 active user) Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Granted request Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::794::TaskManager.Task::(resourceAcquired) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_resourcesAcquired: Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33 (shared) Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting False Thread-2605::ERROR::2013-01-21 18:28:01,872::task::833::TaskManager.Task::(_setError) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 788, in refreshStoragePool pool = self.getPool(spUUID) File "/usr/share/vdsm/storage/hsm.py", line 310, in getPool raise se.StoragePoolUnknown(spUUID) StoragePoolUnknown: Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',) Thread-2605::DEBUG::2013-01-21 18:28:01,872::task::852::TaskManager.Task::(_run) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._run: a7659b54-043e-41ea-890d-8c97040608fb ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', '7cb486b6-2d76-4769-8556-ec4a49bbaee5', 6) {} failed - stopping task Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1177::TaskManager.Task::(stop) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::stopping in state preparing (force False) Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting True Thread-2605::INFO::2013-01-21 18:28:01,873::task::1134::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309 Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1139::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Prepare: aborted: Unknown pool id, pool not connected Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 0 aborting True Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::892::TaskManager.Task::(_doAbort) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._doAbort: force False Thread-2605::DEBUG::2013-01-21 18:28:01,874::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state preparing -> state aborting Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::523::TaskManager.Task::(__state_aborting) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_aborting: recover policy none Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state aborting -> state failed
Not sure if a VDSM issue or if engine is just send wrong commands/params. Liron, please investigate
vvyazmin, Reconstruct/Recovery succeeded - what failed is the RefreshStoragePoolCommand which is ok as later a ConnectStoragePool and succeeded so it should be fine. 2013-01-21 18:27:31,577 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) START, ReconstructMasterVDSCommand(HostName = green-vdsa, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, vdsSpmId = 1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, storagePoolName = DC-FC-01, masterDomainId = 7cb486b6-2d76-4769-8556-ec4a49bbaee5, masterVersion = 6, domainsList = [{ domainId: f03bed3f-bcc1-4eee-8f1b-229965569b8a, status: Maintenance };{ domainId: dcd81ca8-e425-4d67-86ca-e9b717734eba, status: Maintenance };{ domainId: 6c194925-3a8c-4685-bd5d-79d9ca372a26, status: Maintenance };{ domainId: 7cb486b6-2d76-4769-8556-ec4a49bbaee5, status: Active };]), log id: 2b3fd37e 2013-01-21 18:28:02,545 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) FINISH, ReconstructMasterVDSCommand, log id: 2b3fd37e further fixes to this flow were introduced here: http://gerrit.ovirt.org/#/c/10549/
Verified on RHEVM 3.2 - SF04 environment (iSCSI and FC): RHEVM: rhevm-3.2.0-5.el6ev.noarch VDSM: vdsm-4.10.2-4.0.el6ev.x86_64 LIBVIRT: libvirt-0.10.2-16.el6.x86_64 QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64 SANLOCK: sanlock-2.6-2.el6.x86_64 Succeed Re-Initialize Data Center, but get an other exception during Re-Initialize action. Open a new bug BZ904724
3.2 has been released