Bug 902408 - [RHEVM] Failed to Reconstruct Master Domain for Data Center
[RHEVM] Failed to Reconstruct Master Domain for Data Center
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine (Show other bugs)
3.2.0
x86_64 Linux
unspecified Severity medium
: ---
: 3.2.0
Assigned To: Liron Aravot
vvyazmin@redhat.com
storage
: Regression
Depends On:
Blocks: 880180
  Show dependency treegraph
 
Reported: 2013-01-21 10:43 EST by vvyazmin@redhat.com
Modified: 2016-02-10 11:49 EST (History)
10 users (show)

See Also:
Fixed In Version: SF4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
## Logs vdsm, rhevm (967.61 KB, application/x-gzip)
2013-01-21 10:43 EST, vvyazmin@redhat.com
no flags Details
## Logs screen-shots (398.66 KB, image/png)
2013-01-21 10:43 EST, vvyazmin@redhat.com
no flags Details

  None (edit)
Description vvyazmin@redhat.com 2013-01-21 10:43:04 EST
Created attachment 684375 [details]
## Logs vdsm, rhevm

Description of problem: 
Failed to Reconstruct Master Domain for Data Center


Version-Release number of selected component (if applicable):
RHEVM 3.2 - SF03 environment 

RHEVM: rhevm-3.2.0-4.el6ev.noarch
VDSM: vdsm-4.10.2-3.0.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-13.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

How reproducible:
100%

Fail on iSCSI and FC environments
Build on scenario BZ880180

Steps to Reproduce:
1. Create (iSCSI or FC) DC environment with two hosts
 and two SD
2. Create a new SD that not connected to any DC (DC == none)
3. Enter DC to Maintenance (by Maintenance all SD in DC)
4. Perform Re-Initialize Data Center with a new SD (that created in step 2)
  
Actual results:
1. Failed Re-Initialize Data Center
2. Failed to Reconstruct Master Domain for Data Center 
3. DC stay in Maintenance status, new a SD became in Active status, and Domain type == Master

Workaround:
1. Enter a new SD in Maintenance
2. Activate a new SD
3. Re-Initialize Data Center & Reconstruct Master Domain succeeds

Expected results:
1. Succeed Re-Initialize Data Center
2. Succeed to Reconstruct Master Domain for Data Center

Additional info:

/var/log/ovirt-engine/engine.log

2013-01-21 18:27:31,577 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) START, ReconstructMasterVDSCommand(HostName = green-vdsa
, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, vdsSpmId = 1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, storagePoolName = DC-FC-01, masterDomainId = 7cb486b6-2d76-4
769-8556-ec4a49bbaee5, masterVersion = 6, domainsList = [{ domainId: f03bed3f-bcc1-4eee-8f1b-229965569b8a, status: Maintenance };{ domainId: dcd81ca8-e425-4d67-86ca-e9b717734eba
, status: Maintenance };{ domainId: 6c194925-3a8c-4685-bd5d-79d9ca372a26, status: Maintenance };{ domainId: 7cb486b6-2d76-4769-8556-ec4a49bbaee5, status: Active };]), log id: 2b
3fd37e
2013-01-21 18:28:02,545 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) FINISH, ReconstructMasterVDSCommand, log id: 2b3fd37e
2013-01-21 18:28:02,552 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-49) START, RefreshStoragePoolVDSCommand(HostName = green-vd
sb, HostId = d4186009-ad6b-453e-96a4-846b4fefb2d1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, masterStorageDomainId=7cb486b6-2d76-4769-8556-ec4a49bbaee5, msterVersion
=6), log id: 1e9ea35e
2013-01-21 18:28:02,554 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-48) START, RefreshStoragePoolVDSCommand(HostName = green-vd
sa, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, masterStorageDomainId=7cb486b6-2d76-4769-8556-ec4a49bbaee5, msterVersion
=6), log id: 513066ae
2013-01-21 18:28:02,604 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Failed in RefreshStoragePoolVDS method
2013-01-21 18:28:02,604 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Error code StoragePoolUnknown and error message VDSGenericExceptio
n: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',)
2013-01-21 18:28:02,605 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) Command org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePo
olVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         309
mMessage                      Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',)


2013-01-21 18:28:02,605 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-49) HostName = green-vdsb
2013-01-21 18:28:02,605 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-49) Command RefreshStoragePoolVDS execution failed. Exception: VDSErrorException: V
DSGenericException: VDSErrorException: Failed to RefreshStoragePoolVDS, error = Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',)
2013-01-21 18:28:02,605 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.RefreshStoragePoolVDSCommand] (pool-3-thread-49) FINISH, RefreshStoragePoolVDSCommand, log id: 1e9ea35e
2013-01-21 18:28:02,607 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-49) START, ConnectStoragePoolVDSCommand(HostName = green-vd
sb, HostId = d4186009-ad6b-453e-96a4-846b4fefb2d1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, vds_spm_id = 2, masterDomainId = 7cb486b6-2d76-4769-8556-ec4a49bbaee5, m
asterVersion = 6), log id: 271b5370

/var/log/vdsm/vdsm.log
Comment 1 vvyazmin@redhat.com 2013-01-21 10:43:39 EST
Created attachment 684376 [details]
## Logs screen-shots
Comment 2 vvyazmin@redhat.com 2013-01-21 11:07:16 EST
/var/log/vdsm/vdsm.log (Host: green-vdsa)


Thread-2605::DEBUG::2013-01-21 18:28:01,870::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state init -> state preparing
Thread-2605::INFO::2013-01-21 18:28:01,870::logUtils::37::dispatcher::(wrapper) Run and protect: refreshStoragePool(spUUID='d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', msdUUID='7cb486b6-2d76-4769-8556-ec4a49bbaee5', masterVersion=6, options=None)
Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__'
Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' for lock type 'shared'
Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' is free. Now locking as 'shared' (1 active user)
Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Granted request
Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::794::TaskManager.Task::(resourceAcquired) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_resourcesAcquired: Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33 (shared)
Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting False
Thread-2605::ERROR::2013-01-21 18:28:01,872::task::833::TaskManager.Task::(_setError) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Unexpected error
Traceback (most recent call last):
  File "/usr/share/vdsm/storage/task.py", line 840, in _run
    return fn(*args, **kargs)
  File "/usr/share/vdsm/logUtils.py", line 38, in wrapper
    res = f(*args, **kwargs)
  File "/usr/share/vdsm/storage/hsm.py", line 788, in refreshStoragePool
    pool = self.getPool(spUUID)
  File "/usr/share/vdsm/storage/hsm.py", line 310, in getPool
    raise se.StoragePoolUnknown(spUUID)
StoragePoolUnknown: Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',)
Thread-2605::DEBUG::2013-01-21 18:28:01,872::task::852::TaskManager.Task::(_run) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._run: a7659b54-043e-41ea-890d-8c97040608fb ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', '7cb486b6-2d76-4769-8556-ec4a49bbaee5', 6) {} failed - stopping task
Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1177::TaskManager.Task::(stop) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::stopping in state preparing 
(force False)
Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting True
Thread-2605::INFO::2013-01-21 18:28:01,873::task::1134::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309
Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1139::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Prepare: aborted: Unknown pool id, pool not connected
Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 0 aborting True
Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::892::TaskManager.Task::(_doAbort) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._doAbort: force False
Thread-2605::DEBUG::2013-01-21 18:28:01,874::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {}
Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state preparing -> state aborting
Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::523::TaskManager.Task::(__state_aborting) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_aborting: recover policy none
Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state aborting -> state failed
Comment 4 Allon Mureinik 2013-01-23 04:51:31 EST
Not sure if a VDSM issue or if engine is just send wrong commands/params.

Liron, please investigate
Comment 5 Liron Aravot 2013-01-24 05:27:16 EST
vvyazmin, Reconstruct/Recovery succeeded - what failed is the RefreshStoragePoolCommand which is ok as later a ConnectStoragePool and succeeded so it should be fine.


2013-01-21 18:27:31,577 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) START, ReconstructMasterVDSCommand(HostName = green-vdsa, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, vdsSpmId = 1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, storagePoolName = DC-FC-01, masterDomainId = 7cb486b6-2d76-4769-8556-ec4a49bbaee5, masterVersion = 6, domainsList = [{ domainId: f03bed3f-bcc1-4eee-8f1b-229965569b8a, status: Maintenance };{ domainId: dcd81ca8-e425-4d67-86ca-e9b717734eba, status: Maintenance };{ domainId: 6c194925-3a8c-4685-bd5d-79d9ca372a26, status: Maintenance };{ domainId: 7cb486b6-2d76-4769-8556-ec4a49bbaee5, status: Active };]), log id: 2b3fd37e
2013-01-21 18:28:02,545 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) FINISH, ReconstructMasterVDSCommand, log id: 2b3fd37e



further fixes to this flow were introduced here:
http://gerrit.ovirt.org/#/c/10549/
Comment 7 vvyazmin@redhat.com 2013-01-27 05:10:55 EST
Verified on RHEVM 3.2 - SF04 environment (iSCSI and FC):

RHEVM: rhevm-3.2.0-5.el6ev.noarch
VDSM: vdsm-4.10.2-4.0.el6ev.x86_64
LIBVIRT: libvirt-0.10.2-16.el6.x86_64
QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64
SANLOCK: sanlock-2.6-2.el6.x86_64

Succeed Re-Initialize Data Center, but get an other exception during Re-Initialize action. Open a new bug BZ904724
Comment 8 Itamar Heim 2013-06-11 04:52:38 EDT
3.2 has been released
Comment 9 Itamar Heim 2013-06-11 04:52:38 EDT
3.2 has been released
Comment 10 Itamar Heim 2013-06-11 04:52:38 EDT
3.2 has been released
Comment 11 Itamar Heim 2013-06-11 04:56:31 EDT
3.2 has been released
Comment 12 Itamar Heim 2013-06-11 05:26:29 EDT
3.2 has been released

Note You need to log in before you can comment on or make changes to this bug.