Bug 902408
Summary: | [RHEVM] Failed to Reconstruct Master Domain for Data Center | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | vvyazmin <vvyazmin> | ||||||
Component: | ovirt-engine | Assignee: | Liron Aravot <laravot> | ||||||
Status: | CLOSED CURRENTRELEASE | QA Contact: | vvyazmin <vvyazmin> | ||||||
Severity: | medium | Docs Contact: | |||||||
Priority: | unspecified | ||||||||
Version: | 3.2.0 | CC: | acathrow, amureini, dyasny, hateya, iheim, lpeer, Rhev-m-bugs, sgrinber, yeylon, ykaul | ||||||
Target Milestone: | --- | Keywords: | Regression | ||||||
Target Release: | 3.2.0 | ||||||||
Hardware: | x86_64 | ||||||||
OS: | Linux | ||||||||
Whiteboard: | storage | ||||||||
Fixed In Version: | SF4 | Doc Type: | Bug Fix | ||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | Type: | Bug | |||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 880180 | ||||||||
Attachments: |
|
Description
vvyazmin@redhat.com
2013-01-21 15:43:04 UTC
Created attachment 684376 [details]
## Logs screen-shots
/var/log/vdsm/vdsm.log (Host: green-vdsa) Thread-2605::DEBUG::2013-01-21 18:28:01,870::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state init -> state preparing Thread-2605::INFO::2013-01-21 18:28:01,870::logUtils::37::dispatcher::(wrapper) Run and protect: refreshStoragePool(spUUID='d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', msdUUID='7cb486b6-2d76-4769-8556-ec4a49bbaee5', masterVersion=6, options=None) Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::175::ResourceManager.Request::(__init__) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Request was made in '/usr/share/vdsm/storage/resourceManager.py' line '174' at '__init__' Thread-2605::DEBUG::2013-01-21 18:28:01,870::resourceManager::486::ResourceManager::(registerResource) Trying to register resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' for lock type 'shared' Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::528::ResourceManager::(registerResource) Resource 'Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33' is free. Now locking as 'shared' (1 active user) Thread-2605::DEBUG::2013-01-21 18:28:01,871::resourceManager::212::ResourceManager.Request::(grant) ResName=`Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33`ReqID=`58702cfd-0657-49be-b90f-cec35b8f97e2`::Granted request Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::794::TaskManager.Task::(resourceAcquired) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_resourcesAcquired: Storage.d13b88c1-89d1-416d-b35a-f5f5f4bc3b33 (shared) Thread-2605::DEBUG::2013-01-21 18:28:01,871::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting False Thread-2605::ERROR::2013-01-21 18:28:01,872::task::833::TaskManager.Task::(_setError) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Unexpected error Traceback (most recent call last): File "/usr/share/vdsm/storage/task.py", line 840, in _run return fn(*args, **kargs) File "/usr/share/vdsm/logUtils.py", line 38, in wrapper res = f(*args, **kwargs) File "/usr/share/vdsm/storage/hsm.py", line 788, in refreshStoragePool pool = self.getPool(spUUID) File "/usr/share/vdsm/storage/hsm.py", line 310, in getPool raise se.StoragePoolUnknown(spUUID) StoragePoolUnknown: Unknown pool id, pool not connected: ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33',) Thread-2605::DEBUG::2013-01-21 18:28:01,872::task::852::TaskManager.Task::(_run) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._run: a7659b54-043e-41ea-890d-8c97040608fb ('d13b88c1-89d1-416d-b35a-f5f5f4bc3b33', '7cb486b6-2d76-4769-8556-ec4a49bbaee5', 6) {} failed - stopping task Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1177::TaskManager.Task::(stop) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::stopping in state preparing (force False) Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 1 aborting True Thread-2605::INFO::2013-01-21 18:28:01,873::task::1134::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::aborting: Task is aborted: 'Unknown pool id, pool not connected' - code 309 Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::1139::TaskManager.Task::(prepare) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Prepare: aborted: Unknown pool id, pool not connected Thread-2605::DEBUG::2013-01-21 18:28:01,873::task::957::TaskManager.Task::(_decref) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::ref 0 aborting True Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::892::TaskManager.Task::(_doAbort) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::Task._doAbort: force False Thread-2605::DEBUG::2013-01-21 18:28:01,874::resourceManager::844::ResourceManager.Owner::(cancelAll) Owner.cancelAll requests {} Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state preparing -> state aborting Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::523::TaskManager.Task::(__state_aborting) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::_aborting: recover policy none Thread-2605::DEBUG::2013-01-21 18:28:01,874::task::568::TaskManager.Task::(_updateState) Task=`a7659b54-043e-41ea-890d-8c97040608fb`::moving from state aborting -> state failed Not sure if a VDSM issue or if engine is just send wrong commands/params. Liron, please investigate vvyazmin, Reconstruct/Recovery succeeded - what failed is the RefreshStoragePoolCommand which is ok as later a ConnectStoragePool and succeeded so it should be fine. 2013-01-21 18:27:31,577 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) START, ReconstructMasterVDSCommand(HostName = green-vdsa, HostId = cb3fe264-b588-4e9b-9177-f38922ec3aec, vdsSpmId = 1, storagePoolId = d13b88c1-89d1-416d-b35a-f5f5f4bc3b33, storagePoolName = DC-FC-01, masterDomainId = 7cb486b6-2d76-4769-8556-ec4a49bbaee5, masterVersion = 6, domainsList = [{ domainId: f03bed3f-bcc1-4eee-8f1b-229965569b8a, status: Maintenance };{ domainId: dcd81ca8-e425-4d67-86ca-e9b717734eba, status: Maintenance };{ domainId: 6c194925-3a8c-4685-bd5d-79d9ca372a26, status: Maintenance };{ domainId: 7cb486b6-2d76-4769-8556-ec4a49bbaee5, status: Active };]), log id: 2b3fd37e 2013-01-21 18:28:02,545 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ReconstructMasterVDSCommand] (pool-3-thread-44) FINISH, ReconstructMasterVDSCommand, log id: 2b3fd37e further fixes to this flow were introduced here: http://gerrit.ovirt.org/#/c/10549/ Verified on RHEVM 3.2 - SF04 environment (iSCSI and FC): RHEVM: rhevm-3.2.0-5.el6ev.noarch VDSM: vdsm-4.10.2-4.0.el6ev.x86_64 LIBVIRT: libvirt-0.10.2-16.el6.x86_64 QEMU & KVM: qemu-kvm-rhev-0.12.1.2-2.348.el6.x86_64 SANLOCK: sanlock-2.6-2.el6.x86_64 Succeed Re-Initialize Data Center, but get an other exception during Re-Initialize action. Open a new bug BZ904724 3.2 has been released 3.2 has been released 3.2 has been released 3.2 has been released 3.2 has been released |