Bug 975846 - Can't bring hosts up after power outage.
Summary: Can't bring hosts up after power outage.
Keywords:
Status: CLOSED DUPLICATE of bug 969640
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.2.0
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: unspecified
Target Milestone: ---
Target Release: 3.3.0
Assignee: Liron Aravot
QA Contact:
URL:
Whiteboard: storage
Depends On:
Blocks:
 
Reported: 2013-06-19 12:55 UTC by Ohad Basan
Modified: 2016-02-10 17:56 UTC
CC List: 11 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-25 07:37:05 UTC
oVirt Team: Storage
Target Upstream Version:
Embargoed:



Description Ohad Basan 2013-06-19 12:55:38 UTC
Description of problem:
After a power outage we reached a state in which the storage pool was not responsive and the hosts were down.
All storage domains were in maintenance.
When trying to activate the host, ConnectStoragePool failed and the host moved to Non Operational.

Comment 3 Liron Aravot 2013-06-23 12:40:26 UTC
Ohad,
The first occurrence of the issue in the logs is already after the bug occurred.
Can you please provide earlier logs, or steps to reproduce? It seems like it's not related to the outage; it looks like the issue was already present before.

This is the first occurrence in the logs of the given pool ID (starting with 430abeff):

2013-06-19 10:53:39,995 INFO  [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: InitVdsOnUpCommand internal: true.
2013-06-19 10:53:40,059 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected :  ID: 430abeff-bfeb-49c6-9638-7b02b6e71223 Type: StoragePool
2013-06-19 10:53:40,155 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [2693e676] START, ConnectStorageServerVDSCommand(HostName = white-vdsc.ci.lab.tlv.redhat.com, HostId = 39e7a216-75b8-11e2-af2b-00145e8327d8, storagePoolId = 430abeff-bfeb-49c6-9638-7b02b6e71223, storageType = NFS, connectionList = [{ id: fd8ce5ae-562c-42d9-9487-052e35f5cd9c, connection: shual.eng.lab.tlv.redhat.com:/volumes/shual/integration/rhevm-31-integ-iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 3bb28a28
2013-06-19 10:53:42,947 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-29) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-19 10:53:43,240 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [2693e676] FINISH, ConnectStorageServerVDSCommand, return: {fd8ce5ae-562c-42d9-9487-052e35f5cd9c=0}, log id: 3bb28a28
2013-06-19 10:53:43,240 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [2693e676] Host white-vdsc.ci.lab.tlv.redhat.com storage connection was succeeded 
2013-06-19 10:53:43,274 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-25) START, ConnectStoragePoolVDSCommand(HostName = white-vdsc.ci.lab.tlv.redhat.com, HostId = 39e7a216-75b8-11e2-af2b-00145e8327d8, storagePoolId = 430abeff-bfeb-49c6-9638-7b02b6e71223, vds_spm_id = 1, masterDomainId = 71baec3a-a456-49a9-99fd-4b297725b08d, masterVersion = 6603), log id: 3859d78b
2013-06-19 10:53:46,178 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-32) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-19 10:53:46,703 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-25) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value 
 StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=430abeff-bfeb-49c6-9638-7b02b6e71223, msdUUID=71baec3a-a456-49a9-99fd-4b297725b08d']]
2013-06-19 10:53:46,704 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-25) HostName = white-vdsc.ci.lab.tlv.redhat.com
2013-06-19 10:53:46,766 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-25) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=430abeff-bfeb-49c6-9638-7b02b6e71223, msdUUID=71baec3a-a456-49a9-99fd-4b297725b08d'
2013-06-19 10:53:46,781 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-25) FINISH, ConnectStoragePoolVDSCommand, log id: 3859d78b
2013-06-19 10:53:46,819 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-3-thread-25) Could not connect host white-vdsc.ci.lab.tlv.redhat.com to pool Integration-Stable-FC
2013-06-19 10:53:46,894 INFO  [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: SetNonOperationalVdsCommand internal: true. Entities affected :  ID: 39e7a216-75b8-11e2-af2b-00145e8327d8 Type: VDS

Comment 6 Liron Aravot 2013-06-25 05:59:21 UTC
It appears that the domain statuses were manipulated manually (a manual DB update) after the domain was in LOCKED status, in order to resolve an issue in another pool. Therefore this issue isn't a bug: the situation was caused by manual manipulation of the DB, probably in order to fix bug 975742. Because the query was general, the domain statuses in this pool were changed as well.
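
(For illustration only: a hypothetical sketch of how an overly broad manual status update can leak into other pools. The table name, column names, and status codes below are assumptions made for the example, not the actual query that was run or the verified engine schema.)

-- Hypothetical: meant to unlock the domains of one problematic pool,
-- but the WHERE clause does not filter by pool, so LOCKED domains in
-- every other pool get updated as well.
UPDATE storage_pool_iso_map
SET    status = 6                 -- assumed code for Maintenance
WHERE  status = 5;                -- assumed code for Locked

-- A scoped version would limit the change to the intended pool:
UPDATE storage_pool_iso_map
SET    status = 6
WHERE  status = 5
  AND  storage_pool_id = '<uuid of the pool being fixed>';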

The issue of the domain being locked is a duplicate of
https://bugzilla.redhat.com/show_bug.cgi?id=969640

Therefore it seems like this one can be closed.

Comment 7 Allon Mureinik 2013-06-25 07:37:05 UTC
Agreed, closing this one.

*** This bug has been marked as a duplicate of bug 969640 ***

