Description of problem: After a power outage we reached a state where the storage pool was not responsive and the hosts were down. All storage domains were in maintenance. When trying to activate the host, ConnectStoragePool failed and the host moved to Non Operational.
Ohad, the first occurrence of the issue in the logs is already after the bug occurred. Can you please provide earlier logs or steps to reproduce? It doesn't seem to be related to the outage; the issue seems to have been present before it. This is the first occurrence in the logs of the pool ID starting with 430abeff:
2013-06-19 10:53:39,995 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: InitVdsOnUpCommand internal: true.
2013-06-19 10:53:40,059 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: 430abeff-bfeb-49c6-9638-7b02b6e71223 Type: StoragePool
2013-06-19 10:53:40,155 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [2693e676] START, ConnectStorageServerVDSCommand(HostName = white-vdsc.ci.lab.tlv.redhat.com, HostId = 39e7a216-75b8-11e2-af2b-00145e8327d8, storagePoolId = 430abeff-bfeb-49c6-9638-7b02b6e71223, storageType = NFS, connectionList = [{ id: fd8ce5ae-562c-42d9-9487-052e35f5cd9c, connection: shual.eng.lab.tlv.redhat.com:/volumes/shual/integration/rhevm-31-integ-iso, iqn: null, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 3bb28a28
2013-06-19 10:53:42,947 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-29) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-19 10:53:43,240 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-23) [2693e676] FINISH, ConnectStorageServerVDSCommand, return: {fd8ce5ae-562c-42d9-9487-052e35f5cd9c=0}, log id: 3bb28a28
2013-06-19 10:53:43,240 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-23) [2693e676] Host white-vdsc.ci.lab.tlv.redhat.com storage connection was succeeded
2013-06-19 10:53:43,274 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-25) START, ConnectStoragePoolVDSCommand(HostName = white-vdsc.ci.lab.tlv.redhat.com, HostId = 39e7a216-75b8-11e2-af2b-00145e8327d8, storagePoolId = 430abeff-bfeb-49c6-9638-7b02b6e71223, vds_spm_id = 1, masterDomainId = 71baec3a-a456-49a9-99fd-4b297725b08d, masterVersion = 6603), log id: 3859d78b
2013-06-19 10:53:46,178 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (QuartzScheduler_Worker-32) Command GetCapabilitiesVDS execution failed. Exception: VDSNetworkException: java.net.ConnectException: Connection refused
2013-06-19 10:53:46,703 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-25) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value StatusOnlyReturnForXmlRpc [mStatus=StatusForXmlRpc [mCode=304, mMessage=Cannot find master domain: 'spUUID=430abeff-bfeb-49c6-9638-7b02b6e71223, msdUUID=71baec3a-a456-49a9-99fd-4b297725b08d']]
2013-06-19 10:53:46,704 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (pool-3-thread-25) HostName = white-vdsc.ci.lab.tlv.redhat.com
2013-06-19 10:53:46,766 ERROR [org.ovirt.engine.core.vdsbroker.VDSCommandBase] (pool-3-thread-25) Command ConnectStoragePoolVDS execution failed. Exception: IRSNoMasterDomainException: IRSGenericException: IRSErrorException: IRSNoMasterDomainException: Cannot find master domain: 'spUUID=430abeff-bfeb-49c6-9638-7b02b6e71223, msdUUID=71baec3a-a456-49a9-99fd-4b297725b08d'
2013-06-19 10:53:46,781 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (pool-3-thread-25) FINISH, ConnectStoragePoolVDSCommand, log id: 3859d78b
2013-06-19 10:53:46,819 ERROR [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (pool-3-thread-25) Could not connect host white-vdsc.ci.lab.tlv.redhat.com to pool Integration-Stable-FC
2013-06-19 10:53:46,894 INFO [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (QuartzScheduler_Worker-23) [2693e676] Running command: SetNonOperationalVdsCommand internal: true. Entities affected : ID: 39e7a216-75b8-11e2-af2b-00145e8327d8 Type: VDS
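For anyone repeating this check against earlier logs, the search for the first entry that mentions a given pool ID could be sketched roughly as follows. This is only an illustration, not an oVirt tool: the helper names split_entries and first_occurrence are hypothetical, and the script assumes every entry starts with a timestamp in the format seen in the excerpt above (for example "2013-06-19 10:53:39,995").

```python
import re
from typing import List, Optional, Tuple

# Assumption: each log entry begins with a timestamp shaped like
# "2013-06-19 10:53:39,995", as in the excerpt above.
TIMESTAMP = re.compile(r"\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}")


def split_entries(text: str) -> List[str]:
    """Split a log blob into entries, each starting at a timestamp.

    Continuation text (e.g. an "Exception: ..." tail) stays attached
    to the entry that precedes it.
    """
    starts = [m.start() for m in TIMESTAMP.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[a:b].strip() for a, b in zip(starts, ends)]


def first_occurrence(text: str, pool_prefix: str) -> Optional[Tuple[str, str]]:
    """Return (timestamp, entry) for the first entry mentioning pool_prefix."""
    for entry in split_entries(text):
        if pool_prefix in entry:
            stamp = TIMESTAMP.match(entry).group(0)
            return stamp, entry
    return None


if __name__ == "__main__":
    sample = (
        "2013-06-19 10:53:39,995 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] "
        "(QuartzScheduler_Worker-23) [2693e676] Running command: InitVdsOnUpCommand internal: true. "
        "2013-06-19 10:53:40,059 INFO "
        "[org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] "
        "(QuartzScheduler_Worker-23) [2693e676] Running command: "
        "ConnectHostToStoragePoolServersCommand internal: true. Entities affected : "
        "ID: 430abeff-bfeb-49c6-9638-7b02b6e71223 Type: StoragePool"
    )
    print(first_occurrence(sample, "430abeff"))
```

Running the same function over older engine logs, if they are available, would show whether the pool ID appears before the outage.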
It appears that the domain statuses were manipulated manually (a manual DB update) after the domain was in LOCKED status, in order to resolve an issue in another pool. Therefore this issue isn't a bug, as the situation was caused by manual manipulation of the DB, probably in order to fix bug 975742; as the query was general, the domain statuses in that pool were changed as well. The issue of the domain being locked is a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=969640, so it seems like this one can be closed.
Agreed, closing this one. *** This bug has been marked as a duplicate of bug 969640 ***