Created attachment 632143 [details]
logs

Description of problem:
I have two hosts in the setup. After blocking storage from both hosts, the SPM becomes Non-Operational, and the HSM fails to acquire the lease and remains in Up state but not as SPM.
If I remove the iptables block from the Non-Operational host only, auto recovery fails to activate the storage and hosts.
Although ConnectStorageServerVDSCommand succeeds on the Non-Operational host, ConnectStoragePoolVDSCommand fails with "Cannot find master domain". We then get "Wrong Master domain or its version", since the failed connect to pool by Auto Recovery raises the master version.

Version-Release number of selected component (if applicable):
si21.1

How reproducible:
100%

Steps to Reproduce:
1. In a two-host cluster, block connectivity to the storage from both hosts.
2. After the SPM becomes Non-Operational and the second host releases SPM, restore the connectivity to the storage from the Non-Operational host only.

Actual results:
1. Auto Recovery fails to recover storage/hosts (the Non-Operational host becomes Unassigned -> back to Non-Operational).
2. Since we raise the master version, we have to put the Up host into maintenance so that recovery can happen.

Expected results:
Since the engine requires leaving one host in Up state to allow recovery, and Auto Recovery cannot recover other hosts while there is a host in Up state in the setup, I would suggest that only one of these flows be active (so if auto recovery is activated, engine recovery is disabled).
Additional info:
full logs

2012-10-23 15:15:01,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable hosts done
2012-10-23 15:15:01,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains
2012-10-23 15:15:01,021 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Autorecovering 0 storage domains
2012-10-23 15:15:01,021 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains done
2012-10-23 15:15:01,227 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-80) [64f7ab0c] Running command: InitVdsOnUpCommand internal: true.
2012-10-23 15:15:01,316 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ValidateStorageServerConnectionVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 2798240a
2012-10-23 15:15:01,327 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ValidateStorageServerConnectionVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 2798240a
2012-10-23 15:15:01,328 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: 1167fe48-4788-486d-876b-f8261ede6c23 Type: StoragePool
2012-10-23 15:15:01,329 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStorageServerVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 3e6ace41
2012-10-23 15:15:01,648 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7633b7eb-62d0-498d-a762-c1da4f3b505f:Dafna-Upgrade-03 in problem. vds: gold-vdsc
2012-10-23 15:15:01,649 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7bdb9b94-729f-409b-94d8-bad3fe0d4d6f:Dafna-Upgrade-04 in problem. vds: gold-vdsc
2012-10-23 15:15:01,652 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain f844782b-dc73-4c35-b776-92ef809ab6f5:Dafna-Upgrade-02 in problem. vds: gold-vdsc
2012-10-23 15:15:01,653 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 6faf7684-e22a-4332-8ad9-0ad89dbd6172:Dafna-Upgrade-01 in problem. vds: gold-vdsc
2012-10-23 15:15:02,019 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ConnectStorageServerVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 3e6ace41
2012-10-23 15:15:02,019 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Host gold-vdsd storage connection was succeeded
2012-10-23 15:15:02,121 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 2, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 6879a6e
2012-10-23 15:15:02,303 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-85) hostFromVds::selectedVds - gold-vdsc, spmStatus Unknown_Pool, storage pool iSCSI
2012-10-23 15:15:02,324 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-85) START, ConnectStoragePoolVDSCommand(HostName = gold-vdsc, HostId = 0419c81e-1c56-11e2-9707-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 1, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 20beb48d
2012-10-23 15:15:02,634 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-85) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value
Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode 304
mMessage Cannot find master domain: 'spUUID=1167fe48-4788-486d-876b-f8261ede6c23, msdUUID=7633b7eb-62d0-498d-a762-c1da4f3b505f'
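For anyone going through the attached engine log, the failing VDS calls can be located by scanning for the mCode / mMessage status pair shown at the end of the excerpt above. The following is only a rough sketch, not part of the engine: the function name find_vds_errors is made up for illustration, and it assumes that mCode 0 means success and that mMessage follows mCode directly, as in the excerpt.

```python
import re

# Assumption: every status dump contains "mCode <int>" immediately followed by
# "mMessage <text>", separated by spaces or line breaks, as in the excerpt above.
STATUS_RE = re.compile(r"mCode\s+(\d+)\s+mMessage\s+([^\n]*)")


def find_vds_errors(log_text):
    """Return (code, message) for every status dump whose mCode is non-zero.

    Hypothetical helper for triage only; mCode 0 is assumed to mean success.
    """
    errors = []
    for code, message in STATUS_RE.findall(log_text):
        if int(code) != 0:
            errors.append((int(code), message.strip()))
    return errors


if __name__ == "__main__":
    import sys

    # Usage: python find_vds_errors.py engine.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log_file:
        for code, message in find_vds_errors(log_file.read()):
            print(code, message)
```

Run against the excerpt above, this should report the single 304 "Cannot find master domain" status returned for gold-vdsc.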
This bug is not related to auto-recovery; similar behaviour will occur if someone tries to Activate such a host.
We have opened 3 bugs that together should handle the above scenario.
http://gerrit.ovirt.org/#/c/10103/
(In reply to comment #4)
> http://gerrit.ovirt.org/#/c/10103/

The above patch belongs to bug 882837 but should handle this scenario as well. Hence moving to MODIFIED and later to ON_QA, as the above scenario should be verified as well.
Setting docs_scoped- as this looks like a series of bug fixes to provide the behaviour users already expect rather than a new feature (again from user POV).
Verified on sf10 as part of bugs 882837 and 874019.
3.2 has been released