Created attachment 632143[details]
logs
Description of problem:
I have two hosts in the setup.
After blocking storage from both hosts, the SPM becomes Non-Operational, and the HSM fails to acquire the lease and remains in Up state but is not SPM.
If I remove the iptables block from the Non-Operational host only, auto recovery fails to activate the storage and the hosts.
Although ConnectStorageServerVDSCommand succeeds on the Non-Operational host, ConnectStoragePoolVDSCommand fails with "Cannot find master domain".
In addition, we get "Wrong Master domain or its version", since the failed connect to pool by Auto Recovery raises the master version.
Version-Release number of selected component (if applicable):
si21.1
How reproducible:
100%
Steps to Reproduce:
1. In a two-host cluster, block connectivity to the storage from both hosts.
2. After the SPM becomes Non-Operational and the second host releases SPM, restore connectivity to the storage from the Non-Operational host only.
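The report does not give the exact iptables rules used for the block, so the following is only a sketch of one way to do steps 1 and 2. It assumes the storage server is the 10.35.64.10 address that appears in the engine log, and that dropping outgoing traffic to it on each host is enough to simulate the outage. The helper name `storage_block_rules` and the choice of the OUTPUT chain are assumptions, not taken from the report.

```python
def storage_block_rules(storage_ip, chain="OUTPUT"):
    """Return (block, unblock) iptables argument lists for one host.

    Hypothetical helper: the rule shape is an assumption, since the
    report only says that storage was blocked with iptables.
    """
    # -A appends a rule that drops all traffic to the storage address.
    block = ["iptables", "-A", chain, "-d", storage_ip, "-j", "DROP"]
    # -D deletes that same rule again to restore connectivity.
    unblock = ["iptables", "-D", chain, "-d", storage_ip, "-j", "DROP"]
    return block, unblock


# Storage address taken from the connectionList entries in the engine log.
block, unblock = storage_block_rules("10.35.64.10")
print(" ".join(block))    # run on both hosts (step 1)
print(" ".join(unblock))  # run on the Non-Operational host only (step 2)
```

The printed commands would be run as root on the hosts; the sketch only builds them and does not execute anything.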
Actual results:
1. Auto Recovery fails to recover the storage and hosts (the Non-Operational host becomes Unassigned and then goes back to Non-Operational).
2. Since the master version was raised, we have to put the Up host into maintenance so that recovery can happen.
Expected results:
Since the engine requires leaving one host in Up state to allow recovery, and Auto Recovery cannot recover other hosts while a host in the setup is in Up state, I would suggest that only one of these flows be active (so if Auto Recovery is activated, engine recovery is disabled).
Additional info: full logs
2012-10-23 15:15:01,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable hosts done
2012-10-23 15:15:01,019 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains
2012-10-23 15:15:01,021 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Autorecovering 0 storage domains
2012-10-23 15:15:01,021 INFO [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains done
2012-10-23 15:15:01,227 INFO [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-80) [64f7ab0c] Running command: InitVdsOnUpCommand internal: true.
2012-10-23 15:15:01,316 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ValidateStorageServerConnectionVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 2798240a
2012-10-23 15:15:01,327 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ValidateStorageServerConnectionVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 2798240a
2012-10-23 15:15:01,328 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected : ID: 1167fe48-4788-486d-876b-f8261ede6c23 Type: StoragePool
2012-10-23 15:15:01,329 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStorageServerVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 3e6ace41
2012-10-23 15:15:01,648 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7633b7eb-62d0-498d-a762-c1da4f3b505f:Dafna-Upgrade-03 in problem. vds: gold-vdsc
2012-10-23 15:15:01,649 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7bdb9b94-729f-409b-94d8-bad3fe0d4d6f:Dafna-Upgrade-04 in problem. vds: gold-vdsc
2012-10-23 15:15:01,652 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain f844782b-dc73-4c35-b776-92ef809ab6f5:Dafna-Upgrade-02 in problem. vds: gold-vdsc
2012-10-23 15:15:01,653 WARN [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 6faf7684-e22a-4332-8ad9-0ad89dbd6172:Dafna-Upgrade-01 in problem. vds: gold-vdsc
2012-10-23 15:15:02,019 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ConnectStorageServerVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 3e6ace41
2012-10-23 15:15:02,019 INFO [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Host gold-vdsd storage connection was succeeded
2012-10-23 15:15:02,121 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 2, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 6879a6e
2012-10-23 15:15:02,303 INFO [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-85) hostFromVds::selectedVds - gold-vdsc, spmStatus Unknown_Pool, storage pool iSCSI
2012-10-23 15:15:02,324 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-85) START, ConnectStoragePoolVDSCommand(HostName = gold-vdsc, HostId = 0419c81e-1c56-11e2-9707-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 1, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 20beb48d
2012-10-23 15:15:02,634 INFO [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-85) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value
Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode 304
mMessage Cannot find master domain: 'spUUID=1167fe48-4788-486d-876b-f8261ede6c23, msdUUID=7633b7eb-62d0-498d-a762-c1da4f3b505f'
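When reading a log like the one above, it can help to pair each START entry with its FINISH entry by "log id" and list the commands that never report FINISH. The following is a minimal sketch of that idea, not part of the engine; the function name `unfinished_commands` is hypothetical, and it assumes each log entry sits on a single line. The sample lines are abbreviated from the excerpt (parameter lists and class names trimmed).

```python
import re

# Markers used by the engine log entries in the excerpt above.
START_RE = re.compile(r"\bSTART, (\w+)\(")
FINISH_RE = re.compile(r"\bFINISH, (\w+),")
LOG_ID_RE = re.compile(r"log id: ([0-9a-f]+)")


def unfinished_commands(lines):
    """Map log id -> command name for START entries with no FINISH."""
    started = {}
    for line in lines:
        id_match = LOG_ID_RE.search(line)
        if not id_match:
            continue  # entry carries no log id, nothing to pair
        log_id = id_match.group(1)
        start = START_RE.search(line)
        if start:
            started[log_id] = start.group(1)
        elif FINISH_RE.search(line):
            started.pop(log_id, None)
    return started


# Abbreviated copies of four entries from the excerpt (assumption:
# trimmed by hand for readability, timestamps and log ids unchanged).
sample = [
    "2012-10-23 15:15:01,329 INFO [...] START, ConnectStorageServerVDSCommand(HostName = gold-vdsd), log id: 3e6ace41",
    "2012-10-23 15:15:02,019 INFO [...] FINISH, ConnectStorageServerVDSCommand, return: {}, log id: 3e6ace41",
    "2012-10-23 15:15:02,121 INFO [...] START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd), log id: 6879a6e",
    "2012-10-23 15:15:02,324 INFO [...] START, ConnectStoragePoolVDSCommand(HostName = gold-vdsc), log id: 20beb48d",
]

for log_id, command in unfinished_commands(sample).items():
    print(log_id, command)
```

A command listed here is not necessarily a failure: the excerpt may simply end before the FINISH entry, and the error for the gold-vdsc call is reported through the BrokerCommandBase return value rather than a FINISH line.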
(In reply to comment #4)
> http://gerrit.ovirt.org/#/c/10103/
The above patch belongs to bug 882837 but should handle this scenario as well.
Hence moving to MODIFIED and later to ON_QA as the above scenario should be verified as well.
Setting docs_scoped- as this looks like a series of bug fixes to provide the behaviour users already expect rather than a new feature (again from user POV).