Bug 869309 - PRD32 - engine: auto recovery cannot recover hosts/storage when last host in setup still has no access to storage and not in maintenance
Summary: PRD32 - engine: auto recovery cannot recover hosts/storage when last host in ...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 3.1.0
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.2.0
Assignee: Barak
QA Contact: Leonid Natapov
URL:
Whiteboard: infra
Depends On: 876235 882832 882837
Blocks: 915537
Reported: 2012-10-23 14:32 UTC by Dafna Ron
Modified: 2016-02-10 19:17 UTC (History)

Fixed In Version: sf3
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
logs (1.23 MB, application/x-gzip)
2012-10-23 14:32 UTC, Dafna Ron

Description Dafna Ron 2012-10-23 14:32:02 UTC
Created attachment 632143 [details]
logs

Description of problem:

I have two hosts in the setup.
After blocking storage access from both hosts, the SPM becomes Non-Operational; the HSM fails to acquire the lease and remains in Up state, but not as SPM.
If I remove the iptables block from the Non-Operational host only, auto recovery fails to activate the storage and the hosts.

Although ConnectStorageServerVDSCommand succeeds on the Non-Operational host, ConnectStoragePoolVDSCommand fails with "Cannot find master domain".
In addition, we get "Wrong Master domain or its version", since the failed connect-to-pool attempt by Auto Recovery raises the master version.
 
Version-Release number of selected component (if applicable):

si21.1 

How reproducible:

100%

Steps to Reproduce:
1. In a two-host cluster, block connectivity to the storage from both hosts.
2. After the SPM becomes Non-Operational and the second host releases SPM, restore connectivity to the storage from the Non-Operational host only.
3.
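
The block/unblock in the steps above can be scripted. The following is a minimal sketch, assuming the block is an iptables DROP rule on traffic leaving the host toward the storage server. The address 10.35.64.10 is the iSCSI connection address shown in the log excerpt in this report; the function name and the choice of the OUTPUT chain are illustrative assumptions, not taken from the report. The function only builds the command and does not run it.

```python
from typing import List

# iSCSI connection address as it appears in the log excerpt of this report.
STORAGE_IP = "10.35.64.10"


def storage_block_cmd(block: bool, storage_ip: str = STORAGE_IP) -> List[str]:
    """Build the iptables command that appends (-A) or deletes (-D) a DROP rule.

    Hypothetical helper: the OUTPUT chain is an assumption about how the
    storage was blocked; the report only says an iptables block was used.
    """
    action = "-A" if block else "-D"
    return ["iptables", action, "OUTPUT", "-d", storage_ip, "-j", "DROP"]


if __name__ == "__main__":
    # Step 1: run the first command on both hosts.
    print(" ".join(storage_block_cmd(True)))
    # Step 2: run the second command on the Non-Operational host only.
    print(" ".join(storage_block_cmd(False)))
```

The returned list can be passed to `subprocess.run` as root on each host if the commands should actually be executed.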
  
Actual results:

1. Auto Recovery fails to recover the storage/hosts (the Non-Operational host becomes Unassigned, then goes back to Non-Operational).
2. Since the master version was raised, the Up host has to be put into maintenance so that recovery can happen.


Expected results:

Since the engine requires one host to be left in Up state to allow recovery, and Auto Recovery cannot recover other hosts while a host in the setup is in Up state, I suggest that only one of these flows be active at a time (so if Auto Recovery is enabled, engine recovery is disabled).
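
The suggestion above amounts to a mutual-exclusion rule between the two flows. A minimal sketch of that rule follows, assuming a simplified host-status model; the enum, the function name, and the returned labels are all hypothetical and do not reflect the actual ovirt-engine code.

```python
from enum import Enum
from typing import List


class HostStatus(Enum):
    # Simplified, hypothetical status model for illustration only.
    UP = "Up"
    NON_OPERATIONAL = "NonOperational"
    MAINTENANCE = "Maintenance"


def choose_recovery_flow(auto_recovery_enabled: bool, hosts: List[HostStatus]) -> str:
    """Pick the single flow allowed to drive recovery (suggested policy)."""
    if auto_recovery_enabled:
        # Auto Recovery owns the flow; engine recovery stays disabled.
        return "auto-recovery"
    if HostStatus.UP in hosts:
        # Engine recovery needs at least one host left in Up state.
        return "engine-recovery"
    return "none"
```

With this rule, the two-host scenario from this report (one Up host, one Non-Operational host, Auto Recovery enabled) would be handled by Auto Recovery alone.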

Additional info: full logs


2012-10-23 15:15:01,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable hosts done
2012-10-23 15:15:01,019 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains
2012-10-23 15:15:01,021 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Autorecovering 0 storage domains
2012-10-23 15:15:01,021 INFO  [org.ovirt.engine.core.bll.AutoRecoveryManager] (QuartzScheduler_Worker-77) [66613508] Checking autorecoverable storage domains done
2012-10-23 15:15:01,227 INFO  [org.ovirt.engine.core.bll.InitVdsOnUpCommand] (QuartzScheduler_Worker-80) [64f7ab0c] Running command: InitVdsOnUpCommand internal: true.
2012-10-23 15:15:01,316 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ValidateStorageServerConnectionVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 2798240a
2012-10-23 15:15:01,327 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ValidateStorageServerConnectionVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ValidateStorageServerConnectionVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 2798240a
2012-10-23 15:15:01,328 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Running command: ConnectHostToStoragePoolServersCommand internal: true. Entities affected :  ID: 1167fe48-4788-486d-876b-f8261ede6c23 Type: StoragePool
2012-10-23 15:15:01,329 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStorageServerVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, storageType = ISCSI, connectionList = [{ id: b5a56dcc-ef37-48eb-b83a-92db3b366aaa, connection: 10.35.64.10, iqn: Dafna-Upgrade-03, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 600b6044-c53b-4309-8f85-fbd1558dbcc0, connection: 10.35.64.10, iqn: Dafna-Upgrade-04, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 17aa00f8-63cb-4926-8763-bac1a4e251bf, connection: 10.35.64.10, iqn: Dafna-upgrade-02, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };{ id: 2030dacd-2069-4488-a7ca-abd07dbbb558, connection: 10.35.64.10, iqn: Dafna-upgrade-01, vfsType: null, mountOptions: null, nfsVersion: null, nfsRetrans: null, nfsTimeo: null };]), log id: 3e6ace41
2012-10-23 15:15:01,648 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7633b7eb-62d0-498d-a762-c1da4f3b505f:Dafna-Upgrade-03 in problem. vds: gold-vdsc
2012-10-23 15:15:01,649 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 7bdb9b94-729f-409b-94d8-bad3fe0d4d6f:Dafna-Upgrade-04 in problem. vds: gold-vdsc
2012-10-23 15:15:01,652 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain f844782b-dc73-4c35-b776-92ef809ab6f5:Dafna-Upgrade-02 in problem. vds: gold-vdsc
2012-10-23 15:15:01,653 WARN  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-76) [dc04fe] domain 6faf7684-e22a-4332-8ad9-0ad89dbd6172:Dafna-Upgrade-01 in problem. vds: gold-vdsc
2012-10-23 15:15:02,019 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStorageServerVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] FINISH, ConnectStorageServerVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0, 600b6044-c53b-4309-8f85-fbd1558dbcc0=0, 17aa00f8-63cb-4926-8763-bac1a4e251bf=0, 2030dacd-2069-4488-a7ca-abd07dbbb558=0}, log id: 3e6ace41
2012-10-23 15:15:02,019 INFO  [org.ovirt.engine.core.bll.storage.ConnectHostToStoragePoolServersCommand] (QuartzScheduler_Worker-80) [79f9af98] Host gold-vdsd storage connection was succeeded 
2012-10-23 15:15:02,121 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-80) [79f9af98] START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, HostId = 0e8479de-1c56-11e2-b621-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 2, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 6879a6e
2012-10-23 15:15:02,303 INFO  [org.ovirt.engine.core.vdsbroker.irsbroker.IrsBrokerCommand] (QuartzScheduler_Worker-85) hostFromVds::selectedVds - gold-vdsc, spmStatus Unknown_Pool, storage pool iSCSI
2012-10-23 15:15:02,324 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand] (QuartzScheduler_Worker-85) START, ConnectStoragePoolVDSCommand(HostName = gold-vdsc, HostId = 0419c81e-1c56-11e2-9707-001a4a169741, storagePoolId = 1167fe48-4788-486d-876b-f8261ede6c23, vds_spm_id = 1, masterDomainId = 7633b7eb-62d0-498d-a762-c1da4f3b505f, masterVersion = 45), log id: 20beb48d
2012-10-23 15:15:02,634 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.BrokerCommandBase] (QuartzScheduler_Worker-85) Command org.ovirt.engine.core.vdsbroker.vdsbroker.ConnectStoragePoolVDSCommand return value 
 Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusOnlyReturnForXmlRpc
mStatus                       Class Name: org.ovirt.engine.core.vdsbroker.vdsbroker.StatusForXmlRpc
mCode                         304
mMessage                      Cannot find master domain: 'spUUID=1167fe48-4788-486d-876b-f8261ede6c23, msdUUID=7633b7eb-62d0-498d-a762-c1da4f3b505f'
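
When reading an excerpt like the one above, it can help to list the VDS commands that logged START but no matching FINISH. The following is a minimal sketch, assuming each log entry sits on a single line (hard-wrapped excerpts would need rejoining first) and that START/FINISH entries end with `log id: <hex>` as they do in this excerpt. The function name and the shortened sample lines are illustrative only.

```python
import re
from typing import Dict, Iterable

# Patterns follow the START/FINISH entries visible in this report's excerpt.
START_RE = re.compile(r"START, (\w+)\(.*log id: ([0-9a-f]+)\s*$")
FINISH_RE = re.compile(r"FINISH, (\w+),.*log id: ([0-9a-f]+)\s*$")


def unfinished_commands(lines: Iterable[str]) -> Dict[str, str]:
    """Map log id -> command name for every START without a later FINISH."""
    pending: Dict[str, str] = {}
    for line in lines:
        start = START_RE.search(line)
        if start:
            pending[start.group(2)] = start.group(1)
            continue
        finish = FINISH_RE.search(line)
        if finish:
            pending.pop(finish.group(2), None)
    return pending


# Sample entries shortened from the excerpt (timestamps and most fields dropped).
SAMPLE = [
    "START, ConnectStorageServerVDSCommand(HostName = gold-vdsd), log id: 3e6ace41",
    "FINISH, ConnectStorageServerVDSCommand, return: {b5a56dcc-ef37-48eb-b83a-92db3b366aaa=0}, log id: 3e6ace41",
    "START, ConnectStoragePoolVDSCommand(HostName = gold-vdsd, masterVersion = 45), log id: 6879a6e",
]

if __name__ == "__main__":
    print(unfinished_commands(SAMPLE))
```

On the shortened sample, only the ConnectStoragePoolVDSCommand entry remains pending, which matches the excerpt: the storage server connection finishes, while the pool connection is the call that returns the "Cannot find master domain" error.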

Comment 1 mkublin 2012-10-25 08:56:31 UTC
This bug is not related to auto-recovery; similar behaviour will occur if someone tries to activate the host in question manually.

Comment 3 Barak 2012-12-06 14:00:12 UTC
We have opened 3 Bugs that together need to handle the above scenario.

Comment 4 mkublin 2012-12-16 10:44:39 UTC
http://gerrit.ovirt.org/#/c/10103/

Comment 5 Barak 2013-01-02 08:45:21 UTC
(In reply to comment #4)
> http://gerrit.ovirt.org/#/c/10103/

The above patch belongs to bug 882837 but should handle this scenario as well.

Hence moving to MODIFIED and later to ON_QA as the above scenario should be verified as well.

Comment 6 Stephen Gordon 2013-01-04 21:16:25 UTC
Setting docs_scoped- as this looks like a series of bug fixes to provide the behaviour users already expect rather than a new feature (again from user POV).

Comment 7 Leonid Natapov 2013-03-17 16:23:46 UTC
verified on sf10 as part of 882837 and 874019

Comment 8 Itamar Heim 2013-06-11 09:52:05 UTC
3.2 has been released

Comment 9 Itamar Heim 2013-06-11 09:52:08 UTC
3.2 has been released

Comment 10 Itamar Heim 2013-06-11 09:58:59 UTC
3.2 has been released

