Bug 1311693

Summary: [hosted-engine] [iSCSI] autoimport may break current and sibling HE deployments using the same storage
Product: [oVirt] ovirt-engine
Reporter: Simone Tiraboschi <stirabos>
Component: BLL.HostedEngine
Assignee: Roy Golan <rgolan>
Status: CLOSED DUPLICATE
QA Contact: Artyom <alukiano>
Severity: unspecified
Docs Contact:
Priority: medium
Version: 3.6.1.3
CC: alukiano, bugs, dfediuck, mavital, sbonazzo
Target Milestone: ovirt-3.6.4
Flags: dfediuck: ovirt-3.6.z?, rule-engine: planning_ack?, rule-engine: devel_ack?, rule-engine: testing_ack?
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: sla
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2016-02-25 10:26:32 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: SLA
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Simone Tiraboschi 2016-02-24 18:27:26 UTC
Description of problem:
The hosted-engine storage domain auto-import procedure in the engine tries to import every visible storage domain that is named 'hosted_storage', without checking any further details.

This is an issue in particular on iSCSI environments:
with iSCSI, connectStorageServer connects to the whole iSCSI target and not just to the single LUN that contains the hosted-engine storage domain.

So all the hosts connected to that iSCSI portal will see all the LUNs and all the storage domains.

Indeed we have (from our QE env):
Thread-4222::INFO::2016-02-24 19:05:13,694::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='0980e658-74e0-4da1-85ab-1e9490984f21', conList=[{'id': '2ce7c009-aa22-4776-b740-9a115d781ef6', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'portal': '1', 'user': 'tech', 'password': '********', 'port': '3260'}], options=None)

But then:
[root@alma05 ~]# vdsClient -s 0 getStorageDomainsList
00c5a12b-ae3d-4fe4-9996-a35814ac5150
ac59b72c-9b89-415d-9510-e3027899d55c
4e12e0cf-42e3-4bce-9059-7c6850e85795
efc677fb-2fa6-4af9-88e1-2e79708f1519

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo 00c5a12b-ae3d-4fe4-9996-a35814ac5150
	uuid = 00c5a12b-ae3d-4fe4-9996-a35814ac5150
	vguuid = 5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI
	state = OK
	version = 3
	role = Regular
	type = ISCSI
	class = Data
	pool = ['0980e658-74e0-4da1-85ab-1e9490984f21']
	name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo ac59b72c-9b89-415d-9510-e3027899d55c
	uuid = ac59b72c-9b89-415d-9510-e3027899d55c
	vguuid = qLfNby-zbRm-X1I6-iv1u-ZitY-6N5e-Oq7FHe
	state = OK
	version = 3
	role = Regular
	type = ISCSI
	class = Data
	pool = []
	name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo 4e12e0cf-42e3-4bce-9059-7c6850e85795
	uuid = 4e12e0cf-42e3-4bce-9059-7c6850e85795
	vguuid = COObTz-X3J6-F3Cc-OzaU-y2R8-69we-pqwFzh
	state = OK
	version = 3
	role = Master
	type = ISCSI
	class = Data
	pool = ['a9f79202-46e3-4a1e-9dea-034b36838cfc']
	name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo efc677fb-2fa6-4af9-88e1-2e79708f1519
	uuid = efc677fb-2fa6-4af9-88e1-2e79708f1519
	vguuid = UCqqNX-6crx-ytA8-pkqc-pVV0-VUJw-6ulFNO
	state = OK
	version = 3
	role = Regular
	type = ISCSI
	class = Data
	pool = []
	name = hosted_storage

Meanwhile, in the engine logs of another engine VM, which manages a different pair of hosts (and not alma05), we can see:

VDSM command failed: Cannot acquire host id: ('4e12e0cf-42e3-4bce-9059-7c6850e85795', SanlockException(-262, 'Sanlock lockspace add failure', 'Sanlock exception'))
VDSM command failed: Cannot acquire host id: ('00c5a12b-ae3d-4fe4-9996-a35814ac5150', SanlockException(-262, 'Sanlock lockspace add failure', 'Sanlock exception'))

So that engine is randomly trying to import every domain that is labeled 'hosted_storage' and available on that iSCSI portal, regardless of its LUN or vguuid, breaking not just itself but also other hosted-engine environments.

Indeed, in the case we saw, alma05 was failing its 3.5 -> 3.6 upgrade because it was fighting over its storage domain with one of the hosts managed by another engine.
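
To illustrate why the name alone cannot identify the right domain, here is a minimal Python sketch that parses the getStorageDomainInfo listings shown above and reports names shared by more than one domain. It is not the engine's code; the function names parse_domain_info and names_shared_by_several_domains are hypothetical, and the sample text is trimmed to four fields per domain.

```python
from collections import defaultdict

# Fields copied from the getStorageDomainInfo listings in this report
# (trimmed to uuid, vguuid, pool, and name).
SAMPLE = """\
uuid = 00c5a12b-ae3d-4fe4-9996-a35814ac5150
vguuid = 5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI
pool = ['0980e658-74e0-4da1-85ab-1e9490984f21']
name = hosted_storage

uuid = ac59b72c-9b89-415d-9510-e3027899d55c
vguuid = qLfNby-zbRm-X1I6-iv1u-ZitY-6N5e-Oq7FHe
pool = []
name = hosted_storage

uuid = 4e12e0cf-42e3-4bce-9059-7c6850e85795
vguuid = COObTz-X3J6-F3Cc-OzaU-y2R8-69we-pqwFzh
pool = ['a9f79202-46e3-4a1e-9dea-034b36838cfc']
name = hosted_storage

uuid = efc677fb-2fa6-4af9-88e1-2e79708f1519
vguuid = UCqqNX-6crx-ytA8-pkqc-pVV0-VUJw-6ulFNO
pool = []
name = hosted_storage
"""


def parse_domain_info(text):
    """Parse blank-line-separated 'key = value' records into dicts."""
    domains = []
    for block in text.strip().split("\n\n"):
        info = {}
        for line in block.splitlines():
            key, sep, value = line.partition("=")
            if sep:
                info[key.strip()] = value.strip()
        if info:
            domains.append(info)
    return domains


def names_shared_by_several_domains(domains):
    """Map each name used by more than one domain to the uuids using it."""
    by_name = defaultdict(list)
    for domain in domains:
        by_name[domain["name"]].append(domain["uuid"])
    return {name: uuids for name, uuids in by_name.items() if len(uuids) > 1}


if __name__ == "__main__":
    shared = names_shared_by_several_domains(parse_domain_info(SAMPLE))
    for name, uuids in shared.items():
        print(name, "is used by", len(uuids), "domains:", ", ".join(uuids))
```

With the data from alma05, all four domains share the single name 'hosted_storage', so a lookup by name has four candidates and only one of them belongs to the local deployment.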

Version-Release number of selected component (if applicable):
3.6.3 rc4

How reproducible:
It depends on the LUN numbers: if the engine first tries to import its own storage domain, everything is fine; otherwise both environments break.

Steps to Reproduce:
1. deploy hosted-engine on iSCSI
2. create another LUN on the same iSCSI portal
3. deploy another hosted-engine on the new LUN
4. trigger the auto-import procedure on one of the two hosted-engine environments by adding the first regular storage domain

Actual results:
the engine will randomly try to import each available storage domain called 'hosted_storage'

Expected results:
The auto-import procedure should also match some other parameter (vguuid?) from the engine VM definition to ensure that it is trying to import the right SD.
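
The idea of matching on more than the name could look roughly like the following Python sketch. It only illustrates the selection logic and is not the engine's implementation; select_hosted_storage is a hypothetical helper, and how the expected vguuid would be obtained from the engine VM definition is an assumption left open here.

```python
# uuid and vguuid values copied from the getStorageDomainInfo output in this report.
DOMAINS = [
    {"uuid": "00c5a12b-ae3d-4fe4-9996-a35814ac5150",
     "vguuid": "5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI",
     "name": "hosted_storage"},
    {"uuid": "ac59b72c-9b89-415d-9510-e3027899d55c",
     "vguuid": "qLfNby-zbRm-X1I6-iv1u-ZitY-6N5e-Oq7FHe",
     "name": "hosted_storage"},
    {"uuid": "4e12e0cf-42e3-4bce-9059-7c6850e85795",
     "vguuid": "COObTz-X3J6-F3Cc-OzaU-y2R8-69we-pqwFzh",
     "name": "hosted_storage"},
    {"uuid": "efc677fb-2fa6-4af9-88e1-2e79708f1519",
     "vguuid": "UCqqNX-6crx-ytA8-pkqc-pVV0-VUJw-6ulFNO",
     "name": "hosted_storage"},
]


def select_hosted_storage(domains, expected_vguuid, name="hosted_storage"):
    """Return the one domain matching both name and vguuid, or None.

    expected_vguuid is assumed to come from the local deployment's own
    configuration; a name match alone is never accepted.
    """
    candidates = [
        d for d in domains
        if d["name"] == name and d["vguuid"] == expected_vguuid
    ]
    if len(candidates) != 1:
        return None  # no match or an ambiguous match: do not import anything
    return candidates[0]


if __name__ == "__main__":
    chosen = select_hosted_storage(
        DOMAINS, "5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI")
    print(chosen["uuid"] if chosen else "no unambiguous match")
```

Returning nothing when the match is missing or ambiguous means a wrong guess never reaches the import step, which is the failure mode described in this report.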

Another option is to prevent the problem by refusing to deploy hosted-engine if any other available storage domain called hosted_storage is found. This will not help those who have already deployed more than one hosted-engine environment in the past using the same iSCSI portal.

Additional info:
workaround: deploy just a single hosted-engine instance per iSCSI portal

Comment 1 Yaniv Kaul 2016-02-25 09:17:44 UTC
Best practices would dictate using different initiator groups for the different setups.
I'd CLOSE-WONTFIX this, with a recommendation to document the pre-requisites well.

Comment 2 Artyom 2016-02-25 09:25:25 UTC
In this case I believe we need to show at least a warning message when we deploy HE over iSCSI storage and the host initiator group is mapped to more than one LUN.


2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 The following luns have been found on the requested target:
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    1 - sdb - 71680MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    2 - sdc - 102400MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    3 - sdd - 76800MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    4 - sde - 75776MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human human.queryString:153 query OVEHOSTED_STORAGE_ISCSI_LUN
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 Please specify the lun id (1, 2, 3, 4) [1]:
2016-02-24 15:02:53 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:RECEIVE    3
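
As a rough illustration of the kind of check meant here, the following Python sketch parses the LUN list from the setup log above and produces a warning when the target exposes more than one LUN. It is not part of hosted-engine setup; parse_luns, multi_lun_warning, and the warning text are hypothetical.

```python
import re

# LUN list lines copied from the setup log above.
SETUP_LOG = """\
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                 The following luns have been found on the requested target:
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    1 - sdb - 71680MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    2 - sdc - 102400MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    3 - sdd - 76800MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND                    4 - sde - 75776MB
"""

# Matches entries such as "DIALOG:SEND    1 - sdb - 71680MB".
LUN_LINE = re.compile(r"DIALOG:SEND\s+(\d+) - (\w+) - (\d+)MB\s*$")


def parse_luns(log_text):
    """Return (lun_id, device, size_mb) tuples found in the setup log."""
    luns = []
    for line in log_text.splitlines():
        match = LUN_LINE.search(line)
        if match:
            luns.append(
                (int(match.group(1)), match.group(2), int(match.group(3))))
    return luns


def multi_lun_warning(luns):
    """Return a warning string if more than one LUN is visible, else None."""
    if len(luns) > 1:
        devices = ", ".join(device for _, device, _ in luns)
        return ("WARNING: %d LUNs are visible on this target (%s); other "
                "deployments sharing it may be affected" % (len(luns), devices))
    return None


if __name__ == "__main__":
    warning = multi_lun_warning(parse_luns(SETUP_LOG))
    if warning:
        print(warning)
```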

Comment 3 Roy Golan 2016-02-25 10:26:32 UTC
I'm closing this as duplicate of Bug 1294457. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1294457#c20

Comment 4 Roy Golan 2016-02-25 11:34:27 UTC

*** This bug has been marked as a duplicate of bug 1294457 ***