Description of problem:
The hosted-engine storage domain auto-import procedure in the engine tries to import every storage domain labeled 'hosted_storage' without checking any further details.
This is a problem on iSCSI in particular: in connectStorageServer we connect the whole iSCSI target, not just the single LUN that contains the hosted-engine storage domain. As a result, every host connected to that iSCSI portal sees all the LUNs and all the storage domains.

Indeed we have (from our QE env):

Thread-4222::INFO::2016-02-24 19:05:13,694::logUtils::48::dispatcher::(wrapper) Run and protect: connectStorageServer(domType=3, spUUID='0980e658-74e0-4da1-85ab-1e9490984f21', conList=[{'id': '2ce7c009-aa22-4776-b740-9a115d781ef6', 'connection': '10.35.146.129', 'iqn': 'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00', 'portal': '1', 'user': 'tech', 'password': '********', 'port': '3260'}], options=None)

But then:

[root@alma05 ~]# vdsClient -s 0 getStorageDomainsList
00c5a12b-ae3d-4fe4-9996-a35814ac5150
ac59b72c-9b89-415d-9510-e3027899d55c
4e12e0cf-42e3-4bce-9059-7c6850e85795
efc677fb-2fa6-4af9-88e1-2e79708f1519

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo 00c5a12b-ae3d-4fe4-9996-a35814ac5150
    uuid = 00c5a12b-ae3d-4fe4-9996-a35814ac5150
    vguuid = 5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI
    state = OK
    version = 3
    role = Regular
    type = ISCSI
    class = Data
    pool = ['0980e658-74e0-4da1-85ab-1e9490984f21']
    name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo ac59b72c-9b89-415d-9510-e3027899d55c
    uuid = ac59b72c-9b89-415d-9510-e3027899d55c
    vguuid = qLfNby-zbRm-X1I6-iv1u-ZitY-6N5e-Oq7FHe
    state = OK
    version = 3
    role = Regular
    type = ISCSI
    class = Data
    pool = []
    name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo 4e12e0cf-42e3-4bce-9059-7c6850e85795
    uuid = 4e12e0cf-42e3-4bce-9059-7c6850e85795
    vguuid = COObTz-X3J6-F3Cc-OzaU-y2R8-69we-pqwFzh
    state = OK
    version = 3
    role = Master
    type = ISCSI
    class = Data
    pool = ['a9f79202-46e3-4a1e-9dea-034b36838cfc']
    name = hosted_storage

[root@alma05 ~]# vdsClient -s 0 getStorageDomainInfo efc677fb-2fa6-4af9-88e1-2e79708f1519
    uuid = efc677fb-2fa6-4af9-88e1-2e79708f1519
    vguuid = UCqqNX-6crx-ytA8-pkqc-pVV0-VUJw-6ulFNO
    state = OK
    version = 3
    role = Regular
    type = ISCSI
    class = Data
    pool = []
    name = hosted_storage

Meanwhile, in the logs of another engine VM, which manages another couple of hosts (and not alma05), we can see:

VDSM command failed: Cannot acquire host id: ('4e12e0cf-42e3-4bce-9059-7c6850e85795', SanlockException(-262, 'Sanlock lockspace add failure', 'Sanlock exception'))
VDSM command failed: Cannot acquire host id: ('00c5a12b-ae3d-4fe4-9996-a35814ac5150', SanlockException(-262, 'Sanlock lockspace add failure', 'Sanlock exception'))

So that engine is randomly trying to import every domain that is labeled 'hosted_storage' and available on that iSCSI portal, regardless of its LUN or vguuid, breaking not just itself but also other hosted-engine environments.
Indeed, in the case we saw, alma05 was failing its 3.5 -> 3.6 upgrade because it was fighting over its storage domain with one of the hosts managed by another engine.

Version-Release number of selected component (if applicable):
3.6.3 rc4

How reproducible:
It depends on the LUN numbers: if the engine first tries to import its own storage domain, everything is fine; otherwise both environments end up in a mess.

Steps to Reproduce:
1. deploy hosted-engine on iSCSI
2. create another LUN on the same iSCSI portal
3. deploy another hosted-engine on that new LUN
4. trigger the auto-import procedure on one of the two hosted-engine environments by adding the first regular storage domain

Actual results:
The engine randomly tries to import each available storage domain called 'hosted_storage'.

Expected results:
The auto-import procedure should match some other parameter (vguuid?) from the engine VM definition to ensure that it is importing the right SD.
Another option is to prevent the problem by refusing to deploy hosted-engine if any other available storage domain called hosted_storage is found. This will not help anyone who has already deployed more than one hosted-engine environment on the same iSCSI portal.

Additional info:
Workaround: deploy just a single hosted-engine instance per iSCSI portal.
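To make the "Expected results" suggestion concrete, here is a minimal sketch of a stricter match. It is not the engine's actual import code: the function name, the dict layout, and the idea of passing in an expected storage domain UUID are all assumptions made for illustration. The sample data is copied from the getStorageDomainInfo output above.

```python
from typing import Dict, List, Optional

# Label the auto-import procedure currently keys on, per this report.
HOSTED_ENGINE_SD_NAME = "hosted_storage"


def pick_domain_to_import(domains: List[Dict], expected_uuid: str) -> Optional[Dict]:
    """Hypothetical helper: return the one domain that has the hosted-engine
    label AND the UUID recorded for this engine VM, or None otherwise."""
    matches = [
        d for d in domains
        if d["name"] == HOSTED_ENGINE_SD_NAME and d["uuid"] == expected_uuid
    ]
    # Refuse to guess: anything other than exactly one match means no import.
    return matches[0] if len(matches) == 1 else None


# Two of the four domains visible from alma05 (values from the report).
DOMAINS = [
    {"uuid": "00c5a12b-ae3d-4fe4-9996-a35814ac5150",
     "vguuid": "5sdZP9-aFtu-Ccn2-lZVt-68Ii-ya5b-Fh8PDI",
     "pool": ["0980e658-74e0-4da1-85ab-1e9490984f21"],
     "name": "hosted_storage"},
    {"uuid": "4e12e0cf-42e3-4bce-9059-7c6850e85795",
     "vguuid": "COObTz-X3J6-F3Cc-OzaU-y2R8-69we-pqwFzh",
     "pool": ["a9f79202-46e3-4a1e-9dea-034b36838cfc"],
     "name": "hosted_storage"},
]

chosen = pick_domain_to_import(DOMAINS, "00c5a12b-ae3d-4fe4-9996-a35814ac5150")
```

Matching on the vguuid instead would work the same way; the point is only that the name alone does not identify the domain.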
Best practice would dictate using different initiator groups for the different setups. I'd CLOSE-WONTFIX this, with a recommendation to document the prerequisites well.
In this case I believe we need to show at least a warning message when we deploy HE over iSCSI storage and the host initiator group is mapped to more than one LUN.

2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND The following luns have been found on the requested target:
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND 1 - sdb - 71680MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND 2 - sdc - 102400MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND 3 - sdd - 76800MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND 4 - sde - 75776MB
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human human.queryString:153 query OVEHOSTED_STORAGE_ISCSI_LUN
2016-02-24 15:02:49 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:SEND Please specify the lun id (1, 2, 3, 4) [1]:
2016-02-24 15:02:53 DEBUG otopi.plugins.otopi.dialog.human dialog.__logString:215 DIALOG:RECEIVE 3
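A rough sketch of the kind of check meant here, assuming the setup already has the list of LUNs found on the target. The function name and the message wording are hypothetical and are not part of hosted-engine-setup.

```python
from typing import List, Optional


def multi_lun_warning(lun_sizes_mb: List[int]) -> Optional[str]:
    """Hypothetical check: return a warning string when the initiator can see
    more than one LUN on the requested target, otherwise None."""
    if len(lun_sizes_mb) <= 1:
        return None
    return (
        "WARNING: %d LUNs are visible on the requested target. Other "
        "hosted-engine storage domains on this target may be exposed to "
        "this host; consider a dedicated initiator group." % len(lun_sizes_mb)
    )


# LUN sizes taken from the setup log above (sdb, sdc, sdd, sde).
message = multi_lun_warning([71680, 102400, 76800, 75776])
```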
I'm closing this as a duplicate of Bug 1294457. Please see https://bugzilla.redhat.com/show_bug.cgi?id=1294457#c20
*** This bug has been marked as a duplicate of bug 1294457 ***