Bug 1667708 - Restore SHE environment on iscsi failed to reach storage domain
Summary: Restore SHE environment on iscsi failed to reach storage domain
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.2.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.3.2
Target Release: 4.3.0
Assignee: Simone Tiraboschi
QA Contact: meital avital
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2019-01-20 08:37 UTC by Pedut
Modified: 2019-02-25 16:33 UTC
CC: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-25 16:33:44 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:


Attachments
sosreport (10.93 MB, application/x-xz)
2019-01-20 08:38 UTC, Pedut
hosted engine deployment (477.07 KB, text/plain)
2019-01-20 08:45 UTC, Pedut

Description Pedut 2019-01-20 08:37:44 UTC
Description of problem:
Restoring a SHE environment backup on iSCSI failed to reach the storage domain; the request failed with HTTP response code 400.

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.32-1.el7ev.noarch
rhvm-appliance-4.2-20181212.0.el7.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
python-ovirt-engine-sdk4-4.2.9-1.el7ev.x86_64
ansible-2.7.5-1.el7ae.noarch
otopi-1.7.8-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Redeploy on iSCSI from a node0 environment, where the SPM host is not the one recorded in the backup and power management is not configured
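For reference, the node-zero backup/restore flow being exercised here looks roughly as follows; the file names are illustrative, not taken from this report:

```shell
# Sketch of the node-zero backup/restore flow (file names are illustrative).

# On the engine VM of the source environment, take a full backup:
engine-backup --mode=backup --file=engine-backup.tar.gz --log=engine-backup.log

# Copy the backup file to the new first host, then redeploy the hosted
# engine on the iSCSI storage domain, restoring from the backup:
hosted-engine --deploy --restore-from-file=engine-backup.tar.gz
```

These commands require a real RHV/oVirt host and engine VM, so they are shown only as an operational sketch of the flow, not a runnable script.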

Actual results:
Restore fails.

Expected results:
Restore should succeed.

Additional info:
Normal (non-restore) deployment on iSCSI works.

Comment 1 Pedut 2019-01-20 08:38:31 UTC
Created attachment 1521920 [details]
sosreport

Comment 2 Pedut 2019-01-20 08:45:11 UTC
Created attachment 1521921 [details]
hosted engine deployment

Comment 3 Nikolai Sednev 2019-01-21 16:10:14 UTC
I've also tried to back up from Node0 over FC and restore on iSCSI, and it failed with the same error:

[ ERROR ] -Please activate the master Storage Domain first.]". HTTP response code is 409.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "deprecations": [{"msg": "The 'ovirt_storage_domains' module is being renamed 'ovirt_storage_domain'", "version": 2.8}], "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Failed to attach Storage due to an error on the Data Center master Storage Domain.\n-Please activate the master Storage Domain first.]\". HTTP response code is 409."}

rhvm-appliance-4.2-20190108.0.el7.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.33-1.el7ev.noarch
Linux 3.10.0-957.1.3.el7.x86_64 #1 SMP Thu Nov 15 17:36:42 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Comment 5 Simone Tiraboschi 2019-01-23 10:08:00 UTC
The issue comes from here:
the new hosted-engine SD ({name='hosted_storage', id='09090b0a-e820-4052-a10b-f3e8f3d2eba2'}) was successfully created on master-vds10.qa.lab.tlv.redhat.com:

 2019-01-16 10:08:57,465+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [2fde8e09] START, CreateStorageDomainVDSCommand(HostName = master-vds10.qa.lab.tlv.redhat.com, CreateStorageDomainVDSCommandParameters:{hostId='c8b6c013-8c28-48c6-9b16-49e53dee937c', storageDomain='StorageDomainStatic:{name='hosted_storage', id='09090b0a-e820-4052-a10b-f3e8f3d2eba2'}', args='7XbLmx-A7IJ-fWth-Ilel-lTxR-5RAb-ejXzb3'}), log id: 179b8732
 2019-01-16 10:09:02,355+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.CreateStorageDomainVDSCommand] (default task-1) [2fde8e09] FINISH, CreateStorageDomainVDSCommand, log id: 179b8732
 2019-01-16 10:09:02,359+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStorageDomainStatsVDSCommand] (default task-1) [2fde8e09] START, GetStorageDomainStatsVDSCommand(HostName = master-vds10.qa.lab.tlv.redhat.com, GetStorageDomainStatsVDSCommandParameters:{hostId='c8b6c013-8c28-48c6-9b16-49e53dee937c', storageDomainId='09090b0a-e820-4052-a10b-f3e8f3d2eba2'}), log id: 204d8cad
 2019-01-16 10:09:02,527+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetStorageDomainStatsVDSCommand] (default task-1) [2fde8e09] FINISH, GetStorageDomainStatsVDSCommand, return: StorageDomain:{domainName='', domainId='09090b0a-e820-4052-a10b-f3e8f3d2eba2'}, log id: 204d8cad
 2019-01-16 10:09:02,545+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (default task-1) [2fde8e09] START, GetVGInfoVDSCommand(HostName = master-vds10.qa.lab.tlv.redhat.com, GetVGInfoVDSCommandParameters:{hostId='c8b6c013-8c28-48c6-9b16-49e53dee937c', VGID='7XbLmx-A7IJ-fWth-Ilel-lTxR-5RAb-ejXzb3'}), log id: e84cc1f
 2019-01-16 10:09:02,871+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.GetVGInfoVDSCommand] (default task-1) [2fde8e09] FINISH, GetVGInfoVDSCommand, return: [LUNs:{id='360002ac000000000000000ec00021f6b', physicalVolumeId='8tTsCT-9dLn-SnDf-OOO2-RiWw-MCaN-aJAGAn', volumeGroupId='7XbLmx-A7IJ-fWth-Ilel-lTxR-5RAb-ejXzb3', serial='S3PARdataVV_CZ3836C3RB', lunMapping='0', vendorId='3PARdata', productId='VV', lunConnections='[StorageServerConnections:{id='null', connection='10.35.146.1', iqn='iqn.2000-05.com.3pardata:20210002ac021f6b', vfsType='null', mountOptions='null', nfsVersion='null', nfsRetrans='null', nfsTimeo='null', iface='null', netIfaceName='null'}]', deviceSize='130', pvSize='0', peCount='1037', peAllocatedCount='39', vendorName='3PARdata', pathsDictionary='[sdb=true]', pathsCapacity='[sdb=130]', lunType='ISCSI', status='null', diskId='null', diskAlias='null', storageDomainId='09090b0a-e820-4052-a10b-f3e8f3d2eba2', storageDomainName='null', discardMaxSize='16777216'}], log id: e84cc1f

After that, the new storage domain is connected to all the hosts in the same datacenter, but we encountered an issue connecting it on a different host, named host_mixed_2:
 2019-01-16 10:09:25,324+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] Failed in 'HSMGetStorageDomainInfoVDS' method
 2019-01-16 10:09:25,334+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] EVENT_ID: VDS_BROKER_COMMAND_FAILURE(10,802), VDSM host_mixed_2 command HSMGetStorageDomainInfoVDS failed: Storage domain does not exist: (u'09090b0a-e820-4052-a10b-f3e8f3d2eba2',)
 2019-01-16 10:09:25,335+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] Command 'org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand' return value '
OneStorageDomainInfoReturn:{status='Status [code=358, message=Storage domain does not exist: (u'09090b0a-e820-4052-a10b-f3e8f3d2eba2',)]'}
'
 2019-01-16 10:09:25,335+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] HostName = host_mixed_2
 2019-01-16 10:09:25,335+02 ERROR [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] Command 'HSMGetStorageDomainInfoVDSCommand(HostName = host_mixed_2, HSMGetStorageDomainInfoVDSCommandParameters:{hostId='72927359-bfad-4125-83fd-3f68bda17195', storageDomainId='09090b0a-e820-4052-a10b-f3e8f3d2eba2'})' execution failed: VDSGenericException: VDSErrorException: Failed to HSMGetStorageDomainInfoVDS, error = Storage domain does not exist: (u'09090b0a-e820-4052-a10b-f3e8f3d2eba2',), code = 358
 2019-01-16 10:09:25,335+02 INFO  [org.ovirt.engine.core.vdsbroker.vdsbroker.HSMGetStorageDomainInfoVDSCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] FINISH, HSMGetStorageDomainInfoVDSCommand, log id: 421b9124
 2019-01-16 10:09:25,335+02 ERROR [org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] Command 'org.ovirt.engine.core.bll.storage.domain.AttachStorageDomainToPoolCommand' failed: EngineException: org.ovirt.engine.core.vdsbroker.vdsbroker.VDSErrorException: VDSGenericException: VDSErrorException: Failed to HSMGetStorageDomainInfoVDS, error = Storage domain does not exist: (u'09090b0a-e820-4052-a10b-f3e8f3d2eba2',), code = 358 (Failed with error StorageDomainDoesNotExist and code 358)
 2019-01-16 10:09:25,352+02 INFO  [org.ovirt.engine.core.bll.CommandCompensator] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] Command [id=f994564b-4ae1-4798-90df-449983478e7d]: Compensating NEW_ENTITY_ID of org.ovirt.engine.core.common.businessentities.StoragePoolIsoMap; snapshot: StoragePoolIsoMapId:{storagePoolId='05c1e898-18b4-11e9-82fc-001a4a16109f', storageId='09090b0a-e820-4052-a10b-f3e8f3d2eba2'}.
 2019-01-16 10:09:25,375+02 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (default task-1) [1f8344e5-329b-420c-8727-aafc32777dc7] EVENT_ID: USER_ATTACH_STORAGE_DOMAIN_TO_POOL_FAILED(963), Failed to attach Storage Domain hosted_storage to Data Center golden_env_mixed. (User: admin@internal-authz)

And so the new storage domain could not be attached to the datacenter because of the error on host_mixed_2.

To understand what really happened we need the vdsm logs from host_mixed_2, but they are not included here.

Pedut, are you sure that the LUN used for the new storage domain was correctly configured to be accessed from all the hosts in the datacenter as registered in the backup?
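The question above can be checked directly on each host. A rough diagnostic sketch, using the SD UUID and LUN WWID from the log excerpts above (the `check_sd_visible` helper is a hypothetical name for illustration):

```shell
#!/bin/sh
# Diagnostic sketch: run on each host (e.g. host_mixed_2) to verify that
# the hosted_storage LUN is reachable. IDs are copied from the log above.

SD_UUID=09090b0a-e820-4052-a10b-f3e8f3d2eba2   # storage domain UUID == VG name
LUN_WWID=360002ac000000000000000ec00021f6b     # LUN id from GetVGInfoVDS

# Returns 0 if LVM sees a VG named after the storage domain UUID
# (vdsm names the VG of a block storage domain after the SD UUID).
check_sd_visible() {
    vgs --noheadings -o vg_name 2>/dev/null | grep -qw "$1"
}

iscsiadm -m session || true        # active iSCSI sessions on this host
multipath -ll "$LUN_WWID" || true  # multipath view of the LUN

if check_sd_visible "$SD_UUID"; then
    echo "VG $SD_UUID visible"
else
    echo "VG $SD_UUID NOT visible"
fi
```

If the VG is not visible on host_mixed_2 while it is on master-vds10, the LUN is most likely not mapped (or not zoned) for that host, which would explain the StorageDomainDoesNotExist error.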

Comment 6 Sandro Bonazzola 2019-02-06 08:48:24 UTC
Pedut?

Comment 7 Sandro Bonazzola 2019-02-18 07:54:47 UTC
Moving to 4.3.2 not being identified as blocker for 4.3.1.

Comment 8 Sandro Bonazzola 2019-02-20 08:42:28 UTC
Nikolai reproduced this; let's get an environment where this reproduces so we can investigate.

Comment 9 Nikolai Sednev 2019-02-25 16:33:44 UTC
Works fine on these components:
rhvm-appliance-4.3-20190220.2.el7.x86_64
ovirt-hosted-engine-setup-2.3.5-1.el7ev.noarch
ovirt-hosted-engine-ha-2.3.1-1.el7ev.noarch
ovirt-ansible-engine-setup-1.1.8-1.el7ev.noarch
Linux 3.10.0-957.5.1.el7.x86_64 #1 SMP Wed Dec 19 10:46:58 EST 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.6 (Maipo)

Closing as worksforme.

I followed the initial reproduction steps from the description and, using the components listed above, the restore was successful.

