Description of problem:
During hosted-engine deployment, RHV should at least warn the admin not to use all of the space on the hosted-engine storage domain and to leave some space for the OVFs.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Deploy hosted engine.
2. Use all the space from the storage domain.

Actual results:
The user is allowed to use all the space, so the OVF files for the hosted-engine storage domain cannot be created later, and the hosted-engine VM cannot be modified (for example, to add an additional NIC).

Expected results:
The admin should be at least warned that using all of the space will have consequences in the future.

Additional info:
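A minimal sketch in Python of the kind of warning being requested here; the reservation figure and the helper are illustrative assumptions of mine, not the actual setup code:

    GIB = 1024 ** 3

    # Assumed reservation for the OVF_STORE disks that must stay free on the
    # hosted-engine storage domain (illustrative figure, not the real one).
    OVF_STORE_RESERVATION = 2 * GIB

    def check_free_space(domain_available, he_disk_size):
        """Warn when the HE VM disk would eat the space needed for OVF_STORE."""
        remaining = domain_available - he_disk_size
        if remaining < OVF_STORE_RESERVATION:
            print("WARNING: only %.1f GiB would remain on the hosted-engine "
                  "storage domain; at least %.1f GiB should stay free for the "
                  "OVF_STORE disks, or the HE VM cannot be edited later."
                  % (remaining / float(GIB), OVF_STORE_RESERVATION / float(GIB)))
            return False
        return True

    # Example: a 54GiB domain with a 53GiB HE disk triggers the warning.
    check_free_space(domain_available=54 * GIB, he_disk_size=53 * GIB)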
How exactly was all the storage consumed? By what? The HE VM disk?
severity?
Simone, any progress on this?
Deployment on a LUN with less space than required for the SHE disk and its additional satellites fails as follows:

[ INFO ] ok: [localhost]
The following luns have been found on the requested target:
	[1] 3514f0c5a5160167b 54GiB XtremIO XtremApp
		status: free, paths: 1 active
Please select the destination LUN (1) [1]:
[ INFO ] iSCSI discard after delete is enabled
[ INFO ] Creating Storage Domain
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch host facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch cluster id]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter id]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter_name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Add nfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Add glusterfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Add iSCSI storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

I do not find this to be a proper warning for the customer, though. I think the customer should at least be warned during deployment that there is insufficient space on the LUN, and be asked to extend it or abort.

Simone, can you please provide your input?
(In reply to Nikolai Sednev from comment #6)
> I do not find this as proper warning for the customer though.
> I thought that customer should be at least warned during deployment, that
> there is insufficient space on LUN, please extend or abort.
>
> Simone, can you please provide your input?

It failed before the space check due to a different error; please check engine.log.
Attaching engine log and sosreport from host.
Created attachment 1411249 [details] engine log
Created attachment 1411252 [details] sosreport from alma03
This is in engine.log:

2018-03-21 16:51:14,182+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2018-03-21 16:51:14,202+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2018-03-21 16:51:14,220+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-54) [] Operation Failed: []
On the host I see normal connectivity to the iSCSI storage:

alma03 ~]# multipath -ll
3514f0c5a5160167b dm-0 XtremIO ,XtremApp
size=54G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  `- 6:0:0:1 sdb 8:16 active ready running
And this on the vdsm side:

2018-03-21 16:51:12,988+0200 INFO (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:46)
2018-03-21 16:51:13,822+0200 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2398)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2395, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (19, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260].', 'iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)', 'iscsiadm: Could not log into all portals'])
2018-03-21 16:51:14,144+0200 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 465, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:52)
2018-03-21 16:51:14,145+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 1.16 seconds (__init__:573)
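The traceback shows iscsiadm itself failing the login (exit code 19, a non-retryable iSCSI login failure), not a space problem. To isolate a host-side iSCSI issue, one can retry the exact login vdsm attempted, outside of vdsm. A minimal sketch, assuming the node record already exists from the earlier discovery (portal and IQN are copied from the log above; run as root):

    import subprocess

    PORTAL = "10.35.146.129:3260"
    IQN = "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05"

    # Same operation vdsm's iscsiadm.node_login() wraps: a node login
    # against the target that failed in the traceback above.
    rc = subprocess.call(
        ["iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login"])
    print("iscsiadm exit code: %d" % rc)  # 19 again would confirm a host/target issue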
[ INFO ] ok: [localhost]
The following luns have been found on the requested target:
	[1] 3514f0c5a516016d7 60GiB XtremIO XtremApp
		status: free, paths: 1 active
[ INFO ] TASK [Check storage domain free space]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 55296.0Mb of available space while a minimum of 56320.0Mb is required"}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

Works for me on these components:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
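For reference, the figures in that free-space error are MiB, so the check read 54GiB available against a 55GiB minimum; part of the 60GiB LUN is presumably lost to volume-group overhead. A trivial sketch of the comparison, with both values taken from the message above:

    available_mib = 55296.0  # what the 60GiB LUN actually offered (54 GiB)
    required_mib = 56320.0   # the minimum enforced by the setup (55 GiB)

    print("available: %.0f GiB" % (available_mib / 1024))
    print("required:  %.0f GiB" % (required_mib / 1024))
    print("short by:  %.0f GiB" % ((required_mib - available_mib) / 1024))  # 1 GiB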
Reopening due to https://bugzilla.redhat.com/show_bug.cgi?id=1562019#c14.

The minimum LUN sizes differ between the CLI and Cockpit, so a 70GB LUN is sufficient for the CLI while for Cockpit it is not: Cockpit uses 58GB as the minimum default requirement, the CLI uses 50GB. Cockpit needs to check properly for the minimum required LUN space prior to deployment and report insufficient space clearly, instead of producing an unreadable error report like this:

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}], "discard_after_delete": false, "disk_profiles": [{"id": ["72088a9a-edb2-4111-92cb-b8f4cc0f9ab4"], "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": ["2f64edb7-4278-498f-b9fa-29b01e5bf9d4"], "image_id": "b55586fd-e237-459a-861b-670d75483902", "name": "HostedEngineConfigurationImage"}, {"id": ["3999cfa2-60d8-4b67-b3f9-549460b3d049"], "image_id": "a1ce92c9-f44c-4bf0-89b4-2f7415ef431a", "name": "he_virtio_disk"}, {"id": ["5f40d10d-80a8-4b2b-b5c7-b7ae96d17c9f"], "image_id": "b0d302ab-4f45-4e70-95a4-5777f7fa8efd", "name": "he_metadata"}, {"id": ["f4552177-ab8c-4fed-b706-ff0c56d37ed5"], "image_id": "f6ac0563-720c-4e65-af42-becb0ce9940c", "name": "he_sanlock"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/c88c6b61-1e45-44fd-b1fd-25c78a46defd", "id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "master": true, "name": "hosted_storage", "permissions": [{"id": ["58ca605c-010d-0307-0224-0000000001a9"]}, {"id": ["80f8b9b6-3344-11e8-8b64-00163e7bb853"]}], "storage": {"type": "iscsi", "volume_group": {"id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8", "logical_units": [{"address": "10.35.146.129", "discard_max_size": 8388608, "discard_zeroes_data": false, "id": "3514f0c5a516016ff", "lun_mapping": 1, "paths": 0, "port": 3260, "portal": "10.35.146.129:3260,1", "product_id": "XtremApp", "serial": "SXtremIO_XtremApp_XIO00153500071", "size": 75161927680, "storage_domain_id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "target": "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00", "vendor_id": "XtremIO", "volume_group_id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8"}]}}, "storage_connections": [{"id": ["da632763-5747-4303-8124-c8b2060d2372"]}], "storage_format": "v4", "supports_discard": true, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 70866960384, "vms": [], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false}

Tested lately on these components:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
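To illustrate the inconsistency, a hypothetical sketch using the 50GB/58GB figures quoted above; the helper is not the actual setup or Cockpit code:

    # Figures quoted in the comment above; the real values should come
    # from a single shared constant rather than two diverging defaults.
    MINIMUM_GB = {"cli": 50, "cockpit": 58}

    def lun_accepted(frontend, lun_gb):
        return lun_gb >= MINIMUM_GB[frontend]

    for frontend in ("cli", "cockpit"):
        print("%s accepts a 55GB LUN: %s" % (frontend, lun_accepted(frontend, 55)))
    # cli accepts a 55GB LUN: True
    # cockpit accepts a 55GB LUN: False

Whatever the correct minimum is, both entry points should validate against the same value before deployment starts and fail with a readable message, instead of dumping the raw ansible_facts as above.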
Works fine for Cockpit now too. This is what I get with a 70GB LUN on iSCSI storage:

[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 64.0GiB of available space while a minimum of 68.0GiB is required"}

Moving to verified, as it works as expected on these components:
cockpit-ovirt-dashboard-0.11.20-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180404.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1471