Bug 1522737 - Warn admin during hosted engine deploy not to use all the space from hosted SD and leave something for OVFs
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.2
Target Release: ---
Assigned To: Simone Tiraboschi
QA Contact: Nikolai Sednev
Keywords: Triaged
Depends On: 1549642 1551289 1551291 1559328 1561888
Blocks: 1538360 1458709
Reported: 2017-12-06 05:45 EST by Marian Jankular
Modified: 2018-07-16 08:51 EDT (History)
CC: 9 users

See Also:
Fixed In Version: ovirt-hosted-engine-setup-2.2.16-1.el7ev
Doc Type: Bug Fix
Doc Text:
ovirt-hosted-engine-setup does not allow installation to proceed if there is not enough free space for required OVF_STORE disks to be created.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-05-15 13:32:28 EDT
Type: Bug
oVirt Team: Integration


Attachments
engine log (137.05 KB, text/plain), 2018-03-21 11:12 EDT, Nikolai Sednev
sosreport from alma03 (9.75 MB, application/x-xz), 2018-03-21 11:13 EDT, Nikolai Sednev


External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 85260 ovirt-hosted-engine-setup-2.2 MERGED storage: take care of OVF_STORE disks size 2018-03-01 09:23 EST
oVirt gerrit 88353 ovirt-hosted-engine-setup-2.2 MERGED storage: take care of OVF_STORE disks size 2018-03-01 09:34 EST
oVirt gerrit 89366 ovirt-hosted-engine-setup-2.2 MERGED Change the order of disk creation 2018-04-05 03:49 EDT
oVirt gerrit 89614 ovirt-hosted-engine-setup-2.2 MERGED Reserve space for CRITICAL_SPACE_ACTION_BLOCKER 2018-03-29 11:49 EDT
oVirt gerrit 89616 ovirt-hosted-engine-setup-2.2 MERGED Reserve space for CRITICAL_SPACE_ACTION_BLOCKER 2018-03-29 12:05 EDT
oVirt gerrit 89854 ovirt-hosted-engine-setup-2.2 MERGED Change the order of disk creation 2018-04-05 04:01 EDT
Red Hat Product Errata RHBA-2018:1471 None None None 2018-05-15 13:33 EDT

Description Marian Jankular 2017-12-06 05:45:49 EST
Description of problem:
In RHV we should at least warn the admin during hosted-engine deploy not to use all the space from the hosted-engine storage domain, and to leave some free space for the OVF_STORE disks.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. deploy hosted engine
2. use all the space from storage domain


Actual results:
It allows the user to use all the space, so later we are not able to create the OVF_STORE files for the hosted-engine storage domain, and therefore cannot manipulate the hosted-engine VM (for example, adding an additional NIC).

Expected results:
The admin should at least be warned that using all the space will have consequences in the future.

Additional info:
Comment 1 Yaniv Kaul 2017-12-06 05:50:16 EST
How exactly was all the storage consumed? By what? The HE VM disk?
Comment 3 Yaniv Kaul 2017-12-07 03:31:54 EST
severity?
Comment 5 Sandro Bonazzola 2018-02-14 05:04:37 EST
Simone any progress on this?
Comment 6 Nikolai Sednev 2018-03-21 10:57:38 EDT
Deployment on a LUN with less space than required for the SHE disk and its additional satellite disks fails as follows:

[ INFO  ] ok: [localhost]
          The following luns have been found on the requested target:
                [1]     3514f0c5a5160167b       54GiB   XtremIO XtremApp
                        status: free, paths: 1 active
         
          Please select the destination LUN (1) [1]: 
[ INFO  ] iSCSI discard after delete is enabled
[ INFO  ] Creating Storage Domain
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch host facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch cluster id]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch cluster facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter id]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter_name]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Add nfs storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Add glusterfs storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Add iSCSI storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

I do not consider this a proper warning for the customer, though.
I thought the customer should at least be warned during deployment that there is insufficient space on the LUN, and asked to extend it or abort.

Simone, can you please provide your input?
Comment 7 Simone Tiraboschi 2018-03-21 11:00:34 EDT
(In reply to Nikolai Sednev from comment #6)
> I do not find this as proper warning for the customer though.
> I thought that customer should be at least warned during deployment, that
> there is insufficient space on LUN, please extend or abort. 
> 
> Simone, can you please provide your input?

It failed before the space check due to a different error; please check engine.log.
Comment 8 Nikolai Sednev 2018-03-21 11:11:45 EDT
Attaching engine log and sosreport from host.
Comment 9 Nikolai Sednev 2018-03-21 11:12 EDT
Created attachment 1411249 [details]
engine log
Comment 10 Nikolai Sednev 2018-03-21 11:13 EDT
Created attachment 1411252 [details]
sosreport from alma03
Comment 11 Simone Tiraboschi 2018-03-21 11:18:18 EDT
This on engine.log:

2018-03-21 16:51:14,182+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2018-03-21 16:51:14,202+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2018-03-21 16:51:14,220+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-54) [] Operation Failed: []
Comment 12 Nikolai Sednev 2018-03-21 11:20:51 EDT
On host I do see normal connectivity to iSCSI storage:
alma03 ~]# multipath -ll
3514f0c5a5160167b dm-0 XtremIO ,XtremApp        
size=54G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  `- 6:0:0:1 sdb 8:16 active ready running
Comment 13 Simone Tiraboschi 2018-03-21 11:24:57 EDT
And this on vdsm side:

2018-03-21 16:51:12,988+0200 INFO  (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:46)
2018-03-21 16:51:13,822+0200 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2398)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2395, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (19, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260].', 'iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)', 'iscsiadm: Could not log into all portals'])
2018-03-21 16:51:14,144+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 465, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:52)
2018-03-21 16:51:14,145+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 1.16 seconds (__init__:573)
Comment 14 Nikolai Sednev 2018-03-26 10:33:18 EDT
[ INFO  ] ok: [localhost]
          The following luns have been found on the requested target:
                [1]     3514f0c5a516016d7       60GiB   XtremIO XtremApp
                        status: free, paths: 1 active

[ INFO  ] TASK [Check storage domain free space]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 55296.0Mb of available space while a minimum of 56320.0Mb is required"}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

Works for me on these components:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
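The "Check storage domain free space" task above could be implemented along these lines. This is a minimal illustrative sketch, not the actual ovirt-hosted-engine-setup code; the function name and the exact message formatting are assumptions modeled on the error text in the log:

```python
MIB = 1024 * 1024  # one mebibyte in bytes

def check_storage_domain_space(available_bytes, required_bytes):
    """Abort deployment early when the target storage domain is too small
    to hold the hosted-engine disks plus the reserved OVF_STORE space."""
    if available_bytes < required_bytes:
        raise RuntimeError(
            "Error: the target storage domain contains only "
            "%.1fMb of available space while a minimum of %.1fMb is required"
            % (available_bytes / MIB, required_bytes / MIB))
```

With the figures from the log above (55296.0Mb available, 56320.0Mb required), this check raises before any disk is created, which is exactly the early failure the reporter asked for.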
Comment 15 Nikolai Sednev 2018-03-29 10:30:31 EDT
Reopening due to https://bugzilla.redhat.com/show_bug.cgi?id=1562019#c14.
In the CLI and Cockpit, the minimum LUN sizes are different: for the CLI a 70GB LUN is sufficient, while for Cockpit it is not.
Cockpit uses 58GB as the minimum default requirement; the CLI uses 50GB.
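A minimum-size figure like Cockpit's 58GB is presumably the sum of the hosted-engine VM disk plus the auxiliary volumes plus a free-space reserve (the gerrit patches above mention reserving space for CRITICAL_SPACE_ACTION_BLOCKER and for the OVF_STORE disks). As a sketch only, with every per-disk figure an assumption chosen to add up to 58GB rather than the values hard-coded in the tool:

```python
GiB = 1024 ** 3  # one gibibyte in bytes

def minimum_lun_size_bytes(appliance_disk_gib=50, ovf_store_gib=2,
                           n_ovf_stores=2, metadata_gib=1, sanlock_gib=1,
                           reserved_free_gib=2):
    # Sum of the hosted-engine VM disk, the OVF_STORE disks, the metadata
    # and sanlock volumes, and the free space kept back for the engine's
    # CRITICAL_SPACE_ACTION_BLOCKER threshold. All sizes here are
    # illustrative assumptions, not the tool's real defaults.
    total_gib = (appliance_disk_gib + n_ovf_stores * ovf_store_gib
                 + metadata_gib + sanlock_gib + reserved_free_gib)
    return total_gib * GiB
```

The point of the bug is that the CLI and Cockpit were computing this total from different inputs, so the two frontends disagreed on whether the same LUN was large enough.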

Cockpit needs to check the minimum required LUN space properly before deployment and report insufficient space clearly, instead of producing an unreadable error report like this:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}], "discard_after_delete": false, "disk_profiles": [{"id": ["72088a9a-edb2-4111-92cb-b8f4cc0f9ab4"], "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": ["2f64edb7-4278-498f-b9fa-29b01e5bf9d4"], "image_id": "b55586fd-e237-459a-861b-670d75483902", "name": "HostedEngineConfigurationImage"}, {"id": ["3999cfa2-60d8-4b67-b3f9-549460b3d049"], "image_id": "a1ce92c9-f44c-4bf0-89b4-2f7415ef431a", "name": "he_virtio_disk"}, {"id": ["5f40d10d-80a8-4b2b-b5c7-b7ae96d17c9f"], "image_id": "b0d302ab-4f45-4e70-95a4-5777f7fa8efd", "name": "he_metadata"}, {"id": ["f4552177-ab8c-4fed-b706-ff0c56d37ed5"], "image_id": "f6ac0563-720c-4e65-af42-becb0ce9940c", "name": "he_sanlock"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/c88c6b61-1e45-44fd-b1fd-25c78a46defd", "id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "master": true, "name": "hosted_storage", "permissions": [{"id": ["58ca605c-010d-0307-0224-0000000001a9"]}, {"id": ["80f8b9b6-3344-11e8-8b64-00163e7bb853"]}], "storage": {"type": "iscsi", "volume_group": {"id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8", "logical_units": [{"address": "10.35.146.129", "discard_max_size": 8388608, "discard_zeroes_data": false, "id": "3514f0c5a516016ff", "lun_mapping": 1, "paths": 0, "port": 3260, "portal": "10.35.146.129:3260,1", "product_id": "XtremApp", "serial": "SXtremIO_XtremApp_XIO00153500071", "size": 75161927680, "storage_domain_id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "target": "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00", "vendor_id": "XtremIO", "volume_group_id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8"}]}}, "storage_connections": [{"id": 
["da632763-5747-4303-8124-c8b2060d2372"]}], "storage_format": "v4", "supports_discard": true, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 70866960384, "vms": [], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false}

Tested lately on these components:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Comment 16 Nikolai Sednev 2018-04-09 06:09:15 EDT
Works fine for Cockpit now too. This is what I'm getting in the case of a 70GB LUN on iSCSI storage:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 64.0GiB of available space while a minimum of 68.0GiB is required"}

Moving to verified, as it works as expected on these components:
cockpit-ovirt-dashboard-0.11.20-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180404.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Comment 19 errata-xmlrpc 2018-05-15 13:32:28 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1471
