Bug 1522737

Summary: Warn admin during hosted engine deploy not to use all the space from hosted SD and leave something for OVFs
Product: Red Hat Enterprise Virtualization Manager Reporter: Marian Jankular <mjankula>
Component: ovirt-hosted-engine-setup Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA QA Contact: Nikolai Sednev <nsednev>
Severity: medium Docs Contact:
Priority: medium    
Version: 4.1.6 CC: bburmest, gveitmic, lsurette, mjankula, mkalinin, stirabos, ykaul, ylavi
Target Milestone: ovirt-4.2.2 Keywords: Triaged
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.16-1.el7ev Doc Type: Bug Fix
Doc Text:
ovirt-hosted-engine-setup does not allow installation to proceed if there is not enough free space for required OVF_STORE disks to be created.
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-15 17:32:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On: 1549642, 1551289, 1551291, 1559328, 1561888    
Bug Blocks: 1458709, 1538360    
Attachments:
Description                Flags
engine log                 none
sosreport from alma03      none

Description Marian Jankular 2017-12-06 10:45:49 UTC
Description of problem:
RHV should at least warn the admin during hosted-engine deploy not to use all of the space on the hosted-engine storage domain and to leave some room for the OVF_STORE disks.

Version-Release number of selected component (if applicable):


How reproducible:
always

Steps to Reproduce:
1. deploy hosted engine
2. use all the space from storage domain


Actual results:
Deployment allows the user to consume all of the space, so the OVF files for the hosted-engine storage domain cannot be created later and the hosted-engine VM cannot be modified (for example, by adding an additional NIC).

Expected results:
The admin should at least be warned that using all of the space will have consequences in the future.

Additional info:
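For illustration only, a minimal sketch (Python) of the kind of pre-flight check being requested; the function name and the OVF_STORE reservation value are hypothetical placeholders, not the actual ovirt-hosted-engine-setup code:

# Hypothetical pre-flight check: refuse to proceed (or at least warn) when the
# hosted-engine storage domain would be left with no room for the OVF_STORE disks.
OVF_STORE_RESERVATION_GIB = 2.0  # assumed reservation; the real value depends on the release

def check_free_space(domain_free_gib, requested_disk_gib):
    remaining = domain_free_gib - requested_disk_gib
    if remaining < OVF_STORE_RESERVATION_GIB:
        raise RuntimeError(
            "Only %.1f GiB would remain on the storage domain, but %.1f GiB "
            "must stay free for the OVF_STORE disks"
            % (remaining, OVF_STORE_RESERVATION_GIB))

try:
    check_free_space(domain_free_gib=54.0, requested_disk_gib=53.0)
except RuntimeError as exc:
    print(exc)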

Comment 1 Yaniv Kaul 2017-12-06 10:50:16 UTC
How exactly was all the storage consumed? By what? The HE VM disk?

Comment 3 Yaniv Kaul 2017-12-07 08:31:54 UTC
severity?

Comment 5 Sandro Bonazzola 2018-02-14 10:04:37 UTC
Simone, any progress on this?

Comment 6 Nikolai Sednev 2018-03-21 14:57:38 UTC
Deployment on a LUN with less space than required for the SHE disk and its additional satellite volumes fails as follows:

[ INFO  ] ok: [localhost]
          The following luns have been found on the requested target:
                [1]     3514f0c5a5160167b       54GiB   XtremIO XtremApp
                        status: free, paths: 1 active
         
          Please select the destination LUN (1) [1]: 
[ INFO  ] iSCSI discard after delete is enabled
[ INFO  ] Creating Storage Domain
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Obtain SSO token using username/password credentials]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch host facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch cluster id]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch cluster facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter id]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch datacenter_name]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Add nfs storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Add glusterfs storage domain]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Add iSCSI storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 

I do not consider this a proper warning for the customer, though.
I thought the customer should at least be warned during deployment that there is insufficient space on the LUN, and asked to extend it or abort.

Simone, can you please provide your input?

Comment 7 Simone Tiraboschi 2018-03-21 15:00:34 UTC
(In reply to Nikolai Sednev from comment #6)
> I do not find this as proper warning for the customer though.
> I thought that customer should be at least warned during deployment, that
> there is insufficient space on LUN, please extend or abort. 
> 
> Simone, can you please provide your input?

It failed before the space check due to a different error; please check engine.log.

Comment 8 Nikolai Sednev 2018-03-21 15:11:45 UTC
Attaching engine log and sosreport from host.

Comment 9 Nikolai Sednev 2018-03-21 15:12:12 UTC
Created attachment 1411249 [details]
engine log

Comment 10 Nikolai Sednev 2018-03-21 15:13:54 UTC
Created attachment 1411252 [details]
sosreport from alma03

Comment 11 Simone Tiraboschi 2018-03-21 15:18:18 UTC
This is in engine.log:

2018-03-21 16:51:14,182+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2018-03-21 16:51:14,202+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2018-03-21 16:51:14,220+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-54) [] Operation Failed: []

Comment 12 Nikolai Sednev 2018-03-21 15:20:51 UTC
On the host I see normal connectivity to the iSCSI storage:
alma03 ~]# multipath -ll
3514f0c5a5160167b dm-0 XtremIO ,XtremApp        
size=54G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  `- 6:0:0:1 sdb 8:16 active ready running

Comment 13 Simone Tiraboschi 2018-03-21 15:24:57 UTC
And this is on the vdsm side:

2018-03-21 16:51:12,988+0200 INFO  (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:46)
2018-03-21 16:51:13,822+0200 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2398)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2395, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (19, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260].', 'iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)', 'iscsiadm: Could not log into all portals'])
2018-03-21 16:51:14,144+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 465, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:52)
2018-03-21 16:51:14,145+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 1.16 seconds (__init__:573)

Comment 14 Nikolai Sednev 2018-03-26 14:33:18 UTC
[ INFO  ] ok: [localhost]
          The following luns have been found on the requested target:
                [1]     3514f0c5a516016d7       60GiB   XtremIO XtremApp
                        status: free, paths: 1 active

[ INFO  ] TASK [Check storage domain free space]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 55296.0Mb of available space while a minimum of 56320.0Mb is required"}
          Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]: 
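
For reference, the figures in that message convert as follows, assuming the "Mb" label actually means MiB (a small Python check, not part of the product):

# Figures taken from the error message above.
available_mib = 55296.0
required_mib = 56320.0
print(available_mib / 1024.0)                   # 54.0 GiB available
print(required_mib / 1024.0)                    # 55.0 GiB required
print((required_mib - available_mib) / 1024.0)  # 1.0 GiB short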

Works for me on these components:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 15 Nikolai Sednev 2018-03-29 14:30:31 UTC
Reopening due to https://bugzilla.redhat.com/show_bug.cgi?id=1562019#c14.
The minimum LUN sizes in the CLI and in Cockpit are different, so a 70GB LUN is sufficient for the CLI but not for Cockpit.
Cockpit uses 58GB as the minimum default requirement.
The CLI uses 50GB as the minimum default requirement.

Cockpit needs to properly check the minimum required LUN space prior to deployment and report insufficient space correctly, instead of producing an unreadable error report like this:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}], "discard_after_delete": false, "disk_profiles": [{"id": ["72088a9a-edb2-4111-92cb-b8f4cc0f9ab4"], "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": ["2f64edb7-4278-498f-b9fa-29b01e5bf9d4"], "image_id": "b55586fd-e237-459a-861b-670d75483902", "name": "HostedEngineConfigurationImage"}, {"id": ["3999cfa2-60d8-4b67-b3f9-549460b3d049"], "image_id": "a1ce92c9-f44c-4bf0-89b4-2f7415ef431a", "name": "he_virtio_disk"}, {"id": ["5f40d10d-80a8-4b2b-b5c7-b7ae96d17c9f"], "image_id": "b0d302ab-4f45-4e70-95a4-5777f7fa8efd", "name": "he_metadata"}, {"id": ["f4552177-ab8c-4fed-b706-ff0c56d37ed5"], "image_id": "f6ac0563-720c-4e65-af42-becb0ce9940c", "name": "he_sanlock"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/c88c6b61-1e45-44fd-b1fd-25c78a46defd", "id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "master": true, "name": "hosted_storage", "permissions": [{"id": ["58ca605c-010d-0307-0224-0000000001a9"]}, {"id": ["80f8b9b6-3344-11e8-8b64-00163e7bb853"]}], "storage": {"type": "iscsi", "volume_group": {"id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8", "logical_units": [{"address": "10.35.146.129", "discard_max_size": 8388608, "discard_zeroes_data": false, "id": "3514f0c5a516016ff", "lun_mapping": 1, "paths": 0, "port": 3260, "portal": "10.35.146.129:3260,1", "product_id": "XtremApp", "serial": "SXtremIO_XtremApp_XIO00153500071", "size": 75161927680, "storage_domain_id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "target": "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00", "vendor_id": "XtremIO", "volume_group_id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8"}]}}, "storage_connections": [{"id": ["da632763-5747-4303-8124-c8b2060d2372"]}], "storage_format": "v4", "supports_discard": true, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 70866960384, "vms": [], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false}
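
For comparison, a minimal sketch (Python, purely illustrative and not the product code) of reducing such a fact dump to the figure that matters; a readable report would contain something closer to this:

import json

# Trimmed-down copy of the fact dump above, keeping only the fields used here.
fact_dump = json.loads("""
{"ansible_facts": {"ovirt_storage_domains": [
  {"name": "hosted_storage", "available": 3221225472, "used": 70866960384}
]}}
""")

sd = fact_dump["ansible_facts"]["ovirt_storage_domains"][0]
print("%s: %.1f GiB available, %.1f GiB used"
      % (sd["name"], sd["available"] / 2.0 ** 30, sd["used"] / 2.0 ** 30))
# hosted_storage: 3.0 GiB available, 66.0 GiB used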

Tested lately on these components:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 16 Nikolai Sednev 2018-04-09 10:09:15 UTC
Works fine for Cockpit now too. This is what I get in the case of a 70GB LUN on iSCSI storage:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 64.0GiB of available space while a minimum of 68.0GiB is required"}

Moving to verified, as it works as expected on these components:
cockpit-ovirt-dashboard-0.11.20-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180404.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

Comment 19 errata-xmlrpc 2018-05-15 17:32:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1471

Comment 20 Franta Kust 2019-05-16 13:03:37 UTC
BZ<2>Jira Resync