Description of problem:
During hosted-engine deployment, RHV should at least warn the admin not to use all of the space on the hosted-engine storage domain and to leave some space for the OVFs.

Version-Release number of selected component (if applicable):

How reproducible:
always

Steps to Reproduce:
1. Deploy hosted engine.
2. Use all the space from the storage domain.

Actual results:
The user is allowed to use all the space, so the OVF files for the hosted-engine storage domain cannot be created later, and the hosted-engine VM cannot be modified (for example, to add an additional NIC).

Expected results:
The admin should be at least warned that using all of the space will have consequences in the future.

Additional info:
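A minimal sketch in Python of the kind of warning being requested here; the reservation figure and the helper are illustrative assumptions of mine, not the actual setup code:

    GIB = 1024 ** 3

    # Assumed reservation for the OVF_STORE disks that must stay free on the
    # hosted-engine storage domain (illustrative figure, not the real one).
    OVF_STORE_RESERVATION = 2 * GIB

    def check_free_space(domain_available, he_disk_size):
        """Warn when the HE VM disk would eat the space needed for OVF_STORE."""
        remaining = domain_available - he_disk_size
        if remaining < OVF_STORE_RESERVATION:
            print("WARNING: only %.1f GiB would remain on the hosted-engine "
                  "storage domain; at least %.1f GiB should stay free for the "
                  "OVF_STORE disks, or the HE VM cannot be edited later."
                  % (remaining / float(GIB), OVF_STORE_RESERVATION / float(GIB)))
            return False
        return True

    # Example: a 54GiB domain with a 53GiB HE disk triggers the warning.
    check_free_space(domain_available=54 * GIB, he_disk_size=53 * GIB)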
How exactly was all the storage consumed? By what? The HE VM disk?
severity?
Simone, any progress on this?
Deployment on a LUN with less space than required for the SHE disk and its additional satellites fails as follows:

[ INFO ] ok: [localhost]
The following luns have been found on the requested target:
	[1] 3514f0c5a5160167b 54GiB XtremIO XtremApp
		status: free, paths: 1 active
Please select the destination LUN (1) [1]:
[ INFO ] iSCSI discard after delete is enabled
[ INFO ] Creating Storage Domain
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Obtain SSO token using username/password credentials]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch host facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch cluster id]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch cluster facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter id]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch datacenter_name]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Add nfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Add glusterfs storage domain]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Add iSCSI storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

I do not find this to be a proper warning for the customer, though. I think the customer should at least be warned during deployment that there is insufficient space on the LUN, and be asked to extend it or abort.

Simone, can you please provide your input?
(In reply to Nikolai Sednev from comment #6)
> I do not find this as proper warning for the customer though.
> I thought that customer should be at least warned during deployment, that
> there is insufficient space on LUN, please extend or abort.
>
> Simone, can you please provide your input?

It failed before the space check due to a different error; please check engine.log.
Attaching engine log and sosreport from host.
Created attachment 1411249 [details] engine log
Created attachment 1411252 [details] sosreport from alma03
This is in engine.log:

2018-03-21 16:51:14,182+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2018-03-21 16:51:14,202+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2018-03-21 16:51:14,220+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-54) [] Operation Failed: []
On the host I see normal connectivity to the iSCSI storage:

alma03 ~]# multipath -ll
3514f0c5a5160167b dm-0 XtremIO ,XtremApp
size=54G features='1 queue_if_no_path' hwhandler='0' wp=rw
`-+- policy='queue-length 0' prio=1 status=active
  `- 6:0:0:1 sdb 8:16 active ready running
And this on the vdsm side:

2018-03-21 16:51:12,988+0200 INFO (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:46)
2018-03-21 16:51:13,822+0200 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2398)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2395, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (19, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260].', 'iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)', 'iscsiadm: Could not log into all portals'])
2018-03-21 16:51:14,144+0200 INFO (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 465, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:52)
2018-03-21 16:51:14,145+0200 INFO (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 1.16 seconds (__init__:573)
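The traceback shows iscsiadm itself failing the login (exit code 19, a non-retryable iSCSI login failure), not a space problem. To isolate a host-side iSCSI issue, one can retry the exact login vdsm attempted, outside of vdsm. A minimal sketch, assuming the node record already exists from the earlier discovery (portal and IQN are copied from the log above; run as root):

    import subprocess

    PORTAL = "10.35.146.129:3260"
    IQN = "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05"

    # Same operation vdsm's iscsiadm.node_login() wraps: a node login
    # against the target that failed in the traceback above.
    rc = subprocess.call(
        ["iscsiadm", "-m", "node", "-T", IQN, "-p", PORTAL, "--login"])
    print("iscsiadm exit code: %d" % rc)  # 19 again would confirm a host/target issue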
[ INFO ] ok: [localhost]
The following luns have been found on the requested target:
	[1] 3514f0c5a516016d7 60GiB XtremIO XtremApp
		status: free, paths: 1 active
[ INFO ] TASK [Check storage domain free space]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 55296.0Mb of available space while a minimum of 56320.0Mb is required"}
Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs)[nfs]:

Works for me on these components:
rhvm-appliance-4.2-20180202.0.el7.noarch
ovirt-hosted-engine-setup-2.2.14-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.7-1.el7ev.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
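For reference, the figures in that free-space error are MiB, so the check read 54GiB available against a 55GiB minimum; part of the 60GiB LUN is presumably lost to volume-group overhead. A trivial sketch of the comparison, with both values taken from the message above:

    available_mib = 55296.0  # what the 60GiB LUN actually offered (54 GiB)
    required_mib = 56320.0   # the minimum enforced by the setup (55 GiB)

    print("available: %.0f GiB" % (available_mib / 1024))
    print("required:  %.0f GiB" % (required_mib / 1024))
    print("short by:  %.0f GiB" % ((required_mib - available_mib) / 1024))  # 1 GiB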
Reopening due to https://bugzilla.redhat.com/show_bug.cgi?id=1562019#c14.

The minimum LUN sizes differ between the CLI and Cockpit, so a 70GB LUN is sufficient for the CLI while for Cockpit it is not: Cockpit uses 58GB as the minimum default requirement, the CLI uses 50GB. Cockpit needs to check properly for the minimum required LUN space prior to deployment and report insufficient space clearly, instead of producing an unreadable error report like this:

[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_storage_domains": [{"available": 3221225472, "backup": false, "committed": 65498251264, "critical_space_action_blocker": 5, "data_centers": [{"href": "/ovirt-engine/api/datacenters/4c78136c-3344-11e8-a34a-00163e7bb853", "id": "4c78136c-3344-11e8-a34a-00163e7bb853"}], "discard_after_delete": false, "disk_profiles": [{"id": ["72088a9a-edb2-4111-92cb-b8f4cc0f9ab4"], "name": "hosted_storage"}], "disk_snapshots": [], "disks": [{"id": ["2f64edb7-4278-498f-b9fa-29b01e5bf9d4"], "image_id": "b55586fd-e237-459a-861b-670d75483902", "name": "HostedEngineConfigurationImage"}, {"id": ["3999cfa2-60d8-4b67-b3f9-549460b3d049"], "image_id": "a1ce92c9-f44c-4bf0-89b4-2f7415ef431a", "name": "he_virtio_disk"}, {"id": ["5f40d10d-80a8-4b2b-b5c7-b7ae96d17c9f"], "image_id": "b0d302ab-4f45-4e70-95a4-5777f7fa8efd", "name": "he_metadata"}, {"id": ["f4552177-ab8c-4fed-b706-ff0c56d37ed5"], "image_id": "f6ac0563-720c-4e65-af42-becb0ce9940c", "name": "he_sanlock"}], "external_status": "ok", "href": "/ovirt-engine/api/storagedomains/c88c6b61-1e45-44fd-b1fd-25c78a46defd", "id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "master": true, "name": "hosted_storage", "permissions": [{"id": ["58ca605c-010d-0307-0224-0000000001a9"]}, {"id": ["80f8b9b6-3344-11e8-8b64-00163e7bb853"]}], "storage": {"type": "iscsi", "volume_group": {"id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8", "logical_units": [{"address": "10.35.146.129", "discard_max_size": 8388608, "discard_zeroes_data": false, "id": "3514f0c5a516016ff", "lun_mapping": 1, "paths": 0, "port": 3260, "portal": "10.35.146.129:3260,1", "product_id": "XtremApp", "serial": "SXtremIO_XtremApp_XIO00153500071", "size": 75161927680, "storage_domain_id": "c88c6b61-1e45-44fd-b1fd-25c78a46defd", "target": "iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c00", "vendor_id": "XtremIO", "volume_group_id": "Blzr2A-845n-WNnx-M0HD-s2gh-L3L0-wNRsj8"}]}}, "storage_connections": [{"id": ["da632763-5747-4303-8124-c8b2060d2372"]}], "storage_format": "v4", "supports_discard": true, "supports_discard_zeroes_data": false, "templates": [], "type": "data", "used": 70866960384, "vms": [], "warning_low_space_indicator": 10, "wipe_after_delete": false}]}, "attempts": 12, "changed": false}

Tested lately on these components:
ovirt-hosted-engine-ha-2.2.9-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.15-1.el7ev.noarch
cockpit-ovirt-dashboard-0.11.19-1.el7ev.noarch
rhvm-appliance-4.2-20180202.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
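To illustrate the inconsistency, a hypothetical sketch using the 50GB/58GB figures quoted above; the helper is not the actual setup or Cockpit code:

    # Figures quoted in the comment above; the real values should come
    # from a single shared constant rather than two diverging defaults.
    MINIMUM_GB = {"cli": 50, "cockpit": 58}

    def lun_accepted(frontend, lun_gb):
        return lun_gb >= MINIMUM_GB[frontend]

    for frontend in ("cli", "cockpit"):
        print("%s accepts a 55GB LUN: %s" % (frontend, lun_accepted(frontend, 55)))
    # cli accepts a 55GB LUN: True
    # cockpit accepts a 55GB LUN: False

Whatever the correct minimum is, both entry points should validate against the same value before deployment starts and fail with a readable message, instead of dumping the raw ansible_facts as above.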
Works fine for Cockpit now too. This is what I get with a 70GB LUN on iSCSI storage:

[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Error: the target storage domain contains only 64.0GiB of available space while a minimum of 68.0GiB is required"}

Moving to verified, as it works as expected on these components:
cockpit-ovirt-dashboard-0.11.20-1.el7ev.noarch
ovirt-hosted-engine-setup-2.2.16-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.10-1.el7ev.noarch
rhvm-appliance-4.2-20180404.0.el7.noarch
Linux 3.10.0-862.el7.x86_64 #1 SMP Wed Mar 21 18:14:51 EDT 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2018:1471