Bug 1554922 - Failures creating a storage domain via ansible module/REST API don't report a meaningful error message
Summary: Failures creating a storage domain via ansible module/REST API don't report...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: RestAPI
Version: 4.2.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium with 1 vote
Target Milestone: ovirt-4.3.0
Target Release: 4.3.0
Assignee: shani
QA Contact: Avihai
URL:
Whiteboard:
Duplicates: 1559811 (view as bug list)
Depends On:
Blocks:
 
Reported: 2018-03-13 15:05 UTC by Simone Tiraboschi
Modified: 2019-02-13 07:45 UTC
CC: 8 users

Fixed In Version: ovirt-engine-4.3.0_alpha
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-02-13 07:45:04 UTC
oVirt Team: Storage
Embargoed:
rule-engine: ovirt-4.3+


Attachments
Hosted Engine Log file for Storage Domain Failure issue. (298.98 KB, text/plain)
2018-03-19 16:59 UTC, Steven Rosenberg
no flags Details
Ansible log for Storage Domain failure. (575.33 KB, text/plain)
2018-03-19 17:04 UTC, Steven Rosenberg
no flags Details
engine.log (137.05 KB, text/plain)
2018-03-21 15:51 UTC, Simone Tiraboschi
no flags Details
sosreport from the host (9.75 MB, application/x-xz)
2018-03-21 15:52 UTC, Simone Tiraboschi
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1559811 0 unspecified CLOSED Deploy HE failed on the [Add NFS storage domain] task while selecting NFS v4_1 2021-02-22 00:41:40 UTC
oVirt gerrit 93279 0 'None' MERGED core: Setting error message for failed ConnectStorageToVdsCommand 2021-01-29 10:09:00 UTC

Internal Links: 1559811

Description Simone Tiraboschi 2018-03-13 15:05:34 UTC
Description of problem:
Example:

{"msg": "All items completed", "changed": false, "results": [{"_ansible_parsed": true, "exception": "Traceback (most recent call last):\n  File \"/tmp/ansible_b0HbRy/ansible_module_ovirt_hosts.py\", line 524, in main\n    target=module.params['iscsi']['target'],\n  File \"/tmp/ansible_b0HbRy/ansible_modlib.zip/ansible/module_utils/ovirt.py\", line 734, in action\n    getattr(entity_service, action)(**kwargs)\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/services.py\", line 36562, in iscsi_login\n    return self._internal_action(action, 'iscsilogin', None, headers, query, wait)\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 299, in _internal_action\n    return future.wait() if wait else future\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 55, in wait\n    return self._code(response)\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 296, in callback\n    self._check_fault(response)\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 134, in _check_fault\n    self._raise_error(response, body.fault)\n  File \"/usr/lib64/python2.7/site-packages/ovirtsdk4/service.py\", line 118, in _raise_error\n    raise error\nError: Fault reason is \"Operation Failed\". Fault detail is \"[]\". 
HTTP response code is 400.\n", "_ansible_item_result": true, "_ansible_no_log": false, "item": ["192.168.1.22", "3260"], "changed": false, "failed": true, "invocation": {"module_args": {"comment": null, "activate": true, "force": false, "power_management_enabled": null, "cluster": null, "fetch_nested": false, "hosted_engine": null, "check_upgrade": true, "password": null, "iscsi": {"username": null, "password": null, "target": "iqn.2017-08.localdomain.mini2:target", "address": "192.168.1.22", "port": "3260"}, "state": "iscsilogin", "nested_attributes": [], "address": null, "override_iptables": null, "kdump_integration": null, "wait": true, "public_key": false, "name": "m1-h6.localdomain", "spm_priority": null, "poll_interval": 3, "kernel_params": null, "timeout": 30, "override_display": null}}, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}]}

while in engine.log we can see:
2018-03-13 09:04:58,817-04 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-9) [74f41c83-3e11-49b6-9439-ccc783930f3e] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '476' and error message is: failed to login to iscsi node due to authorization failure



Version-Release number of selected component (if applicable):
4.2.2 beta

How reproducible:
100%

Steps to Reproduce:
1. Try creating an iSCSI storage domain via ansible, passing wrong credentials
2.
3.

Actual results:
"msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400." got reported back to the caller and "failed to login to iscsi node due to authorization failure" is visible just in engine.log


Expected results:
Something like:
"msg": "error '476': failed to login to iscsi node due to authorization failure"
in the response

Additional info:
Something similar also happens with NFS on permission errors.
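Until the engine propagates the underlying error, callers only receive the opaque fault string shown above. A minimal client-side sketch of pulling that string apart (the regex and helper name are illustrative, not part of the oVirt SDK):

```python
import re

# Shape of the message returned to ansible/REST callers, as seen above.
FAULT_RE = re.compile(
    r'Fault reason is "(?P<reason>[^"]*)"\. '
    r'Fault detail is "\[(?P<detail>[^"]*)\]"\. '
    r'HTTP response code is (?P<code>\d+)\.'
)

def parse_fault(msg):
    """Split an oVirt fault string into (reason, detail, http_code).

    Returns None if the message does not match the known shape.
    """
    m = FAULT_RE.search(msg)
    if not m:
        return None
    return m.group('reason'), m.group('detail'), int(m.group('code'))

msg = ('Fault reason is "Operation Failed". Fault detail is "[]". '
       'HTTP response code is 400.')
print(parse_fault(msg))  # ('Operation Failed', '', 400)
```

Of course this only restructures the message; with an empty detail like "[]" there is still nothing useful to show, which is exactly the bug.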

Comment 1 Allon Mureinik 2018-03-15 15:24:15 UTC
Simone, can you please attach engine and server logs?
Thanks!

Comment 2 Steven Rosenberg 2018-03-19 16:59:09 UTC
Created attachment 1410032 [details]
Hosted Engine Log file for Storage Domain Failure issue.

Comment 3 Steven Rosenberg 2018-03-19 17:00:31 UTC
As per Simone Tiraboschi's request, I am adding another HTTP 400 error to this ticket. The failure here occurs when attempting to add a storage domain while setting up the hosted engine; the storage domain was already used by the engine.

The error in the hosted engine log file is here:

2018-03-19 12:43:59,256+0200 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.
2018-03-19 12:43:59,357+0200 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[]\". HTTP response code is 400."}


The error in the hosted engine ansible log is here:

2018-03-19 14:03:02,820+0200 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Add nfs storage domain', 'ansible_result': u'type: <type \'dict\'>\nstr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n  File "/tmp/ansible_FIgFc6/ansible_module_ovirt_storage_domains.py", line 542, in main\\n    storage_domains_module.post_create_check(sd_id)\\n  File "/tmp/ansible_FIgFc6/ansible_module_ovirt_storage_domains.py", line 425, \nrepr: {\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n  File "/tmp/ansible_FIgFc6/ansible_module_ovirt_storage_domains.py", line 542, in main\\n    storage_domains_module.post_create_check(sd_id)\\n  File "/tmp/ansible_FIgFc6/ansible_module_ovirt_storage_domains.py", line 425, \ndir: [\'__class__\', \'__cmp__\', \'__contains__\', \'__delattr__\', \'__delitem__\', \'__doc__\', \'__eq__\', \'__format__\', \'__ge__\', \'__getattribute__\', \'__getitem__\', \'__gt__\', \'__hash__\', \'__init__\', \'__iter__\', \'__le__\', \'__len__\', \'__lt__\', \'__ne__\', \'__new__\', \'__reduce__\', \'__reduce_ex__\', \'__repr__\', \'__setattr__\', \'__setitem__\', \'__sizeof__\', \'__str__\', \'__subclasshook__\', \'clear\', \'copy\', \'fromkeys\', \'get\', \'has_key\', \'items\', \'iteritems\', \'iterkeys\', \'itervalues\', \'keys\', \'pop\', \'popitem\', \'setdefault\', \'update\', \'values\', \'viewitems\', \'viewkeys\', \'viewvalues\']\npprint: {\'_ansible_no_log\': False,\n \'_ansible_parsed\': True,\n \'changed\': False,\n u\'exception\': u\'Traceback (most recent call last):\\n  File "/tmp/ansible_FIgFc6/ansible_module_ovirt_storage_domains.py", line 542, in main\\n    storage_domains_module.post_create_check(sd_id)\\n  File "/tmp/ansible_FIgFc6/ansib\n{\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n  File "/tmp/ansible_F.__doc__: "dict() -> new empty dictionary\\ndict(mapping) -> new dictionary initialized from a 
mapping object\'s\\n    (key, value) pairs\\ndict(iterable) -> new dictionary initialized as if via:\\n    d = {}\\n    for k, v in iterable:\\n        d[k] = v\\ndict(**kwargs) -> new dictionary initialized with the name=value pairs\\n    in the keyword argument list.  For example:  dict(one=1, two=2)"\n{\'_ansible_parsed\': True, u\'exception\': u\'Traceback (most recent call last):\\n  File "/tmp/ansible_F.__hash__: None', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml'}
2018-03-19 14:03:02,820+0200 DEBUG ansible on_any args <ansible.executor.task_result.TaskResult object at 0x7f4b5807f710> kwargs ignore_errors:None

Comment 4 Simone Tiraboschi 2018-03-19 17:04:10 UTC
(In reply to Steven Rosenberg from comment #3)
> The error in the hosted engine ansible log is here:

Sorry, I meant engine.log from the engine VM (at that stage you can access it via ssh only from the host where you ran hosted-engine-setup) and vdsm.log from your host.
Do you still have them?

Comment 5 Steven Rosenberg 2018-03-19 17:04:27 UTC
Created attachment 1410033 [details]
Ansible log for Storage Domain failure.

Comment 6 Simone Tiraboschi 2018-03-21 15:51:40 UTC
Created attachment 1411271 [details]
engine.log

Comment 7 Simone Tiraboschi 2018-03-21 15:52:41 UTC
Created attachment 1411272 [details]
sosreport from the host

Comment 8 Simone Tiraboschi 2018-03-21 15:56:40 UTC
OK, check my latest attachment for instance.

We just got 

 [ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[]". HTTP response code is 400.

from the REST API.

But on engine.log we see:

2018-03-21 16:51:14,182+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ISCSIStorageHelper] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] The connection with details '00000000-0000-0000-0000-000000000000' failed because of error code '465' and error message is: failed to setup iscsi subsystem
2018-03-21 16:51:14,202+02 ERROR [org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand] (default task-54) [90ed7229-03fa-4c57-a29b-0dbc160ea770] Transaction rolled-back for command 'org.ovirt.engine.core.bll.storage.connection.ConnectStorageToVdsCommand'.
2018-03-21 16:51:14,220+02 ERROR [org.ovirt.engine.api.restapi.resource.AbstractBackendResource] (default task-54) [] Operation Failed: []

And on vdsm side we see:

2018-03-21 16:51:12,988+0200 INFO  (jsonrpc/4) [vdsm.api] START connectStorageServer(domType=3, spUUID=u'00000000-0000-0000-0000-000000000000', conList=[{u'id': u'00000000-0000-0000-0000-000000000000', u'connection': u'10.35.146.129', u'iqn': u'iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05', u'user': u'', u'tpgt': u'1', u'password': '********', u'port': u'3260'}], options=None) from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:46)
2018-03-21 16:51:13,822+0200 ERROR (jsonrpc/4) [storage.HSM] Could not connect to storageServer (hsm:2398)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/storage/hsm.py", line 2395, in connectStorageServer
    conObj.connect()
  File "/usr/lib/python2.7/site-packages/vdsm/storage/storageServer.py", line 487, in connect
    iscsi.addIscsiNode(self._iface, self._target, self._cred)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsi.py", line 217, in addIscsiNode
    iscsiadm.node_login(iface.name, target.address, target.iqn)
  File "/usr/lib/python2.7/site-packages/vdsm/storage/iscsiadm.py", line 337, in node_login
    raise IscsiNodeError(rc, out, err)
IscsiNodeError: (19, ['Logging in to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260] (multiple)'], ['iscsiadm: Could not login to [iface: default, target: iqn.2008-05.com.xtremio:xio00153500071-514f0c50023f6c05, portal: 10.35.146.129,3260].', 'iscsiadm: initiator reported error (19 - encountered non-retryable iSCSI login failure)', 'iscsiadm: Could not log into all portals'])
2018-03-21 16:51:14,144+0200 INFO  (jsonrpc/4) [vdsm.api] FINISH connectStorageServer return={'statuslist': [{'status': 465, 'id': u'00000000-0000-0000-0000-000000000000'}]} from=::ffff:192.168.122.168,44180, flow_id=90ed7229-03fa-4c57-a29b-0dbc160ea770, task_id=dd324d94-6d51-4d6f-8cc2-105d5ecbd504 (api:52)
2018-03-21 16:51:14,145+0200 INFO  (jsonrpc/4) [jsonrpc.JsonRpcServer] RPC call StoragePool.connectStorageServer succeeded in 1.16 seconds (__init__:573)
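The eventual fix (gerrit 93279) lives in the engine's Java code; conceptually it stops discarding the per-connection status that vdsm already returns and folds it into the fault detail. A rough Python sketch of that idea, using only the two status codes and messages visible in the logs on this bug (the mapping and function are illustrative, not the engine's actual code):

```python
# Status codes and messages observed in engine.log above; illustrative only.
STORAGE_ERRORS = {
    465: 'failed to setup iscsi subsystem',
    476: 'failed to login to iscsi node due to authorization failure',
}

def fault_detail(statuslist):
    """Build a fault detail from vdsm's connectStorageServer statuslist,
    instead of the empty "[]" the caller currently sees."""
    problems = [
        "error '%d': %s" % (s['status'],
                            STORAGE_ERRORS.get(s['status'],
                                               'unknown storage error'))
        for s in statuslist if s['status'] != 0
    ]
    return '[%s]' % '; '.join(problems)

# The statuslist vdsm returned in the log above:
statuslist = [{'status': 465, 'id': '00000000-0000-0000-0000-000000000000'}]
print(fault_detail(statuslist))
# [error '465': failed to setup iscsi subsystem]
```

With status 0 (success) the helper degenerates to the familiar empty "[]", so only genuine failures change the message the caller sees.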

Comment 9 Simone Tiraboschi 2018-03-23 14:13:06 UTC
See also https://bugzilla.redhat.com/show_bug.cgi?id=1559811#c4
for another reference

Comment 10 Simone Tiraboschi 2018-04-09 09:12:00 UTC
*** Bug 1559811 has been marked as a duplicate of this bug. ***

Comment 11 Michael Burman 2018-05-15 07:18:43 UTC
I face the same issue on latest 4.2.3.5 (GA) with NFS storage type.

I'm running hosted-engine deploy via the cockpit wizard with the latest rhvm-appliance, and when trying to finish the deployment and mount the storage I get:

[ INFO ] TASK [Add NFS storage domain]
[ ERROR ] Error: Fault reason is "Operation Failed". Fault detail is "[Problem while trying to mount target]". HTTP response code is 400.
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Problem while trying to mount target]\". HTTP response code is 400."}

In the ovirt-hosted-engine-setup-ansible-create_storage_domain log I see:

2018-05-15 10:03:08,715+0300 INFO ansible start playbook /usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml
2018-05-15 10:03:08,715+0300 DEBUG ansible start {'status': 'OK', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_storage_domain.yml', 'ansible_type': 'start'}
2018-05-15 10:03:08,715+0300 DEBUG ansible on_any args <ansible.playbook.Playbook object at 0x7f8a9cb68d90> kwargs 
2018-05-15 10:03:08,776+0300 INFO ansible play start {'status': 'OK', 'ansible_play': u'Create hosted engine local vm', 'ansible_type': 'play start'}
2018-05-15 10:03:08,776+0300 DEBUG ansible on_any args Create hosted engine local vm kwargs

Is there any workaround for this?

cockpit-165
rhvh-4.2.3.0-0.20180508.0+1
rhvm-appliance-4.2-20180504.0.el7.noarch

Comment 12 Michael Burman 2018-05-15 07:34:25 UTC
(In reply to Michael Burman from comment #11)
> I face the same issue on latest 4.2.3.5(GA) with nfs storage type. 
> [...]
> IS there any workaround for this? 

Possibly related to our local DNS issue on the site, vdsm log:
MountError: (32, ';mount.nfs: Failed to resolve server yellow-vdsb.qa.lab.tlv.redhat.com: Name or service not known\nmount.nfs: Operation already in progress\n')

Can't ping the storage from the host either.

But this generic error message is not useful at all.
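Since the root cause here was name resolution, a pre-flight DNS check on the storage server would surface the real error long before the generic HTTP 400. A hedged sketch (the helper is hypothetical, not part of hosted-engine-setup):

```python
import socket

def check_resolvable(host):
    """Return the resolved IPv4 address, or a readable error string
    instead of the generic HTTP 400 the deploy eventually fails with."""
    try:
        return socket.gethostbyname(host)
    except socket.gaierror as e:
        return 'cannot resolve %s: %s' % (host, e)

print(check_resolvable('localhost'))  # 127.0.0.1 on most hosts
```

Running this against the NFS server name before deploying would have reported the "Name or service not known" condition directly.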

Comment 13 Hetz Ben Hamo 2018-07-18 16:07:05 UTC
Just checked the latest stable as well as 4.3.0 Master branch.

Looks like when the node installer creates the ovirtmgmt interface configuration (it doesn't appear in nmcli, only in /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt), it doesn't write the DOMAIN line in the configuration file, so when trying to mount anything (NFS, iSCSI, etc.) during deployment, it fails with error 400.

Until this is fixed, I suggest a simple workaround: edit the above file, add a line DOMAIN=yourdomain.com (with your actual search domain), and restart the network service.
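The workaround above can be scripted. The sketch below works on a throwaway copy under /tmp for illustration only; on a real host you would edit /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt itself and then restart networking (e.g. systemctl restart network):

```python
def add_search_domain(ifcfg_path, domain):
    """Append a DOMAIN= line to an ifcfg file if one is not already present.

    Returns True if the file was modified, False if already configured.
    """
    with open(ifcfg_path) as f:
        lines = f.read().splitlines()
    if any(l.startswith('DOMAIN=') for l in lines):
        return False  # already configured
    with open(ifcfg_path, 'a') as f:
        f.write('DOMAIN=%s\n' % domain)
    return True

# Demo on a throwaway copy; the real file is
# /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt.
demo = '/tmp/ifcfg-ovirtmgmt-demo'
with open(demo, 'w') as f:
    f.write('DEVICE=ovirtmgmt\nBOOTPROTO=none\n')
add_search_domain(demo, 'yourdomain.com')
print(open(demo).read())
```

The idempotence check matters: re-running the script after the installer is fixed must not duplicate the DOMAIN line.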

Comment 14 Raz Tamir 2018-11-28 12:19:17 UTC
Hi Shani,

I still see this issue in 4.3.0-0.2.master.20181127150027.gitd731af3.el7

failed: [storage-ge-05.scl.lab.tlv.redhat.com] (item={'value': {u'state': u'present', u'glusterfs': {u'path': u'/storage_local_ge5_volume_0', u'address': u'gluster01.scl.lab.tlv.redhat.com'}, u'format': True}, 'key': u'test_gluster_0'}) => {"changed": false, "item": {"key": "test_gluster_0", "value": {"format": true, "glusterfs": {"address": "gluster01.scl.lab.tlv.redhat.com", "path": "/storage_local_ge5_volume_0"}, "state": "present"}}, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."}
11:42:03 An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.
11:42:03 failed: [storage-ge-05.scl.lab.tlv.redhat.com] (item={'value': {u'state': u'present', u'glusterfs': {u'path': u'/storage_local_ge5_volume_1', u'address': u'gluster01.scl.lab.tlv.redhat.com'}, u'format': True}, 'key': u'test_gluster_1'}) => {"changed": false, "item": {"key": "test_gluster_1", "value": {"format": true, "glusterfs": {"address": "gluster01.scl.lab.tlv.redhat.com", "path": "/storage_local_ge5_volume_1"}, "state": "present"}}, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."}
11:42:21 An exception occurred during task execution. To see the full traceback, use -vvv. The error was: Error: Fault reason is "Operation Failed". Fault detail is "[Error creating a storage domain]". HTTP response code is 400.
11:42:21 failed: [storage-ge-05.scl.lab.tlv.redhat.com] (item={'value': {u'state': u'present', u'glusterfs': {u'path': u'/storage_local_ge5_volume_2', u'address': u'gluster01.scl.lab.tlv.redhat.com'}, u'format': True}, 'key': u'test_gluster_2'}) => {"changed": false, "item": {"key": "test_gluster_2", "value": {"format": true, "glusterfs": {"address": "gluster01.scl.lab.tlv.redhat.com", "path": "/storage_local_ge5_volume_2"}, "state": "present"}}, "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error creating a storage domain]\". HTTP response code is 400."}

Could you please check?

Comment 19 Sandro Bonazzola 2019-02-13 07:45:04 UTC
This bugzilla is included in oVirt 4.3.0 release, published on February 4th 2019.

Since the problem described in this bug report should be
resolved in oVirt 4.3.0 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

