Description of problem:
The issue is that Ironic by default uses the ADMINISTRATOR IPMI privilege level, while this environment had an OPERATOR account. Regardless, the baremetal import shouldn't just hang forever: there should be a timeout, or, if we fail to get the power status, we should break out with an error to the user.

Version-Release number of selected component (if applicable): OSP10

How reproducible: 100%

Steps to Reproduce:
1. Import an instackenv.json with the wrong account type

Actual results:
Import hangs

Expected results:
Import exits 1, with an error.

Additional info:
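For reference, a minimal instackenv.json node entry of the kind being imported (all values here are illustrative, not from the affected environment); the hang is triggered when the account named in pm_user does not have the privilege level Ironic expects on the BMC:

```json
{
  "nodes": [
    {
      "name": "example-node-0",
      "pm_type": "pxe_ipmitool",
      "pm_addr": "192.0.2.10",
      "pm_user": "operator-account",
      "pm_password": "example-password",
      "mac": ["52:54:00:00:00:01"]
    }
  ]
}
```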
Could you please fetch ironic and mistral logs?
The CNCF lab is gone but this should be easily reproduced.
*** Bug 1440959 has been marked as a duplicate of this bug. ***
I can confirm this: by providing a wrong password one can make 'openstack overcloud node import' hang. Ironic itself correctly puts the node back to 'enroll'; we should probably stop retrying in this case.
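The actual fix lives in tripleo-common (the OSP-13 backport is linked below); as a rough illustration only, the desired behavior is something like the following sketch, where get_provision_state is a hypothetical helper standing in for an Ironic API call:

```python
import time

def wait_for_manageable(get_provision_state, node_uuid,
                        timeout=600, poll_interval=5):
    """Poll a node until it reaches "manageable", but bail out early.

    Instead of retrying forever, stop as soon as Ironic moves the node
    back to "enroll" (which is what it does on a credential failure),
    or when the overall timeout expires.
    """
    deadline = time.time() + timeout
    while time.time() < deadline:
        state = get_provision_state(node_uuid)
        if state == "manageable":
            return True
        if state == "enroll":
            # Ironic put the node back to "enroll": the manage action
            # failed (e.g. bad IPMI credentials). Retrying cannot help.
            raise RuntimeError(
                'Node %s did not reach state "manageable", '
                'the state is "enroll"' % node_uuid)
        time.sleep(poll_interval)
    raise RuntimeError('Timed out waiting for node %s' % node_uuid)
```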
OSP 13 backport is https://review.openstack.org/#/c/559314/
Moving to OSP-13.
Verified:

Environment: openstack-tripleo-common-8.6.1-6.el7ost.noarch

Set a wrong password for one node and ran the import (didn't get stuck, completed within 1 min with the error below):

(undercloud) [stack@undercloud-0 ~]$ openstack overcloud node import instackenv.json
Started Mistral Workflow tripleo.baremetal.v1.register_or_update. Execution ID: 9757f55d-c829-4b92-8c10-f8283be1f414
Waiting for messages on queue 'tripleo' with no timeout.
[{u'result': u'Node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f. Error: IPMI call failed: power status.'}, {}, {}, {}, {}, {}, {}, {}, {}, {}]
{u'status': u'FAILED', u'message': [{u'result': u'Node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f. Error: IPMI call failed: power status.'}, {}, {}, {}, {}, {}, {}, {}, {}, {}], u'result': None}
Exception registering nodes: {u'status': u'FAILED', u'message': [{u'result': u'Node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f did not reach state "manageable", the state is "enroll", error: Failed to get power state for node 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f. Error: IPMI call failed: power status.'}, {}, {}, {}, {}, {}, {}, {}, {}, {}], u'result': None}

All nodes except the one with the wrong password are managed (as expected):

(undercloud) [stack@undercloud-0 ~]$ openstack baremetal node list
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name         | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
| 16c3e4ab-fa68-4306-b1c9-ba71b7ca338f | ceph-0       | None          | None        | enroll             | False       |
| 08606c14-eac6-4e29-a9e0-e8c3c4925679 | ceph-1       | None          | power on    | manageable         | False       |
| 6b5514db-ef09-4366-9131-89587749b4a2 | ceph-2       | None          | power on    | manageable         | False       |
| 1836f58e-4c44-4e43-9e5c-17a8adc02c58 | compute-0    | None          | power on    | manageable         | False       |
| 4a5ba3d1-241d-40a4-ae13-d148e924dc2a | compute-1    | None          | power on    | manageable         | False       |
| 9ea43886-fac5-4054-9169-75b41f45064a | controller-0 | None          | power on    | manageable         | False       |
| 9bc0eed1-6e78-4b27-a05a-386549a6efb6 | controller-1 | None          | power on    | manageable         | False       |
| 97f01df8-ff61-4bac-902d-dc0dd89451a7 | controller-2 | None          | power on    | manageable         | False       |
| 90bdc756-ab3e-48b2-b791-fce071de0033 | ironic-0     | None          | power off   | manageable         | False       |
| e4748770-7e91-40f5-b4cf-e1b067a84b4f | ironic-1     | None          | power off   | manageable         | False       |
+--------------------------------------+--------------+---------------+-------------+--------------------+-------------+
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086