messages report Introspection for nodes 'has timed out' while the command returns ' Discovery completed.' Environment : ------------- python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch instack-undercloud-2.1.2-22.el7ost.noarch instack-0.0.7-1.el7ost.noarch openstack-ironic-conductor-2015.1.0-9.el7ost.noarch python-ironicclient-0.5.1-9.el7ost.noarch python-ironic-discoverd-1.1.0-5.el7ost.noarch openstack-ironic-common-2015.1.0-9.el7ost.noarch openstack-ironic-discoverd-1.1.0-5.el7ost.noarch openstack-ironic-api-2015.1.0-9.el7ost.noarch openstack-tripleo-puppet-elements-0.0.1-4.el7ost.noarch openstack-puppet-modules-2015.1.8-8.el7ost.noarch puppet-3.6.2-2.el7.noarch openstack-heat-templates-0-0.6.20150605git.el7ost.noarch Description: ------------- The 'openstack baremetal introspection bulk start' --> took ~30 minutes t and returned 'Discovery completed.' - so It seems like everything is : OK , I proceed with the overcloud deployment that failed, looking at /var/log/messages it seems that ironic discovery of one of the nodes has timed out. Jul 28 12:10:47 rhos-compute-node-18 ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [u'f619570d-6ce4-4891-acfa-f5ba4cf6f7a1', u'4 ffc6413-ad56-4080-bb78-2dc82762a355', u'51929ba4-c49a-4f71-8f7d-2d0d350249be', u'e0ac4748-b4bd-44cc-81be-ece57cd56d65'] has timed out The question is - why the discovery returned 'Discovery completed.' when one of the nodes has timed out. var/log/messages (attached ) ----------------------------- Jul 28 12:10:46 rhos-compute-node-18 ironic-api: File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 376, in sendall Jul 28 12:10:46 rhos-compute-node-18 ironic-api: tail = self.send(data, flags) Jul 28 12:10:46 rhos-compute-node-18 ironic-api: File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send Jul 28 12:10:46 rhos-compute-node-18 ironic-api: total_sent += fd.send(data[total_sent:], flags) Jul 28 12:10:46 rhos-compute-node-18 ironic-api: error: [Errno 32] Broken pipe Jul 28 12:10:47 rhos-compute-node-18 ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [u'f619570d-6ce4-4891-acfa-f5ba4cf6f7a1', u'4 ffc6413-ad56-4080-bb78-2dc82762a355', u'51929ba4-c49a-4f71-8f7d-2d0d350249be', u'e0ac4748-b4bd-44cc-81be-ece57cd56d65'] has timed out Jul 28 12:10:47 rhos-compute-node-18 ironic-api: 2015-07-28 12:10:47.818 1612 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environmen t: X-Service-Catalog,X-Identity-Status,X-Service-Identity-Status,X-Roles,X-Service-Roles,X-Domain-Name,X-Service-Domain-Name,X-Project-Id,X-Service-Project-Id ,X-Project-Domain-Name,X-Service-Project-Domain-Name,X-User-Id,X-Service-User-Id,X-User-Name,X-Service-User-Name,X-Project-Name,X-Service-Project-Name,X-User- Domain-Id,X-Service-User-Domain-Id,X-Domain-Id,X-Service-Domain-Id,X-User-Domain-Name,X-Service-User-Domain-Name,X-Project-Domain-Id,X-Service-Project-Domain- Id,X-Role,X-User,X-Tenant-Name,X-Tenant-Id,X-Tenant _remove_auth_headers /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:672 Jul 28 12:10:47 rhos-compute-node-18 ironic-api: 2015-07-28 12:10:47.819 1612 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/ lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:581
Created attachment 1057062 [details] log1 adding file : sudo journalctl -u openstack-ironic-discoverd -u openstack-ironic-discoverd-dnsmasq > log1
Created attachment 1057074 [details] valid_instaclk_.json running validator on the instackenv.json returns fine .
Created attachment 1057075 [details] instackenv.json
Created attachment 1057076 [details] messages
possible solution for the discovery timeout :https://bugzilla.redhat.com/show_bug.cgi?id=1235255#c5 but it still not fixing the bad reporting.
Is this reproducible 100% of the time?
(In reply to Rafael Rosa from comment #9) > Is this reproducible 100% of the time? just to emphasis - the problem is when node-time outs during discovery, this situation should not happen if everything is OK . In case there is a node time out the discovery should not return 'Discovery completed.' To reproduce we'll need to find a negative scenario that will cause node to time-out, it reproduced on my setup while I opened the bug (since I had a problem ), but I assume that another way to reproduce it will be to tamper with the instackenv.json file with false parameters, in a way that will cause timeout.
unable to reproduce with python-rdomanager-oscplugin-0.0.10-3.el7ost.noarch when attempted to set a wrong MAC address in the instack.json - it seems that the introspection succeeded anyway, and when used a false password for IPMI nic the operation failed immediately as expected.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2015:1862