Bug 1247722 - messages report Introspection for one of the nodes 'has timed out' while the command returns ' Discovery completed.'
Summary: messages report Introspection for one of the nodes 'has timed out' while the ...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: python-rdomanager-oscplugin
Version: unspecified
Hardware: x86_64
OS: Linux
high
high
Target Milestone: y1
: 7.0 (Kilo)
Assignee: Brad P. Crochet
QA Contact: Omri Hochman
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2015-07-28 16:46 UTC by Omri Hochman
Modified: 2023-02-22 23:02 UTC (History)
9 users (show)

Fixed In Version: python-rdomanager-oscplugin-0.0.9-1.el7ost
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-10-08 12:15:50 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
log1 (412.69 KB, text/x-vhdl)
2015-07-28 16:49 UTC, Omri Hochman
no flags Details
valid_instaclk_.json (4.02 KB, text/plain)
2015-07-28 16:53 UTC, Omri Hochman
no flags Details
instackenv.json (2.36 KB, text/plain)
2015-07-28 16:54 UTC, Omri Hochman
no flags Details
messages (20.00 KB, application/x-tar)
2015-07-28 16:57 UTC, Omri Hochman
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Gerrithub.io 243621 0 None None None Never
Red Hat Product Errata RHSA-2015:1862 0 normal SHIPPED_LIVE Moderate: Red Hat Enterprise Linux OpenStack Platform 7 director update 2015-10-08 16:05:50 UTC

Description Omri Hochman 2015-07-28 16:46:18 UTC
messages report Introspection for nodes 'has timed out' while the command returns ' Discovery completed.' 

Environment :
-------------
python-rdomanager-oscplugin-0.0.8-43.el7ost.noarch
instack-undercloud-2.1.2-22.el7ost.noarch
instack-0.0.7-1.el7ost.noarch
openstack-ironic-conductor-2015.1.0-9.el7ost.noarch
python-ironicclient-0.5.1-9.el7ost.noarch
python-ironic-discoverd-1.1.0-5.el7ost.noarch
openstack-ironic-common-2015.1.0-9.el7ost.noarch
openstack-ironic-discoverd-1.1.0-5.el7ost.noarch
openstack-ironic-api-2015.1.0-9.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-4.el7ost.noarch
openstack-puppet-modules-2015.1.8-8.el7ost.noarch
puppet-3.6.2-2.el7.noarch
openstack-heat-templates-0-0.6.20150605git.el7ost.noarch


Description: 
-------------
The 'openstack baremetal introspection bulk start'  --> took  ~30 minutes      t and returned 'Discovery completed.'  - so It seems like everything is  : OK , 
I proceed with the overcloud deployment that failed, looking at /var/log/messages it seems that ironic discovery of one of the nodes  has timed out.

Jul 28 12:10:47 rhos-compute-node-18 ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [u'f619570d-6ce4-4891-acfa-f5ba4cf6f7a1', u'4
ffc6413-ad56-4080-bb78-2dc82762a355', u'51929ba4-c49a-4f71-8f7d-2d0d350249be', u'e0ac4748-b4bd-44cc-81be-ece57cd56d65'] has timed out


The question is - why the discovery returned 'Discovery completed.' when one of the nodes has timed out.


var/log/messages (attached ) 
-----------------------------
Jul 28 12:10:46 rhos-compute-node-18 ironic-api: File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 376, in sendall
Jul 28 12:10:46 rhos-compute-node-18 ironic-api: tail = self.send(data, flags)
Jul 28 12:10:46 rhos-compute-node-18 ironic-api: File "/usr/lib/python2.7/site-packages/eventlet/greenio/base.py", line 359, in send
Jul 28 12:10:46 rhos-compute-node-18 ironic-api: total_sent += fd.send(data[total_sent:], flags)
Jul 28 12:10:46 rhos-compute-node-18 ironic-api: error: [Errno 32] Broken pipe
Jul 28 12:10:47 rhos-compute-node-18 ironic-discoverd: ERROR:ironic_discoverd.node_cache:Introspection for nodes [u'f619570d-6ce4-4891-acfa-f5ba4cf6f7a1', u'4
ffc6413-ad56-4080-bb78-2dc82762a355', u'51929ba4-c49a-4f71-8f7d-2d0d350249be', u'e0ac4748-b4bd-44cc-81be-ece57cd56d65'] has timed out
Jul 28 12:10:47 rhos-compute-node-18 ironic-api: 2015-07-28 12:10:47.818 1612 DEBUG keystonemiddleware.auth_token [-] Removing headers from request environmen
t: X-Service-Catalog,X-Identity-Status,X-Service-Identity-Status,X-Roles,X-Service-Roles,X-Domain-Name,X-Service-Domain-Name,X-Project-Id,X-Service-Project-Id
,X-Project-Domain-Name,X-Service-Project-Domain-Name,X-User-Id,X-Service-User-Id,X-User-Name,X-Service-User-Name,X-Project-Name,X-Service-Project-Name,X-User-
Domain-Id,X-Service-User-Domain-Id,X-Domain-Id,X-Service-Domain-Id,X-User-Domain-Name,X-Service-User-Domain-Name,X-Project-Domain-Id,X-Service-Project-Domain-
Id,X-Role,X-User,X-Tenant-Name,X-Tenant-Id,X-Tenant _remove_auth_headers /usr/lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:672
Jul 28 12:10:47 rhos-compute-node-18 ironic-api: 2015-07-28 12:10:47.819 1612 DEBUG keystonemiddleware.auth_token [-] Authenticating user token __call__ /usr/
lib/python2.7/site-packages/keystonemiddleware/auth_token/__init__.py:581

Comment 1 Omri Hochman 2015-07-28 16:49:40 UTC
Created attachment 1057062 [details]
log1

adding file :  sudo journalctl -u openstack-ironic-discoverd -u openstack-ironic-discoverd-dnsmasq > log1

Comment 2 Omri Hochman 2015-07-28 16:53:41 UTC
Created attachment 1057074 [details]
valid_instaclk_.json

running validator on the instackenv.json returns fine .

Comment 3 Omri Hochman 2015-07-28 16:54:19 UTC
Created attachment 1057075 [details]
instackenv.json

Comment 4 Omri Hochman 2015-07-28 16:57:21 UTC
Created attachment 1057076 [details]
messages

Comment 6 Omri Hochman 2015-07-28 17:14:49 UTC
possible solution for the discovery timeout :https://bugzilla.redhat.com/show_bug.cgi?id=1235255#c5  

but it still not fixing the bad reporting.

Comment 7 Omri Hochman 2015-07-28 17:15:22 UTC
possible solution for the discovery timeout :https://bugzilla.redhat.com/show_bug.cgi?id=1235255#c5  

but it still not fixing the bad reporting.

Comment 9 Rafael Rosa 2015-07-29 12:11:04 UTC
Is this reproducible 100% of the time?

Comment 10 Omri Hochman 2015-07-29 15:35:52 UTC
(In reply to Rafael Rosa from comment #9)
> Is this reproducible 100% of the time?

just to emphasis - the problem is when node-time outs during discovery, this situation should not happen if everything is OK . 

In case there is a node time out the discovery should not return 'Discovery completed.'     

To reproduce we'll need to find a negative scenario that will cause node to time-out, it reproduced on my setup while I opened the bug (since I had a problem ),  but I assume that another way to reproduce it will be to tamper with the instackenv.json file with false parameters, in a way that will cause timeout.

Comment 12 Omri Hochman 2015-09-15 20:34:54 UTC
unable to reproduce with python-rdomanager-oscplugin-0.0.10-3.el7ost.noarch

when attempted to set a wrong MAC address in the instack.json - it seems that the introspection succeeded anyway, and when used a false password for IPMI nic the operation failed immediately as expected.

Comment 14 errata-xmlrpc 2015-10-08 12:15:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862


Note You need to log in before you can comment on or make changes to this bug.