Bug 1240260

Summary: introspection timed out for 2 VM nodes
Product: Red Hat OpenStack Reporter: Ola Pavlenko <opavlenk>
Component: openstack-tuskar-uiAssignee: Jiri Tomasek <jtomasek>
Status: CLOSED ERRATA QA Contact: Ola Pavlenko <opavlenk>
Severity: urgent Docs Contact:
Priority: high    
Version: DirectorCC: apevec, calfonso, dtantsur, jslagle, lhh, mburns, ohochman, opavlenk, rhel-osp-director-maint
Target Milestone: y1Keywords: Triaged, ZStream
Target Release: 7.0 (Kilo)   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tuskar-ui-0.4.0-2.el7ost Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2015-10-08 12:10:29 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1228132    
Attachments:
Description Flags
nodes csv file was used
none
dnxmasq
none
journal openstack-ironic-discoverd none

Description Ola Pavlenko 2015-07-06 11:32:39 UTC
Description of problem:
introspection process failed: "discovery failed"
actually it timed out


Version-Release number of selected component (if applicable):
---------------------------------------
# rpm -qa | grep tuskar
python-tuskarclient-0.1.18-3.el7ost.noarch
openstack-tuskar-0.4.18-3.el7ost.noarch
openstack-tuskar-ui-extras-0.0.4-1.el7ost.noarch
openstack-tuskar-ui-0.3.0-8.el7ost.noarch

# rpm -qa | grep ironic
openstack-ironic-common-2015.1.0-8.el7ost.noarch
openstack-ironic-discoverd-1.1.0-4.el7ost.noarch
python-ironic-discoverd-1.1.0-4.el7ost.noarch
openstack-ironic-conductor-2015.1.0-8.el7ost.noarch
openstack-ironic-api-2015.1.0-8.el7ost.noarch
python-ironicclient-0.5.1-9.el7ost.noarch

# rpm -qa | grep rhos
rhos-release-0.63-1.noarch

How reproducible:
--------------------------
100%

Steps to Reproduce:
-------------------------
1.register nodes via UI
2.introspection will implicitly start 

Actual results:
------------------------
introspection fails after an hour approximately
with "discovery failed"

Expected results:
-------------------------
introspection succeed


Additional info:
------------------------

sudo journalctl -u openstack-ironic-discoverd:

-- Logs begin at Sun 2015-07-05 15:08:12 EDT, end at Mon 2015-07-06 07:12:43 EDT. --
Jul 05 15:45:08 instack.localdomain systemd[1]: Starting Hardware introspection service for OpenStack Ironic...
Jul 05 15:45:08 instack.localdomain systemd[1]: Started Hardware introspection service for OpenStack Ironic.
Jul 05 15:45:11 instack.localdomain ironic-discoverd[31536]: INFO:ironic_discoverd.main:Enabled processing hooks: ['ramdisk_error'
, 'root_device_hint', 'scheduler', 'validate_interfaces', 'edeploy']
Jul 05 15:45:11 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug: * Running on http://0.0.0.0:5050/
Jul 06 04:11:50 instack.localdomain ironic-discoverd[31536]: INFO:ironic_discoverd.utils:Node 2233257b-ec0d-4caf-8d49-e45e0a46d527
 is in maintenance mode, skipping provision states check
Jul 06 04:11:50 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:11:50] "POST /v1/introspe
ction/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 202 -
Jul 06 04:11:50 instack.localdomain ironic-discoverd[31536]: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'00:4d:a2:4a:92
:4d'] for node 2233257b-ec0d-4caf-8d49-e45e0a46d527 on the firewall
Jul 06 04:11:51 instack.localdomain ironic-discoverd[31536]: INFO:ironic_discoverd.utils:Node b15fb49b-0d75-49b1-b3f1-78ceccb2d60d
 is in maintenance mode, skipping provision states check
Jul 06 04:11:51 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:11:51] "POST /v1/introspe
ction/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 202 -
Jul 06 04:11:52 instack.localdomain ironic-discoverd[31536]: INFO:ironic_discoverd.introspect:Whitelisting MAC's [u'00:cc:7e:44:c3
:b0'] for node b15fb49b-0d75-49b1-b3f1-78ceccb2d60d on the firewall
Jul 06 04:11:53 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:11:53] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:11:53 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:11:53] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 04:12:53 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:12:53] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:12:53 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:12:53] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 04:16:30 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:16:30] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:16:30 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:16:30] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 04:16:35 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:16:35] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:16:35 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:16:35] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 04:28:36 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:28:36] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:28:36 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:28:36] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 04:50:21 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:50:21] "GET /v1/introspec
tion/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 04:50:21 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 04:50:21] "GET /v1/introspec
tion/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
Jul 06 05:12:13 instack.localdomain ironic-discoverd[31536]: ERROR:ironic_discoverd.node_cache:Introspection for nodes [u'2233257b-ec0d-4caf-8d49-e45e0a46d527', u'b15fb49b-0d75-49b1-b3f1-78ceccb2d60d'] has timed out
Jul 06 05:19:18 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 05:19:18] "GET /v1/introspection/2233257b-ec0d-4caf-8d49-e45e0a46d527 HTTP/1.1" 200 -
Jul 06 05:19:18 instack.localdomain ironic-discoverd[31536]: INFO:werkzeug:192.0.2.1 - - [06/Jul/2015 05:19:18] "GET /v1/introspection/b15fb49b-0d75-49b1-b3f1-78ceccb2d60d HTTP/1.1" 200 -
~

Comment 3 Ola Pavlenko 2015-07-06 11:36:26 UTC
Created attachment 1048798 [details]
nodes csv file was used

Comment 5 chris alfonso 2015-07-06 16:18:03 UTC
Does this happen on baremetal?

Comment 7 Dmitry Tantsur 2015-07-07 10:49:41 UTC
Please also provide output of $ sudo journalctl -u openstack-ironic-discoverd-dnsmasq

Also the remaining steps of https://github.com/openstack/ironic-inspector#introspection-times-out

Comment 8 Dmitry Tantsur 2015-07-08 15:05:52 UTC
Were several NIC's used? If so, might be case of https://bugzilla.redhat.com/show_bug.cgi?id=1234601

Comment 9 Ola Pavlenko 2015-07-08 15:52:25 UTC
(In reply to Dmitry Tantsur from comment #7)
> Please also provide output of $ sudo journalctl -u
> openstack-ironic-discoverd-dnsmasq
> 
> Also the remaining steps of
> https://github.com/openstack/ironic-inspector#introspection-times-out

Unfortunately I've used this env for the test day (e.g reprovisioned)
I'll reproduce it and provide all requested info.

Comment 12 Ola Pavlenko 2015-07-09 10:11:21 UTC
Created attachment 1050209 [details]
dnxmasq

Reproduce on lynx17.qa.lab.tlv.redhat.com / qum10net with 2 nodes when 1 was successfully discovered and the second one failed.

Comment 13 Ola Pavlenko 2015-07-09 10:12:08 UTC
Created attachment 1050210 [details]
journal openstack-ironic-discoverd

Comment 14 Dmitry Tantsur 2015-07-09 11:12:43 UTC
Ah, missing sleep again. Meh, I'm going to do something about it upstream, but for current release we need sleeps or similar measure in UI.

UI folks, could you please make sure we wait say 5 seconds between sending nodes on introspection?

Comment 15 Jiri Tomasek 2015-07-09 12:31:30 UTC
Upstream patch is available here: https://review.openstack.org/#/c/200045/

Comment 17 Ola Pavlenko 2015-09-08 15:35:41 UTC
verified  openstack-tuskar-ui-0.4.0-3.el7ost.noarch

Comment 19 errata-xmlrpc 2015-10-08 12:10:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2015:1862