Bug 1512704

Summary: Bulk introspection succeeds, however one node failed
Product: Red Hat OpenStack Reporter: Joe Talerico <jtaleric>
Component: openstack-tripleo-common Assignee: Dougal Matthews <dmatthew>
Status: CLOSED ERRATA QA Contact: Alexander Chuzhoy <sasha>
Severity: high Docs Contact:
Priority: high    
Version: 12.0 (Pike) CC: bfournie, jtaleric, mburns, mlammon, rhel-osp-director-maint, slinaber, srevivo
Target Milestone: z2 Keywords: Triaged, ZStream
Target Release: 12.0 (Pike)   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: openstack-tripleo-common-7.6.9-2.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-03-28 17:27:14 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Joe Talerico 2017-11-13 21:51:29 UTC
Description of problem:
Introspection reports "Nodes introspected successfully." However, one node failed to introspect: it timed out on all three attempts.

(undercloud) [stack@lorenzo ~]$ openstack baremetal introspection bulk start
This command is deprecated. Please use "openstack overcloud node introspect" to introspect manageable nodes instead.
Setting nodes for introspection to manageable...
Starting introspection of manageable nodes
Waiting for introspection to finish...
Started Mistral Workflow tripleo.baremetal.v1.introspect_manageable_nodes. Execution ID: 3ea06ce9-c27a-4416-a984-8f5eca19f68b
Waiting for messages on queue '8556b859-62ab-4fea-871b-be5de77ab210' with no timeout.
Introspection of node 25ad6d92-5f6e-40b1-9098-6bd85d8ca8e4 completed. Status:SUCCESS. Errors:None
Introspection of node b7052916-b67a-4100-9f8b-635d346c7f62 completed. Status:SUCCESS. Errors:None
Introspection of node 782cbb86-e518-4ca5-b036-af6537bcfde7 completed. Status:SUCCESS. Errors:None
Introspection of node 71ca3802-621b-4183-a51a-0c7c6e05004b completed. Status:SUCCESS. Errors:None
Introspection of node ac92e482-4b64-4b62-88d3-4727a9fdfd99 timed out.
Retrying 1 nodes that failed introspection. Attempt 2 of 3 
Introspection of node ac92e482-4b64-4b62-88d3-4727a9fdfd99 timed out.
Retrying 1 nodes that failed introspection. Attempt 3 of 3 
Introspection of node ac92e482-4b64-4b62-88d3-4727a9fdfd99 timed out.
Nodes introspected successfully.
Introspection completed.
Setting manageable nodes to available...
Started Mistral Workflow tripleo.baremetal.v1.provide_manageable_nodes. Execution ID: 21ad5664-acb4-4f54-bbba-8334cf00e549
Waiting for messages on queue '8556b859-62ab-4fea-871b-be5de77ab210' with no timeout.

(undercloud) [stack@lorenzo ~]$ openstack baremetal introspection status ac92e482-4b64-4b62-88d3-4727a9fdfd99
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| error       | None                                 |
| finished    | False                                |
| finished_at | None                                 |
| started_at  | 2017-11-13T21:18:48                  |
| state       | waiting                              |
| uuid        | ac92e482-4b64-4b62-88d3-4727a9fdfd99 |
+-------------+--------------------------------------+
(undercloud) [stack@lorenzo ~]$ 
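
The status table above can also be checked mechanically. The following is a minimal sketch, assuming the two-column table layout shown in the output above; the helper names parse_status_table and introspection_succeeded are hypothetical and are not part of any OpenStack client.

```python
def parse_status_table(text):
    """Parse a two-column '| Field | Value |' table into a dict of strings."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip '+----+' borders, prompts, and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if len(cells) != 2 or cells[0] == "Field":
            continue  # skip the header row and anything malformed
        fields[cells[0]] = cells[1]
    return fields


def introspection_succeeded(fields):
    """Treat a node as introspected only if it finished without an error."""
    return fields.get("finished") == "True" and fields.get("error") == "None"


# Values copied from the status output of the node that timed out.
SAMPLE = """
+-------------+--------------------------------------+
| Field       | Value                                |
+-------------+--------------------------------------+
| error       | None                                 |
| finished    | False                                |
| finished_at | None                                 |
| started_at  | 2017-11-13T21:18:48                  |
| state       | waiting                              |
| uuid        | ac92e482-4b64-4b62-88d3-4727a9fdfd99 |
+-------------+--------------------------------------+
"""

status = parse_status_table(SAMPLE)
```

With the sample above, the node is still in the "waiting" state and has not finished, so introspection_succeeded(status) is false even though "error" is None.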



Version-Release number of selected component (if applicable):
(undercloud) [stack@lorenzo ~]$ rpm -qa | grep ironic
openstack-ironic-common-9.1.2-0.20171025074857.cf3665f.el7ost.noarch
python-ironic-inspector-client-2.1.0-1.el7ost.noarch
openstack-ironic-conductor-9.1.2-0.20171025074857.cf3665f.el7ost.noarch
puppet-ironic-11.3.1-0.20171024200735.a6a7c9c.el7ost.noarch
python-ironicclient-1.17.0-1.el7ost.noarch
openstack-ironic-api-9.1.2-0.20171025074857.cf3665f.el7ost.noarch
python-ironic-lib-2.10.0-1.el7ost.noarch
openstack-ironic-inspector-6.0.1-0.20170920142417.77e2b1a.el7ost.noarch
(undercloud) [stack@lorenzo ~]$ 


How reproducible:
100%

Actual results:
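Success: the command prints "Nodes introspected successfully." even though one node timed out on all three attempts.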

Expected results:
Failure: the command should report an error when any node is still not introspected after the final retry.
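
The expected behaviour can be illustrated with a small retry loop. This is only a sketch under stated assumptions, not the actual tripleo-common Mistral workflow: the function names introspect_with_retries and overall_status, and the introspect callable, are hypothetical. The point is that the overall result is derived from the nodes still failing after the last attempt.

```python
def introspect_with_retries(node_ids, introspect, max_attempts=3):
    """Run up to max_attempts rounds and return (succeeded, failed).

    introspect is a hypothetical callable that takes a node ID and returns
    True on success or False on failure (for example, a timeout).
    """
    pending = list(node_ids)
    succeeded = []
    for _attempt in range(max_attempts):
        still_failing = []
        for node in pending:
            if introspect(node):
                succeeded.append(node)
            else:
                still_failing.append(node)
        pending = still_failing  # only failed nodes are retried
        if not pending:
            break
    return succeeded, pending


def overall_status(failed):
    """Report failure if any node is left over after the final attempt."""
    return "FAILED" if failed else "SUCCESS"


# Simulate the run from this report: one node times out on every attempt.
calls = []

def fake_introspect(node):
    calls.append(node)
    return node != "ac92e482"

done, failed = introspect_with_retries(
    ["25ad6d92", "b7052916", "ac92e482"], fake_introspect
)
result = overall_status(failed)
```

In this simulation the node that always times out is tried three times and remains in the failed list, so the overall result is "FAILED" rather than a success message.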

Additional info:

Comment 1 Bob Fournier 2017-11-16 22:08:03 UTC
Hi Joe - is it consistently the same node that fails introspection or random?  If the same, can we get console logs and/or tcpdump from this node to see if PXE boot is happening correctly?

In either case, can we get ironic inspector logs and deploy logs for the failed node (if available)?  Thanks.

Comment 2 Bob Fournier 2017-12-08 16:09:46 UTC
Joe - possible to get more info for this?

Comment 3 Joe Talerico 2017-12-08 17:48:42 UTC
@Bob - it was the same node. It was due to a boot order issue. However, the bulk command should not have reported success...

Comment 4 Dmitry Tantsur 2017-12-11 15:17:49 UTC
I assume it's https://bugs.launchpad.net/tripleo/+bug/1733303

Comment 8 mlammon 2018-03-14 20:47:14 UTC
Deployed the latest build. This included introspection, and we did not hit any errors in our case.

Env verified:
openstack-tripleo-common-containers-7.6.9-2.el7ost.noarch

Comment 11 errata-xmlrpc 2018-03-28 17:27:14 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:0607