Bug 1416457 - Mistral Workflow got timeout error when introspecting all nodes
Summary: Mistral Workflow got timeout error when introspecting all nodes
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-ironic
Version: 10.0 (Newton)
Hardware: x86_64
OS: Linux
unspecified
medium
Target Milestone: ---
: ---
Assignee: Lucas Alvares Gomes
QA Contact: Raviv Bar-Tal
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-01-25 14:43 UTC by Jean-Tsung Hsiao
Modified: 2017-02-20 03:16 UTC (History)

Fixed In Version:
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-02-20 03:16:32 UTC
Target Upstream Version:



Description Jean-Tsung Hsiao 2017-01-25 14:43:06 UTC
Description of problem: The Mistral workflow reports a timeout error when introspecting all nodes.

openstack overcloud node introspect --all-manageable --provide
Started Mistral Workflow. Execution ID: c51c6807-d674-4969-82fd-81a92299dd0e
Waiting for introspection to finish...
Introspection for UUID 66fd3001-7696-42b8-8b60-93a9118b59d6 finished with error: Introspection timeout
Introspection for UUID 963b6bff-222d-4860-95e6-053b0648f57b finished successfully.
Introspection for UUID d07d59b9-25c6-4263-9f5c-874aa9425e03 finished successfully.
Introspection completed with errors:
66fd3001-7696-42b8-8b60-93a9118b59d6: Introspection timeout
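When many nodes are introspected at once, it can help to pull the failed UUIDs out of the command output so each one can be retried on its own. The following is a minimal sketch, assuming the output lines keep the exact wording shown in the transcript above; the function name failed_uuids is hypothetical and not part of any OpenStack tool.

```shell
#!/usr/bin/env bash
# Print the UUID of every node whose introspection finished with an error.
# Reads the CLI output on stdin. The line format is an assumption taken
# from the transcript in this report; other releases may word it differently.
failed_uuids() {
    sed -n 's/^Introspection for UUID \([0-9a-f-]*\) finished with error: .*$/\1/p'
}
```

One possible use is to save the output of the introspection command to a file (here called introspect.log purely for illustration) and then run: failed_uuids < introspect.log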

[stack@netqe17 ~]$ openstack baremetal node list
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                    | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| d07d59b9-25c6-4263      | netqe19 | None          | power off   | manageable         | False       |
| -9f5c-874aa9425e03      |         |               |             |                    |             |
| 66fd3001-7696-42b8-8b60 | netqe9  | None          | power on    | manageable         | False       |
| -93a9118b59d6           |         |               |             |                    |             |
| 963b6bff-222d-4860-95e6 | netqe10 | None          | power off   | manageable         | False       |
| -053b0648f57b           |         |               |             |                    |             |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
[stack@netqe17 ~]$ 

[stack@netqe17 ~]$ mistral action-execution-list
.
.
.

| c423e8bf-d542- | ironic.node_ge | tripleo.bareme | wait_for_power | 12ac09b5-d61f-     | SUCCESS | False    |
| 4804-88ab-     | t              | tal.v1.set_pow | _state         | 408a-              |         |          |
| 66f46f98639e   |                | er_state       |                | a5b6-e4323ccb963d  |         |          |
| ef52d5ba-      | ironic.node_ge | tripleo.bareme | wait_for_power | 12ac09b5-d61f-     | SUCCESS | False    |
| 55b9-4d64      | t              | tal.v1.set_pow | _state         | 408a-              |         |          |
| -b7aa-         |                | er_state       |                | a5b6-e4323ccb963d  |         |          |
| 68eadd5537d1   |                |                |                |                    |         |          |
| bcc26b3c-b51c- | ironic.node_ge | tripleo.bareme | wait_for_power | 12ac09b5-d61f-     | SUCCESS | False    |
| 485b-b00d-     | t              | tal.v1.set_pow | _state         | 408a-              |         |          |
| d8c6baf34b48   |                | er_state       |                | a5b6-e4323ccb963d  |         |          |

.
.
.
Version-Release number of selected component (if applicable):
[stack@netqe17 yum.repos.d]$ cat RH7-RHOS-10.0.repo
[RH7-RHOS-10.0]
name=RH7-RHOS-10.0
baseurl=http://download.devel.redhat.com/rcm-guest/puddles/OpenStack/10.0-RHEL-7/2016-12-12.1/RH7-RHOS-10.0/$basearch/os
gpgcheck=0
enabled=1

[stack@netqe17 yum.repos.d]$ uname -a
Linux netqe17.knqe.lab.eng.bos.redhat.com 3.10.0-514.el7.x86_64 #1 SMP Wed Oct 19 11:24:13 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux
[stack@netqe17 yum.repos.d]$

How reproducible: Reproducible


Steps to Reproduce:
1. Install the undercloud.
2. Register three nodes and set them all to manageable.
3. Run: openstack overcloud node introspect --all-manageable --provide

Actual results:
The process timed out after about an hour.
One particular node (netqe9, 66fd3001-7696-42b8-8b60-93a9118b59d6) stays in the power on state. This could be the issue.

[stack@netqe17 yum.repos.d]$ openstack baremetal node list
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                    | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| d07d59b9-25c6-4263      | netqe19 | None          | power off   | manageable         | False       |
| -9f5c-874aa9425e03      |         |               |             |                    |             |
| 66fd3001-7696-42b8-8b60 | netqe9  | None          | power on    | manageable         | False       |
| -93a9118b59d6           |         |               |             |                    |             |
| 963b6bff-222d-4860-95e6 | netqe10 | None          | power off   | manageable         | False       |
| -053b0648f57b           |         |               |             |                    |             |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
[stack@netqe17 yum.repos.d]$

Expected results:
Introspection should not time out.

Additional info:

Comment 1 Jean-Tsung Hsiao 2017-01-25 16:28:35 UTC
The failed node, 66fd3001-7696-42b8-8b60 (netqe9), took about 61 minutes when introspected individually, although it did finish successfully. By contrast, each of the other two nodes took less than 8 minutes when introspected individually.


 [stack@netqe17 yum.repos.d]$ time openstack overcloud node introspect 66fd3001-7696-42b8-8b60-93a9118b59d6 --provide
Started Mistral Workflow. Execution ID: fb6ac7c5-2d94-427d-99a9-2fe64df3627a
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: 0ac0be0c-ebbe-4a0e-bfcc-3b6c7786e66a
Successfully set all nodes to available.

real	61m10.857s
user	0m0.388s
sys	0m0.085s
[stack@netqe17 yum.repos.d]$ openstack baremetal node list
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                    | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| d07d59b9-25c6-4263      | netqe19 | None          | power off   | manageable         | False       |
| -9f5c-874aa9425e03      |         |               |             |                    |             |
| 66fd3001-7696-42b8-8b60 | netqe9  | None          | power off   | available          | False       |
| -93a9118b59d6           |         |               |             |                    |             |
| 963b6bff-222d-4860-95e6 | netqe10 | None          | power off   | manageable         | False       |
| -053b0648f57b           |         |               |             |                    |             |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
[stack@netqe17 yum.repos.d]$ !800
time openstack overcloud node introspect d07d59b9-25c6-4263-9f5c-874aa9425e03 --provide
Started Mistral Workflow. Execution ID: e0507f51-e89c-4210-b850-07ea51c8251a
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: 08bea3be-3c0c-435c-8617-927d7c7f4b8a
Successfully set all nodes to available.

real	6m24.312s
user	0m0.376s
sys	0m0.086s
[stack@netqe17 yum.repos.d]$ openstack baremetal node list
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                    | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| d07d59b9-25c6-4263      | netqe19 | None          | power off   | available          | False       |
| -9f5c-874aa9425e03      |         |               |             |                    |             |
| 66fd3001-7696-42b8-8b60 | netqe9  | None          | power off   | available          | False       |
| -93a9118b59d6           |         |               |             |                    |             |
| 963b6bff-222d-4860-95e6 | netqe10 | None          | power off   | manageable         | False       |
| -053b0648f57b           |         |               |             |                    |             |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
[stack@netqe17 yum.repos.d]$ 
[stack@netqe17 yum.repos.d]$ 
[stack@netqe17 yum.repos.d]$ !802
time openstack overcloud node introspect 963b6bff-222d-4860-95e6-053b0648f57b --provide
Started Mistral Workflow. Execution ID: 3cfa8c29-a27c-468e-ab57-ed11cd51a9f2
Waiting for introspection to finish...
Successfully introspected all nodes.
Introspection completed.
Started Mistral Workflow. Execution ID: 36951a4f-75af-4690-8cfa-1a3540a6ba59
Successfully set all nodes to available.

real	7m40.678s
user	0m0.401s
sys	0m0.069s
[stack@netqe17 yum.repos.d]$ openstack baremetal node list
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| UUID                    | Name    | Instance UUID | Power State | Provisioning State | Maintenance |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
| d07d59b9-25c6-4263      | netqe19 | None          | power off   | available          | False       |
| -9f5c-874aa9425e03      |         |               |             |                    |             |
| 66fd3001-7696-42b8-8b60 | netqe9  | None          | power off   | available          | False       |
| -93a9118b59d6           |         |               |             |                    |             |
| 963b6bff-222d-4860-95e6 | netqe10 | None          | power off   | available          | False       |
| -053b0648f57b           |         |               |             |                    |             |
+-------------------------+---------+---------------+-------------+--------------------+-------------+
[stack@netqe17 yum.repos.d]$
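
To compare the three runs numerically, the "real" values printed by time can be converted to seconds. This is a small sketch for that arithmetic only; the helper names to_seconds and slowdown are hypothetical, and the input is assumed to have the NNmSS.sss form shown above (no hours field).

```shell
#!/usr/bin/env bash
# Convert a `time`-style duration such as "61m10.857s" to seconds.
to_seconds() {
    LC_ALL=C awk -v t="$1" 'BEGIN {
        split(t, part, "m")       # part[1] = minutes, part[2] = seconds plus "s"
        sub(/s$/, "", part[2])    # drop the trailing "s"
        printf "%.3f\n", part[1] * 60 + part[2]
    }'
}

# Print how many times longer the first duration is than the second.
slowdown() {
    LC_ALL=C awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
        'BEGIN { printf "%.1f\n", a / b }'
}
```

For the values in this comment, slowdown 61m10.857s 6m24.312s gives 9.6, so netqe9 took roughly nine and a half times as long as netqe19.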

Comment 2 Jean-Tsung Hsiao 2017-02-20 03:16:32 UTC
RCA: The cable for a NIC was disconnected from the network switch.

Closing this as it is not a bug.
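
Given that root cause, a quick link-state check on a node can rule out cabling before a long introspection run. The sketch below reads the standard Linux sysfs carrier files; the function name link_report and its output wording are hypothetical, and where it would be run (for example from an OS booted on the affected node) is an assumption, since this report does not say how the disconnected cable was found.

```shell
#!/usr/bin/env bash
# Report link state for each network interface under a sysfs-style directory.
# /sys/class/net is the standard Linux location; the optional directory
# argument exists only so the function can be tried against a fake tree.
link_report() {
    local root="${1:-/sys/class/net}" dev name state
    for dev in "$root"/*; do
        [ -d "$dev" ] || continue
        name="${dev##*/}"
        # carrier reads 1 when a link is detected and 0 when it is not;
        # the read fails while the interface is administratively down.
        if state="$(cat "$dev/carrier" 2>/dev/null)"; then
            case "$state" in
                1) echo "$name link-up" ;;
                *) echo "$name NO-LINK" ;;
            esac
        else
            echo "$name unknown"
        fi
    done
}
```

Calling link_report with no argument inspects the live system; any interface reported as NO-LINK is a candidate for a cable or switch-port check.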

