Bug 1636545 - openstack overcloud node delete exits only after 60 minutes, even if the stack update operation finished before
Summary: openstack overcloud node delete exits only after 60 minutes, even if the stack...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-common
Version: 14.0 (Rocky)
Hardware: x86_64
OS: All
Priority: high
Severity: urgent
Target Milestone: beta
Target Release: 14.0 (Rocky)
Assignee: Adriano Petrich
QA Contact: Alexander Chuzhoy
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-10-05 16:05 UTC by Marius Cornea
Modified: 2019-01-11 11:53 UTC (History)

Fixed In Version: openstack-tripleo-common-9.4.1-0.20181002162542.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-01-11 11:53:36 UTC
Target Upstream Version:
Embargoed:
apetrich: needinfo-


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1796893 0 None None None 2018-10-09 13:58:31 UTC
OpenStack gerrit 608995 0 'None' 'MERGED' 'Default the scale status to SUCCESS' 2019-12-02 02:07:27 UTC
OpenStack gerrit 609705 0 'None' 'MERGED' 'Default the scale status to SUCCESS' 2019-12-02 02:07:27 UTC
Red Hat Product Errata RHEA-2019:0045 0 None None None 2019-01-11 11:53:43 UTC

Description Marius Cornea 2018-10-05 16:05:19 UTC
Description of problem:
openstack overcloud node delete exits only after 60 minutes, even if the stack update operation completed earlier:

(undercloud) [stack@undercloud-0 ~]$ time openstack overcloud node delete 4c199f44-d5c7-4733-bf14-8c2c38141f12
Deleting the following nodes from stack overcloud:
- 4c199f44-d5c7-4733-bf14-8c2c38141f12
Waiting for messages on queue 'tripleo' with no timeout.

Connection is already closed.

real 60m14.550s
user 0m1.023s
sys 0m0.221s

The stack update operation finished in less than 60 minutes:

(undercloud) [stack@undercloud-0 ~]$ openstack stack list
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+
| edecc270-20ff-4ac3-b08d-fe8c0877cd41 | overcloud  | 3c2b3888141742bd8fe464c163b3ca08 | UPDATE_COMPLETE | 2018-10-03T00:38:16Z | 2018-10-04T15:06:45Z |
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+

Version-Release number of selected component (if applicable):
python-tripleoclient-heat-installer-10.5.1-0.20180906012842.el7ost.noarch
python-tripleoclient-10.5.1-0.20180906012842.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy overcloud with 1 controller + 2 computes
2. Remove one compute node:
openstack overcloud node delete $node_uuid

Actual results:
The command exits only after 60 minutes, even if the stack update finished earlier.

Expected results:
openstack overcloud node delete command exits after the stack update has finished.
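
As an illustration of the expected behavior, the sketch below returns as soon as the stack reaches a terminal status instead of waiting for a fixed period. It is a minimal sketch, not how python-tripleoclient is implemented: `wait_for_stack` and its `get_status` callable are hypothetical names, and the status source is simulated rather than queried from Heat.

```python
import time

# Heat stack statuses ending in these suffixes are treated as terminal here.
TERMINAL_SUFFIXES = ("_COMPLETE", "_FAILED")


def wait_for_stack(get_status, timeout=3600, interval=30,
                   sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal status or until timeout seconds pass.

    get_status is a hypothetical callable that returns the current stack
    status as a string, for example "UPDATE_IN_PROGRESS".
    """
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status.endswith(TERMINAL_SUFFIXES):
            return status
        if clock() >= deadline:
            raise TimeoutError(
                "stack still %s after %s seconds" % (status, timeout))
        sleep(interval)


# Simulated status source: two in-progress polls, then completion.
statuses = iter(["UPDATE_IN_PROGRESS", "UPDATE_IN_PROGRESS", "UPDATE_COMPLETE"])
result = wait_for_stack(lambda: next(statuses), sleep=lambda seconds: None)
print(result)
```

The `sleep` and `clock` parameters are injectable only so the example runs instantly; a real caller would leave them at their defaults.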

Additional info:

Comment 3 Alex Schultz 2018-10-08 21:52:47 UTC
I'm wondering if this is related to the log rotation code downstream. We noticed that upstream heat/mistral do not play nicely when they are SIGHUP'd.  As I do not believe we've landed the copytruncate for 14 it might be related to that.

Comment 4 Alex Schultz 2018-10-08 22:10:39 UTC
Actually no, the process is completing but it seems like the messaging is getting lost.

Comment 5 Alex Schultz 2018-10-08 22:58:26 UTC
Ah ha, yaql error.

(undercloud) [cloud-user@undercloud heat]$ openstack workflow execution show 3cd8fdfe-8b5b-4be6-9922-163ed11b5110 -f yaml
/usr/lib/python2.7/site-packages/requests/__init__.py:91: RequestsDependencyWarning: urllib3 (1.21.1) or chardet (2.2.1) doesn't match a supported version!
  RequestsDependencyWarning)
ID: 3cd8fdfe-8b5b-4be6-9922-163ed11b5110
Workflow ID: cadaa0b6-b4a4-443f-bf04-7c5d67be0778
Workflow name: tripleo.scale.v1.delete_node
Workflow namespace: ''
Description: ''
Task Execution ID: <none>
Root Execution ID: <none>
State: ERROR
State info: "Failed to run task [error=Can not evaluate YAQL expression [expression=$.status,\
  \ error=u'status', data={}], wf=tripleo.scale.v1.delete_node, task=send_message]:\n\
  Traceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py\"\
  , line 63, in run_task\n    task.run()\n  File \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\"\
  , line 159, in wrapper\n    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 390, in run\n    self._run_new()\n  File \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\"\
  , line 159, in wrapper\n    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 419, in _run_new\n    self._schedule_actions()\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 483, in _schedule_actions\n    input_dict = self._get_action_input()\n  File\
  \ \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\", line 159, in wrapper\n\
  \    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 514, in _get_action_input\n    input_dict = self._evaluate_expression(self.task_spec.get_input(),\
  \ ctx)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\", line\
  \ 540, in _evaluate_expression\n    ctx_view\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\"\
  , line 100, in evaluate_recursively\n    data[key] = _evaluate_item(data[key], context)\n\
  \  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\", line\
  \ 79, in _evaluate_item\n    return evaluate(item, context)\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\"\
  , line 71, in evaluate\n    return evaluator.evaluate(expression, context)\n  File\
  \ \"/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py\", line\
  \ 159, in evaluate\n    cls).evaluate(trim_expr, data_context)\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py\"\
  , line 113, in evaluate\n    \", data=%s]\" % (expression, str(e), data_context)\n\
  YaqlEvaluationException: Can not evaluate YAQL expression [expression=$.status,\
  \ error=u'status', data={}]\n"
Created at: '2018-10-08 22:24:40'
Updated at: '2018-10-08 22:32:54'
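
The State info above shows the send_message task evaluating `$.status` against an empty context (`data={}`), and the linked gerrit changes are titled 'Default the scale status to SUCCESS'. The Python sketch below only illustrates that failure mode and the idea of a default, using a plain dict. It does not use Mistral or YAQL, and `evaluate_status` and `evaluate_status_with_default` are hypothetical names, not code from tripleo-common.

```python
def evaluate_status(context):
    """Mimic a strict `$.status` lookup: a missing key raises KeyError."""
    return context["status"]


def evaluate_status_with_default(context, default="SUCCESS"):
    """Fall back to a default when no earlier task published a status."""
    return context.get("status", default)


empty_context = {}  # corresponds to data={} in the State info above

try:
    evaluate_status(empty_context)
    strict_result = "no error"
except KeyError as exc:
    strict_result = "lookup failed: %s" % exc

defaulted = evaluate_status_with_default(empty_context)
failed = evaluate_status_with_default({"status": "FAILED"})

print(strict_result)
print(defaulted, failed)
```

With a default in place, the final message can still be built when no earlier task set a status, while an explicitly published status is passed through unchanged.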

Comment 12 Artem Hrechanychenko 2018-11-01 10:37:13 UTC
VERIFIED

openstack-tripleo-common-9.4.1-0.20181012010866.67bab16.el7ost.noarch

+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| 8d7f1352-a261-482b-a366-bc1b8b36085f | compute-0    | ACTIVE | -          | Running     | ctlplane=192.168.24.15 |
| a320cf5d-ed11-4af7-9aad-829f9f82204f | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 0674db7c-6023-4a08-a5dc-5448d16f9459 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| bd5135bb-a946-440f-a540-ff4e758b1830 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+


(undercloud) [stack@undercloud-0 ~]$ time openstack overcloud node delete 8d7f1352-a261-482b-a366-bc1b8b36085f
Deleting the following nodes from stack overcloud:
- 8d7f1352-a261-482b-a366-bc1b8b36085f
Waiting for messages on queue 'tripleo' with no timeout.

real	19m1.223s
user	0m0.981s
sys	0m0.233s
(undercloud) [stack@undercloud-0 ~]$ nova list
/usr/lib/python2.7/site-packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
/usr/lib/python2.7/site-packages/urllib3/connection.py:344: SubjectAltNameWarning: Certificate for 192.168.24.2 has no `subjectAltName`, falling back to check for a `commonName` for now. This feature is being removed by major browsers and deprecated by RFC 2818. (See https://github.com/shazow/urllib3/issues/497 for details.)
  SubjectAltNameWarning
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| ID                                   | Name         | Status | Task State | Power State | Networks               |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
| a320cf5d-ed11-4af7-9aad-829f9f82204f | controller-0 | ACTIVE | -          | Running     | ctlplane=192.168.24.13 |
| 0674db7c-6023-4a08-a5dc-5448d16f9459 | controller-1 | ACTIVE | -          | Running     | ctlplane=192.168.24.7  |
| bd5135bb-a946-440f-a540-ff4e758b1830 | controller-2 | ACTIVE | -          | Running     | ctlplane=192.168.24.11 |
+--------------------------------------+--------------+--------+------------+-------------+------------------------+
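
The before and after `nova list` tables can be compared mechanically. The sketch below is a minimal illustration using the IDs and names shown in this comment; `removed_nodes` is a hypothetical helper, and the dictionaries are typed in by hand rather than parsed from the CLI output.

```python
def removed_nodes(before, after):
    """Return names of nodes whose IDs appear in before but not in after."""
    return sorted(name for node_id, name in before.items()
                  if node_id not in after)


# ID -> name, copied from the first nova list table above.
before = {
    "8d7f1352-a261-482b-a366-bc1b8b36085f": "compute-0",
    "a320cf5d-ed11-4af7-9aad-829f9f82204f": "controller-0",
    "0674db7c-6023-4a08-a5dc-5448d16f9459": "controller-1",
    "bd5135bb-a946-440f-a540-ff4e758b1830": "controller-2",
}

# ID -> name, copied from the nova list table after the delete.
after = {
    "a320cf5d-ed11-4af7-9aad-829f9f82204f": "controller-0",
    "0674db7c-6023-4a08-a5dc-5448d16f9459": "controller-1",
    "bd5135bb-a946-440f-a540-ff4e758b1830": "controller-2",
}

deleted_id = "8d7f1352-a261-482b-a366-bc1b8b36085f"
removed = removed_nodes(before, after)
print(removed, deleted_id not in after)
```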

Comment 14 Jim Bagwell 2018-11-12 18:27:20 UTC
Regarding comment #12 - I attempted to install using a later version of the openstack-tripleo-common rpm and did not achieve the result shown in comment #12.

The stack trace can still be seen when performing a scale-in:

Command:
openstack overcloud node delete <uuid>

Symptoms:
The command hangs indefinitely.

openstack stack list shows the stack update succeeded after 10 minutes, but the command never returns.

[stack@undercloud (stackrc) ~]$ openstack stack list                                                                                                     
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+ 
| ID                                   | Stack Name | Project                          | Stack Status    | Creation Time        | Updated Time         | 
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+ 
| 00462a04-88ff-4558-a8e0-d86f16f39241 | overcloud  | da55dbc940c54f8ca2f069b31563e0b4 | UPDATE_COMPLETE | 2018-11-12T04:31:08Z | 2018-11-12T17:56:57Z | 
+--------------------------------------+------------+----------------------------------+-----------------+----------------------+----------------------+ 


[stack@undercloud (stackrc) ~]$ openstack workflow execution show 39b85ffd-de65-499f-ab53-9c4677c72f7d -f yaml                                                                                                                                                                                                               
ID: 39b85ffd-de65-499f-ab53-9c4677c72f7d
Workflow ID: 4ff89508-2f97-41c8-92fb-6f1490e1ec0e
Workflow name: tripleo.scale.v1.delete_node
Workflow namespace: ''
Description: ''
Task Execution ID: <none>
Root Execution ID: <none>
State: ERROR
State info: "Failed to run task [error=Can not evaluate YAQL expression [expression=$.status,\
  \ error=u'status', data={}], wf=tripleo.scale.v1.delete_node, task=send_message]:\n\
  Traceback (most recent call last):\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py\"\
  , line 63, in run_task\n    task.run()\n  File \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\"\
  , line 159, in wrapper\n    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 390, in run\n    self._run_new()\n  File \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\"\
  , line 159, in wrapper\n    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 419, in _run_new\n    self._schedule_actions()\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 483, in _schedule_actions\n    input_dict = self._get_action_input()\n  File\
  \ \"/usr/lib/python2.7/site-packages/osprofiler/profiler.py\", line 159, in wrapper\n\
  \    result = f(*args, **kwargs)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\"\
  , line 514, in _get_action_input\n    input_dict = self._evaluate_expression(self.task_spec.get_input(),\
  \ ctx)\n  File \"/usr/lib/python2.7/site-packages/mistral/engine/tasks.py\", line\
  \ 540, in _evaluate_expression\n    ctx_view\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\"\
  , line 100, in evaluate_recursively\n    data[key] = _evaluate_item(data[key], context)\n\
  \  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\", line\
  \ 79, in _evaluate_item\n    return evaluate(item, context)\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py\"\
  , line 71, in evaluate\n    return evaluator.evaluate(expression, context)\n  File\
  \ \"/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py\", line\
  \ 159, in evaluate\n    cls).evaluate(trim_expr, data_context)\n  File \"/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py\"\
  , line 113, in evaluate\n    \", data=%s]\" % (expression, str(e), data_context)\n\
  YaqlEvaluationException: Can not evaluate YAQL expression [expression=$.status,\
  \ error=u'status', data={}]\n"
Created at: '2018-11-12 17:54:00'
Updated at: '2018-11-12 18:04:08'





rpms installed:
[stack@undercloud (stackrc) ~]$ rpm -qa | grep openstack-tripleo              
openstack-tripleo-puppet-elements-9.0.0-0.20181007201103.daf9069.el7.noarch   
openstack-tripleo-image-elements-9.0.1-0.20181007200834.2dc678a.el7.noarch    
openstack-tripleo-common-containers-10.0.1-0.20181112071049.b8bfff8.el7.noarch
openstack-tripleo-validations-9.3.1-0.20181008110747.4064fb7.el7.noarch       
openstack-tripleo-heat-templates-9.0.1-0.20181013060858.ffbe879.el7.noarch    
openstack-tripleo-ui-9.3.1-0.20180921180340.df30b55.el7.noarch                
openstack-tripleo-common-10.0.1-0.20181112071049.b8bfff8.el7.noarch

Comment 16 Marius Cornea 2018-11-13 14:11:53 UTC
(In reply to Jim Bagwell from comment #15)
> Regarding comment #12 - I attempted to install using a later version of the
> openstack-tripleo-common rpm and did not achieve the result shown in comment
> #12.
> 

Jim, this bug has already been verified by QE team. Please open a new BZ providing the details for the failure you're seeing.

Comment 18 errata-xmlrpc 2019-01-11 11:53:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2019:0045

