Bug 1667894

Summary: Upgrade prepare command hangs until timeout is reached
Product: Red Hat OpenStack
Reporter: Jose Luis Franco <jfrancoa>
Component: openstack-tripleo-common
Assignee: mathieu bultel <mbultel>
Status: CLOSED ERRATA
QA Contact: Alexander Chuzhoy <sasha>
Severity: medium
Docs Contact:
Priority: medium
Version: 14.0 (Rocky)
CC: lbezdick, mburns, slinaber
Target Milestone: z3
Keywords: TestOnly, Triaged, ZStream
Target Release: 14.0 (Rocky)
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: openstack-tripleo-common-9.5.0-1.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-07-02 19:44:18 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Jose Luis Franco 2019-01-21 11:02:38 UTC
Description of problem:

When performing an upgrade from OSP13 to OSP14, if any of the templates is in an invalid state (for example, the NIC adjustment step was skipped or a parameter is undefined), the "overcloud upgrade prepare" command hangs without displaying anything until the command timeout is reached. To find out what the issue is, you need to re-run the command with the --debug option.

    (undercloud) [stack@undercloud-0 ~]$ sh overcloud_upgrade.sh
    2019-01-17 09:01:43.872 658416 INFO osc_lib.shell [-] command: overcloud upgrade prepare -> tripleoclient.v1.overcloud_upgrade.UpgradePrepare (auth=True)
    2019-01-17 09:01:43.874 658416 INFO osc_lib.clientmanager [-] Using auth plugin: password
    2019-01-17 09:01:43.875 658416 DEBUG osc_lib.clientmanager [-] Using parameters {'username': 'admin', 'project_name': 'admin', 'user_domain_name': 'Default', 'auth_url': 'http://192.168.24.1:5000/', 'password': '***', 'project_domain_name': 'Default'} setup_auth /usr/lib/python2.7/site-packages/osc_lib/clientmanager.py:157
    2019-01-17 09:01:43.881 658416 DEBUG osc_lib.clientmanager [-] Get auth_ref auth_ref /usr/lib/python2.7/site-packages/osc_lib/clientmanager.py:201
    2019-01-17 09:01:52.424 658416 INFO tripleoclient.v1.overcloud_upgrade.MajorUpgradePrepare [-] Stack found, will be doing a stack update
    Removing the current plan files
    Uploading new plan files
    Temporary Swift GET/PUT URL parameters have successfully been updated.
    Temporary Swift GET/PUT URL parameters have successfully been updated.
    The backup of the ceph-ansible fetch directory did not need to be renamed
    Plan updated.
    Processing templates in the directory /tmp/tripleoclient-jmT7Zs/tripleo-heat-templates
    WARNING: Following parameter(s) are deprecated and still defined. Deprecated parameters will be removed soon!
      OvercloudControlFlavor
    WARNING: Following parameter(s) are defined but not used in plan. Could be possible that parameter is valid but currently not used.
      CephAnsiblePlaybookVerbosity
      DockerOvnSbDbImage
      SwiftFetchDirPutTempurl
      DockerMysqlClientConfigImage
      NeutronExternalNetworkBridge
      DockerOvnNorthdImage
      NeutronTunnelTypes
      SwiftFetchDirGetTempurl
      DockerOvnNbDbImage
      NeutronEnableDHCPAgent
    2019-01-17 09:07:29.520 658416 WARNING tripleoclient.plugin [-] Waiting for messages on queue 'tripleo' with no timeout.
    ^C2019-01-17 09:43:02.029 658416 INFO osc_lib.shell [-] END return value: 1
    2019-01-17 09:43:02.033 658416 CRITICAL root [-] Unhandled error: KeyboardInterrupt


Checking the mistral-executor logs reveals the reason for the failure:

 ERROR: Property error: : resources.Compute<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_1a7f4307c5ed43ca8de796826cbf7704/overcloud/puppet/compute-role.yaml>.resources.NetworkConfig.properties: : Unknown Property StorageMgmtInterfaceRoutes: HTTPBadRequest: ERROR: Property error: : resources.Compute<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_1a7f4307c5ed43ca8de796826cbf7704/overcloud/puppet/compute-role.yaml>.resources.NetworkConfig.properties: : Unknown Property StorageMgmtInterfaceRoutes
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor Traceback (most recent call last):
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/mistral/executors/default_executor.py", line 114, in run_action
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/tripleo_common/actions/package_update.py", line 100, in run
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     return heat.stacks.update(stack.id, **stack_args)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/v1/stacks.py", line 183, in update
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     headers=headers)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 295, in put
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     return self.client_request("PUT", url, **kwargs)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 282, in client_request
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     resp, body = self.json_request(method, url, **kwargs)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 271, in json_request
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     resp = self._http_request(url, method, **kwargs)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 243, in _http_request
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     resp = self._http_request(location, method, **kwargs)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor   File "/usr/lib/python2.7/site-packages/heatclient/common/http.py", line 234, in _http_request
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     raise exc.from_response(resp)
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor HTTPBadRequest: ERROR: Property error: : resources.Compute<nested_stack>.resources.0<https://192.168.24.2:13808/v1/AUTH_1a7f4307c5ed43ca8de796826cbf7704/overcloud/puppet/compute-role.yaml>.resources.NetworkConfig.properties: : Unknown Property StorageMgmtInterfaceRoutes
2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor

A second error also appears in the mistral_executor logs:

 
2019-01-17 09:08:48.202 1 WARNING mistral.utils.expression_utils [req-9099631d-09d7-49be-874e-8e4d708b78fb 3b9703cd2c3e4efcaf83a6c670c23d54 1a7f4307c5ed43ca8de796826cbf7704 - default default] Task 'update' not found by the task() expression function
2019-01-17 09:08:48.205 1 ERROR mistral.engine.task_handler [req-9099631d-09d7-49be-874e-8e4d708b78fb 3b9703cd2c3e4efcaf83a6c670c23d54 1a7f4307c5ed43ca8de796826cbf7704 - default default] Failed to handle action completion [error=Can not evaluate YAQL expression [expression=task(update).result, error=Unknown function "#property#result", data={}], wf=tripleo.package_update.v1.package_update_plan, task=set_update_failed, action=std.noop]:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/mistral/engine/task_handler.py", line 110, in _on_action_complete
    task.on_action_complete(action_ex)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 381, in on_action_complete
    self.complete(state, state_info)
  File "/usr/lib/python2.7/site-packages/osprofiler/profiler.py", line 159, in wrapper
    result = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/mistral/engine/tasks.py", line 221, in complete
    data_flow.publish_variables(self.task_ex, self.task_spec)
  File "/usr/lib/python2.7/site-packages/mistral/workflow/data_flow.py", line 215, in publish_variables
    task_ex.published = expr.evaluate_recursively(branch_vars, expr_ctx)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 100, in evaluate_recursively
    data[key] = _evaluate_item(data[key], context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 79, in _evaluate_item
    return evaluate(item, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/__init__.py", line 71, in evaluate
    return evaluator.evaluate(expression, context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 159, in evaluate
    cls).evaluate(trim_expr, data_context)
  File "/usr/lib/python2.7/site-packages/mistral/expressions/yaql_expression.py", line 113, in evaluate
    ", data=%s]" % (expression, str(e), data_context)
YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(update).result, error=Unknown function "#property#result", data={}]
: YaqlEvaluationException: Can not evaluate YAQL expression [expression=task(update).result, error=Unknown function "#property#result", data={}]
2019-01-17 09:08:48.210 1 INFO workflow_trace [req-9099631d-09d7-49be-874e-8e4d708b78fb 3b9703cd2c3e4efcaf83a6c670c23d54 1a7f4307c5ed43ca8de796826cbf7704 - default default] Task 'set_update_failed' (6dc8c626-695e-412c-8925-301ac62e9920) [SUCCESS -> ERROR, msg=Failed to handle action completion [error=Can not evaluate YAQL expression [expression=task(update).result, error=Unknown function "#property#result", data={}], wf=tripleo.package_update.v1.package_update_plan, task=set_update_failed, action=std.noop]:
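
Because the root cause is buried between traceback frames in both excerpts, a small filter can help when reading these logs. The following is only a sketch, not part of any TripleO or Mistral tooling: the function name `summarize_errors` is hypothetical, and the regular expression assumes the "timestamp pid LEVEL logger message" layout seen in the excerpts above.

```python
import re

# Assumed layout, taken from the excerpts in this report:
# "<date> <time>.<ms> <pid> <LEVEL> <logger> <message>"
LOG_LINE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+) "
    r"(?P<pid>\d+) (?P<level>[A-Z]+) (?P<logger>\S+)(?: (?P<msg>.*))?$"
)


def summarize_errors(lines):
    """Return (timestamp, logger, message) for ERROR records.

    Traceback headers, indented frame/source lines, and empty records
    are skipped so that only the actual error messages remain.
    """
    found = []
    for line in lines:
        match = LOG_LINE.match(line.rstrip("\n"))
        if not match or match.group("level") != "ERROR":
            continue
        msg = match.group("msg") or ""
        if not msg or msg[0].isspace() or msg.startswith("Traceback"):
            continue
        found.append((match.group("ts"), match.group("logger"), msg))
    return found


# Shortened sample lines modelled on the executor log above.
sample = [
    "2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor Traceback (most recent call last):",
    "2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor     result = action.run(action_ctx)",
    "2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor HTTPBadRequest: ERROR: Property error: Unknown Property StorageMgmtInterfaceRoutes",
    "2019-01-17 09:08:48.005 1 ERROR mistral.executors.default_executor",
]
for ts, logger, msg in summarize_errors(sample):
    print(ts, logger, msg)
```

Applied to the full log, this would keep the HTTPBadRequest line and the task_handler "Failed to handle action completion" line while dropping the frames in between.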

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Upgrade the undercloud to OSP14
2. Skip the NIC adjustment step for the overcloud roles
3. Run the "overcloud upgrade prepare" command

Actual results:

The command hangs and gives no indication of what the issue is.

Expected results:

The command fails right away displaying where the error occurred.
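
To illustrate the expected behaviour, here is a minimal sketch of a consumer that surfaces a failure as soon as it is reported and never waits without a deadline. It does not reproduce the tripleoclient or Zaqar code: the names `wait_for_workflow` and `WorkflowFailed`, the use of Python's standard `queue.Queue`, and the "status"/"message" payload keys are all assumptions made for the example.

```python
import queue
import time


class WorkflowFailed(Exception):
    """Raised when a (hypothetical) workflow message reports a failure."""


def wait_for_workflow(messages, timeout_s):
    """Wait for a terminal message, but never longer than timeout_s seconds.

    `messages` is a queue.Queue of dicts with assumed "status" and
    "message" keys; any non-terminal status is ignored.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("no terminal message within %s seconds" % timeout_s)
        try:
            payload = messages.get(timeout=remaining)
        except queue.Empty:
            raise TimeoutError("no terminal message within %s seconds" % timeout_s)
        status = payload.get("status")
        if status == "FAILED":
            # Fail immediately and carry the reason to the caller.
            raise WorkflowFailed(payload.get("message", "unknown error"))
        if status == "SUCCESS":
            return payload
```

With this shape, a payload such as {"status": "FAILED", "message": "Unknown Property StorageMgmtInterfaceRoutes"} would end the wait at once with the reason attached, rather than leaving the caller blocked.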

Additional info:

Comment 4 Jason Joyce 2019-06-07 18:02:11 UTC
According to our records, this should be resolved by openstack-tripleo-common-9.5.0-2.el7ost.  This build is available now.

Comment 7 errata-xmlrpc 2019-07-02 19:44:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2019:1683