Bug 1732900 - OSP 10->13 FFU, during ffwd-upgrade converge step, WorkflowTasks_Step5_Execution is stuck in CREATE_IN_PROGRESS
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 13.0 (Queens)
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Assignee: RHOS Maint
QA Contact: Sasha Smolyak
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-07-24 16:29 UTC by Matt Flusche
Modified: 2023-09-07 20:21 UTC

Last Closed: 2019-07-24 18:47:58 UTC

Links
System ID: Red Hat Issue Tracker OSP-28248
Private: 0
Priority: None
Status: None
Summary: None
Last Updated: 2023-09-07 20:21:41 UTC

Description Matt Flusche 2019-07-24 16:29:43 UTC
Description of problem:


Background: This upgrade included adding Octavia. The initial failure was caused by a template issue with the public endpoint TLS configuration (the tls-endpoints-public-dns.yaml environment file was missing). Because the public endpoints were not correctly defined, the Octavia amphora image upload failed, which caused the failure during WorkflowTasks_Step5_Execution. However, after the endpoint issue was resolved, subsequent converge deployments fail or hang on this same WorkflowTasks_Step5_Execution resource. The Mistral workflow associated with this resource (tripleo.octavia_post.v1.octavia_post_deploy) never gets re-executed.
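
For anyone triaging a similar hang, the stuck resource and the Mistral executions can be inspected from the undercloud. The following is only a rough sketch: it assumes the stackrc credentials are sourced and the stack is named overcloud; the helper name stuck_resources and the nesting depth of 5 are illustrative choices, not something taken from this environment.

```shell
# Sketch only: assumes the undercloud stackrc has been sourced and that the
# stack is named "overcloud". The helper name is illustrative.

# Keep only resource rows that are still in progress or have failed.
stuck_resources() {
  grep -E 'IN_PROGRESS|FAILED'
}

# Example invocation (run on the undercloud):
#   openstack stack resource list -n 5 overcloud | stuck_resources
#   openstack workflow execution list | grep octavia_post_deploy
```

If the first command keeps showing WorkflowTasks_Step5_Execution in CREATE_IN_PROGRESS while the second shows no new octavia_post_deploy execution, that matches the symptom described above.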

I can reproduce the initial failure in a lab; however, once I resolve the template issue there, the lab deployment completes successfully. The hang seems to be isolated to this environment.

I will provide additional details and logs in a private comment.

My thoughts on how to proceed (looking for feedback here):

- back up the heat database on the undercloud.
- delete the AllNodesDeploySteps nested stack and mark the resource as unhealthy:

  heat stack-delete <uuid_for_AllNodesDeploySteps_nested_stack>
  heat resource-mark-unhealthy overcloud <uuid_for_AllNodesDeploySteps_nested_stack>

- run the upgrade converge step again.
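
The steps above could be wrapped roughly as follows so they can be reviewed with a dry run first. This is an untested sketch: the run helper, the function names, and the use of mysqldump through sudo (which assumes root can reach the local MariaDB on the undercloud without a password) are my assumptions; the two heat commands are the ones proposed above.

```shell
# Sketch of the proposed recovery, not a tested procedure.
# Assumptions: run on the undercloud with stackrc sourced; root can reach
# the local MariaDB without a password; function names are illustrative.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# $1: output file for the SQL dump (path chosen by the operator).
backup_heat_db() {
  run sudo sh -c "mysqldump heat > $1"
}

# $1: UUID of the AllNodesDeploySteps nested stack.
reset_nested_stack() {
  run heat stack-delete "$1"
  run heat resource-mark-unhealthy overcloud "$1"
}

# Example (dry run first, then without DRY_RUN):
#   DRY_RUN=1 backup_heat_db /root/heat-backup.sql
#   DRY_RUN=1 reset_nested_stack <uuid_for_AllNodesDeploySteps_nested_stack>
# Afterwards, rerun the ffwd-upgrade converge step with the original
# templates and environment files.
```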

Comment 4 Matt Flusche 2019-07-24 18:47:58 UTC
This was resolved by restarting heat-engine on the overcloud.  Sorry for the noise.

Comment 5 Matt Flusche 2019-07-24 18:49:50 UTC
I mean restarting heat-engine on the undercloud :)
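
For reference, a minimal sketch of that restart. The unit name openstack-heat-engine is an assumption for a non-containerized OSP 13 undercloud and should be verified on the host; the run helper and function name are illustrative.

```shell
# Sketch only: the unit name is an assumption for a non-containerized
# OSP 13 undercloud; confirm it with `systemctl list-units | grep heat`.
HEAT_ENGINE_UNIT="${HEAT_ENGINE_UNIT:-openstack-heat-engine}"

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Restart heat-engine on the undercloud and report whether it is active.
restart_heat_engine() {
  run sudo systemctl restart "$HEAT_ENGINE_UNIT"
  run sudo systemctl is-active "$HEAT_ENGINE_UNIT"
}

# Example: run restart_heat_engine, then rerun the converge step.
```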

