Bug 1830173

Summary: Running enable-ssh-admin.sh for pre-provisioned nodes will loop forever if workflow fails
Product: Red Hat OpenStack
Reporter: David Sedgmen <dsedgmen>
Component: openstack-tripleo-heat-templates
Assignee: Emilien Macchi <emacchi>
Status: CLOSED ERRATA
QA Contact: David Rosenfeld <drosenfe>
Severity: high
Docs Contact:
Priority: high
Version: 16.1 (Train)
CC: abishop, amcleod, emacchi, jschluet, mburns, ramishra
Target Milestone: z3
Keywords: Triaged, ZStream
Target Release: 16.1 (Train on RHEL 8.2)
Hardware: All
OS: All
Whiteboard:
Fixed In Version: openstack-tripleo-common-11.4.1-1.20200917023444.el8ost openstack-tripleo-heat-templates-11.3.2-1.20200914170158.el8ost
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-12-15 18:35:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description David Sedgmen 2020-05-01 01:04:32 UTC
Description of problem: If the enable-ssh-admin.sh workflow fails, the script loops forever, because it only checks whether the workflow state is SUCCESS.


########################
function workflow_finished {
    local execution_id="$1"
    openstack workflow execution show -f shell $execution_id | grep 'state="SUCCESS"' > /dev/null
}
........................
echo -n "Waiting for the workflow execution to finish (id $EXECUTION_ID)."
while ! workflow_finished $EXECUTION_ID; do
    sleep $SLEEP_TIME
    echo -n .
done
########################

Actual results:

The script is stuck in a loop that never times out.

Expected results:

The script should exit if the workflow fails or times out.
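
A minimal sketch of a wait loop with the expected behavior, for illustration only. The helper names fetch_workflow_state and wait_for_workflow, the return codes, and the default timeout and sleep values are assumptions and are not taken from the shipped script. The sketch also assumes that a failed execution reports the state ERROR, and that the `-f shell` output contains a `state="..."` line as the grep in the original function implies.

```shell
#!/bin/bash
# Sketch only: the function names below are hypothetical.

# Print the state of a workflow execution (for example RUNNING, SUCCESS, ERROR).
# Parses the state="..." line that the original grep relied on.
fetch_workflow_state() {
    openstack workflow execution show -f shell "$1" \
        | sed -n 's/^state="\(.*\)"$/\1/p'
}

# Poll until the execution succeeds, fails, or the timeout elapses.
# Returns 0 on SUCCESS, 1 on ERROR, 2 on timeout.
wait_for_workflow() {
    local execution_id="$1"
    local timeout="${2:-600}"    # seconds (assumed default)
    local sleep_time="${3:-5}"   # seconds between polls (assumed default)
    local elapsed=0
    local state

    while [ "$elapsed" -lt "$timeout" ]; do
        state="$(fetch_workflow_state "$execution_id")"
        case "$state" in
            SUCCESS)
                return 0
                ;;
            ERROR)
                echo "Workflow $execution_id finished with error." >&2
                return 1
                ;;
        esac
        sleep "$sleep_time"
        elapsed=$((elapsed + sleep_time))
    done

    echo "Workflow $execution_id did not finish after $timeout seconds." >&2
    return 2
}
```

A caller could then run `wait_for_workflow "$EXECUTION_ID" || exit 1` so that both a failed workflow and a timeout end the script instead of looping.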

Comment 5 Alex McLeod 2020-06-16 12:34:04 UTC
If this bug requires doc text for errata release, please set the 'Doc Type' and provide draft text according to the template in the 'Doc Text' field. The documentation team will review, edit, and approve the text.

If this bug does not require doc text, please set the 'requires_doc_text' flag to '-'.

Comment 6 David Rosenfeld 2020-06-16 13:18:17 UTC
When I tried to verify this BZ, heat_agent.log contained these messages:

No recognized column names in ['state']. Recognized columns are ['ID', 'Workflow ID', 'Workflow name', 'Workflow namespace', 'Description', 'Task Execution ID', 'Root Execution ID', 'State', 'State info', 'Created at', 'Updated at'].

enable-ssh-admin.sh is executing this command:

openstack workflow execution show -f value -c state $execution_id

I believe the column name needs a capital S (State) instead of a lowercase s (state).
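
For illustration, the suggested change might look like the sketch below. The helper name workflow_state is hypothetical; the column name State is taken from the list of recognized columns in the log message above.

```shell
# Hypothetical helper: print only the State column of a workflow execution.
# The log message shows that the lowercase name "state" is rejected, so the
# column is requested with the capitalized name from the recognized list.
workflow_state() {
    local execution_id="$1"
    openstack workflow execution show -f value -c State "$execution_id"
}
```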

Comment 10 David Rosenfeld 2020-09-21 14:47:41 UTC
Moving to ON_DEV. When trying to verify, this error message was seen in heat_agent.log:

Waiting for the workflow execution to finish (id e862c392-1470-4367-a0d1-50a1b43c6bd8).Workflow e862c392-1470-4367-a0d1-50a1b43c6bd8 finished with error. Check mistral logs.

Comment 27 David Rosenfeld 2020-11-16 14:16:08 UTC
This is seen in heat_agent.log when a timeout occurs: Workflow e4d6b815-880d-4aab-9df6-3fd6099b0221 did not finish after 600 seconds.

In addition, the errors:

No recognized column names in ['state']. Recognized columns are ['ID', 'Workflow ID', 'Workflow name', 'Workflow namespace', 'Description', 'Task Execution ID', 'Root Execution ID', 'State', 'State info', 'Created at', 'Updated at'].

and 

Waiting for the workflow execution to finish (id e862c392-1470-4367-a0d1-50a1b43c6bd8).Workflow e862c392-1470-4367-a0d1-50a1b43c6bd8 finished with error. Check mistral logs.

are no longer seen in heat_agent.log.

Comment 35 errata-xmlrpc 2020-12-15 18:35:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.3 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:5413