Bug 1459735

Summary: Ansible inside Job times out even if the playbook is still running
Product: Red Hat CloudForms Management Engine Reporter: ldomb
Component: Automate Assignee: Greg McCullough <gmccullo>
Status: CLOSED DUPLICATE QA Contact: Dmitry Misharov <dmisharo>
Severity: high Docs Contact:
Priority: high    
Version: 5.8.0 CC: cpelland, dmisharo, jhardy, jmarc, kmorey, mkanoor, obarenbo, tfitzger
Target Milestone: GA Keywords: TestOnly, ZStream
Target Release: 5.9.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: 5.9.0.1 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1479407 Environment:
Last Closed: 2017-11-01 13:16:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1479407    

Description ldomb 2017-06-08 02:32:34 UTC
Description of problem:
When an Ansible Inside job runs longer than a minute, CloudForms waits for about 60 retries and then decides that the job failed, even though the playbook is still running:

Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status 

Version-Release number of selected component (if applicable):
5.8.0.17.20170525183055_6317a22

How reproducible:


Steps to Reproduce:
1. Take a job that runs longer than 1 minute.
2. Run it through Ansible Inside and wait until CFME errors out with: Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status 


Actual results:
The playbook run succeeds, but CFME reports that the job failed.

Expected results:
CloudForms should wait until the job finishes instead of failing once the retry count exceeds 60.


Additional info:

Comment 3 Greg McCullough 2017-08-01 15:25:17 UTC
A feature was added to support extending the timeout value for Playbook runs.

PRs
https://github.com/ManageIQ/manageiq-ui-classic/pull/1742
https://github.com/ManageIQ/manageiq-content/pull/148

Comment 4 Tina Fitzgerald 2017-08-04 20:21:19 UTC
The information in Comment 3 describes an enhancement that lets the Ansible Service designer specify a time to live (TTL), in minutes, within which an Ansible playbook is allowed to complete. The TTL and the state machine's max retries are used to calculate ae_retry_interval.
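
As a rough illustration of the calculation described above, the sketch below derives a retry interval from a TTL and a max-retries count. The helper name retry_interval_seconds and the even-division formula are assumptions made for illustration; they are not taken from the PRs, which may compute the value differently.

```ruby
# Hypothetical helper: spread a TTL (in minutes) evenly across the
# state machine's max retries. The formula is an assumption, not the
# actual manageiq-content implementation.
def retry_interval_seconds(ttl_minutes, max_retries)
  raise ArgumentError, "ttl_minutes must be positive" unless ttl_minutes.positive?
  raise ArgumentError, "max_retries must be positive" unless max_retries.positive?

  # Round up so that max_retries * interval always covers the full TTL.
  ((ttl_minutes * 60) / max_retries.to_f).ceil
end

# Example: a 120-minute TTL with 60 retries.
interval = retry_interval_seconds(120, 60)
puts "ae_retry_interval = #{interval}.seconds"
```

Rounding up is a deliberate choice in this sketch: it keeps the total wait (retries times interval) from falling short of the requested TTL.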

The Ansible Service state machine has an issue where the retry interval is not set in the state machine, so the queued retry doesn't wait the minute before calling back into the state machine.

This PR resolves that issue:
https://github.com/ManageIQ/manageiq-content/pull/163
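
A minimal sketch of the idea behind that fix, assuming a plain Hash as a stand-in for the Automate workspace root ($evm.root in a real method) and a hypothetical playbook_finished flag. It is not the code from the PR; it only shows a check step that requests a retry together with an explicit ae_retry_interval.

```ruby
# Stand-in for the Automate workspace root object; a plain Hash is
# used here (an assumption) so that the sketch runs on its own.
def check_completed(root, playbook_finished)
  if playbook_finished
    root['ae_result'] = 'ok'
  else
    # Ask the state machine to retry, and say how long to wait first.
    # Without an interval, the queued retry may call back right away
    # and use up the retry budget before the playbook finishes.
    root['ae_result'] = 'retry'
    root['ae_retry_interval'] = '1.minute'
  end
  root
end

# Playbook still running: the step asks for a retry in one minute.
p check_completed({}, false)
```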

New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/6164edd9f90274c362e60332d78874b97b8c2507

commit 6164edd9f90274c362e60332d78874b97b8c2507
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:53:44 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Fri Aug 4 10:53:44 2017 -0400

    Set default retry interval to 1 minute for generic service state-machine.
    
    Set ae_retry_interval to 1 minute in check_completed method.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1478200
    
    @miq-bot add_label bug, services, fine/yes
    @miq-bot assign @gmcculloug

 .../StateMachines/GenericLifecycle.class/__methods__/check_completed.rb  | 1 +
 1 file changed, 1 insertion(+)

Comment 6 CFME Bot 2017-08-09 14:43:18 UTC
New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f

commit 220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:04:39 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Mon Aug 7 12:44:37 2017 -0400

    Support TTL (Time To Live) value for services.
    
    Allow specifying the length of time to allow the check_completed step to run in the generic service state-machine.
    This will allow long running processes to finish.
    
    UI changes done.
    
    https://www.pivotaltracker.com/n/projects/1937537/stories/147361947
    https://bugzilla.redhat.com/show_bug.cgi?id=1459735
    
    Initial PR: https://github.com/ManageIQ/manageiq-content/pull/148
    Moved interval calculations from start to check_completed.

 .../__methods__/check_completed.rb                 | 20 ++++-
 .../__methods__/check_completed_spec.rb            | 95 ++++++++++++++++++++--
 2 files changed, 108 insertions(+), 7 deletions(-)

Comment 7 Dmitry Misharov 2017-11-01 13:16:08 UTC

*** This bug has been marked as a duplicate of bug 1492274 ***