Bug 1459735 - Ansible inside Job times out even if the playbook is still running
Summary: Ansible inside Job times out even if the playbook is still running
Keywords:
Status: CLOSED DUPLICATE of bug 1492274
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.9.0
Assignee: Greg McCullough
QA Contact: Dmitry Misharov
URL:
Whiteboard:
Depends On:
Blocks: 1479407
Reported: 2017-06-08 02:32 UTC by ldomb
Modified: 2017-11-01 14:46 UTC
CC List: 8 users

Fixed In Version: 5.9.0.1
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1479407
Environment:
Last Closed: 2017-11-01 13:16:08 UTC
Category: ---
Cloudforms Team: ---
Target Upstream Version:
Embargoed:


Description ldomb 2017-06-08 02:32:34 UTC
Description of problem:
When running an Ansible automation inside job that runs longer than a minute, CloudForms concludes after about 60 retries that the job has failed.

Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status 

Version-Release number of selected component (if applicable):
5.8.0.17.20170525183055_6317a22

How reproducible:


Steps to Reproduce:
1. Take a job that runs longer than 1 minute.
2. Run it in Ansible inside and wait until CFME errors out with Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status 


Actual results:
The playbook run is successful, but CFME reports that the job failed.

Expected results:
CloudForms should wait until the job is finished instead of failing once the retry count exceeds 60.


Additional info:

Comment 3 Greg McCullough 2017-08-01 15:25:17 UTC
A feature was added to support extending the timeout value for Playbook runs.

PRs
https://github.com/ManageIQ/manageiq-ui-classic/pull/1742
https://github.com/ManageIQ/manageiq-content/pull/148

Comment 4 Tina Fitzgerald 2017-08-04 20:21:19 UTC
The information in Comment 3 is for an enhancement that allows the Ansible Service designer to specify a time (TTL), in minutes, within which an Ansible playbook is allowed to complete. The TTL and the state machine's max retries are used to calculate the ae_retry_interval.
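
As a rough illustration of how a TTL and a max-retries value could be turned into a retry interval, here is a minimal Ruby sketch. It is not the code from the PRs: the helper name retry_interval_seconds, the even split of the TTL across retries, and the 60-second floor are all assumptions made for the example.

```ruby
# Hypothetical helper (not taken from manageiq-content): spread a TTL,
# given in minutes, evenly across the state machine's maximum retries.
DEFAULT_RETRY_INTERVAL_SECONDS = 60 # assumption: 1-minute default interval

def retry_interval_seconds(ttl_minutes, max_retries)
  # Fall back to the default when no usable TTL or retry count is given.
  return DEFAULT_RETRY_INTERVAL_SECONDS if ttl_minutes.to_i <= 0 || max_retries.to_i <= 0

  interval = (ttl_minutes.to_i * 60.0 / max_retries.to_i).ceil
  # Assumption: never retry more often than the default interval.
  [interval, DEFAULT_RETRY_INTERVAL_SECONDS].max
end

puts retry_interval_seconds(120, 60) # 120-minute TTL over 60 retries => 120
puts retry_interval_seconds(nil, 60) # no TTL set => 60
```

With 60 retries, a 120-minute TTL would yield a 120-second interval under these assumptions, so the check_completed step keeps polling for the whole TTL rather than giving up after 60 one-minute retries.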

The Ansible Service state machine has an issue where the retry interval is not set in the state machine, so the queued retry doesn't wait a minute before calling back into the state machine.

This PR resolves that issue:
https://github.com/ManageIQ/manageiq-content/pull/163

New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/6164edd9f90274c362e60332d78874b97b8c2507

commit 6164edd9f90274c362e60332d78874b97b8c2507
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:53:44 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Fri Aug 4 10:53:44 2017 -0400

    Set default retry interval to 1 minute for generic service state-machine.
    
    Set ae_retry_interval to 1 minute in check_completed method.
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1478200
    
    @miq-bot add_label bug, services, fine/yes
    @miq-bot assign @gmcculloug

 .../StateMachines/GenericLifecycle.class/__methods__/check_completed.rb  | 1 +
 1 file changed, 1 insertion(+)
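
For readers unfamiliar with Automate retries, the following is a small sketch of what setting a retry interval in a check step can look like. It is not the actual one-line diff from this commit: FakeHandle stands in for the Automate handle ($evm) so the snippet runs outside CloudForms, and the method signature and the job_finished flag are hypothetical. Only the root keys ae_result and ae_retry_interval come from the discussion above.

```ruby
# Stand-in for the Automate handle ($evm); only the `root` hash is modelled.
FakeHandle = Struct.new(:root)

# Hypothetical check step: ask the state machine to retry until the job is done.
def check_completed(handle, job_finished)
  if job_finished
    handle.root['ae_result'] = 'ok'
  else
    handle.root['ae_result'] = 'retry'
    # Without an interval the queued retry fires without waiting, so the
    # retry budget can be exhausted while the playbook is still running.
    handle.root['ae_retry_interval'] = '1.minute'
  end
  handle
end

handle = check_completed(FakeHandle.new({}), false)
puts handle.root.inspect
```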

Comment 6 CFME Bot 2017-08-09 14:43:18 UTC
New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f

commit 220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:04:39 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Mon Aug 7 12:44:37 2017 -0400

    Support TTL (Time To Live) value for services.
    
    Allow specifying the length of time to allow the check_completed step to run in the generic service state-machine.
    This will allow long running processes to finish.
    
    UI changes done.
    
    https://www.pivotaltracker.com/n/projects/1937537/stories/147361947
    https://bugzilla.redhat.com/show_bug.cgi?id=1459735
    
    Initial PR: https://github.com/ManageIQ/manageiq-content/pull/148
    Moved interval calculations from start to check_completed.

 .../__methods__/check_completed.rb                 | 20 ++++-
 .../__methods__/check_completed_spec.rb            | 95 ++++++++++++++++++++--
 2 files changed, 108 insertions(+), 7 deletions(-)

Comment 7 Dmitry Misharov 2017-11-01 13:16:08 UTC

*** This bug has been marked as a duplicate of bug 1492274 ***

