Description of problem:
When running an Ansible automation inside job that runs longer than a minute, CloudForms decides after about 60 retries that the job failed:
Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status

Version-Release number of selected component (if applicable):
5.8.0.17.20170525183055_6317a22

How reproducible:

Steps to Reproduce:
1. Take a job that runs longer than 1 minute.
2. Run it in Ansible inside and wait until CFME errors out with:
   Server [cfmemgmt] Service [rhmgmt] Provision Step [check_completed] Status

Actual results:
The playbook run is successful, but CFME says the job failed.

Expected results:
CloudForms should wait until the job is finished instead of failing once the retry count exceeds 60.

Additional info:
A feature was added to support extending the timeout value for Playbook runs.

PRs:
https://github.com/ManageIQ/manageiq-ui-classic/pull/1742
https://github.com/ManageIQ/manageiq-content/pull/148
The information in Comment 3 is for an enhancement that allows the Ansible Service designer to specify a time (TTL), in minutes, to allow an Ansible playbook to complete. The TTL and the state machine max retries are used to calculate the ae_retry_interval.

The Ansible Service state machine has an issue where the retry interval is not set in the state machine, so the queued retry doesn't wait the minute before calling back into the state machine. This PR resolves that issue:
https://github.com/ManageIQ/manageiq-content/pull/163

New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/6164edd9f90274c362e60332d78874b97b8c2507

commit 6164edd9f90274c362e60332d78874b97b8c2507
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:53:44 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Fri Aug 4 10:53:44 2017 -0400

    Set default retry interval to 1 minute for generic service state-machine.

    Set ae_retry_interval to 1 minute in check_completed method.

    https://bugzilla.redhat.com/show_bug.cgi?id=1478200

    @miq-bot add_label bug, services, fine/yes
    @miq-bot assign @gmcculloug

 .../StateMachines/GenericLifecycle.class/__methods__/check_completed.rb | 1 +
 1 file changed, 1 insertion(+)
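For readers following along, the effect of that one-line change can be pictured with a minimal sketch. It is not the actual contents of check_completed.rb: a plain Hash stands in for the Automate workspace root, the `task_finished` flag is hypothetical, and the string form '1.minute' for the interval value is an assumption. Only the ae_result and ae_retry_interval key names come from this bug.

```ruby
# Illustrative sketch only. `root` is a plain Hash standing in for the
# Automate workspace root; `task_finished` is a hypothetical flag.
def check_completed(root, task_finished)
  if task_finished
    root['ae_result'] = 'ok'
  else
    root['ae_result'] = 'retry'
    # Without an explicit interval, the queued retry re-enters the state
    # without waiting, so the retry budget is used up very quickly.
    root['ae_retry_interval'] = '1.minute' # value format is an assumption
  end
  root
end

puts check_completed({}, false).inspect
```

With the interval set, each retry of a still-running job waits before calling back into the state machine instead of re-entering it immediately.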
New commit detected on ManageIQ/manageiq-content/master:
https://github.com/ManageIQ/manageiq-content/commit/220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f

commit 220fa1e8a63b253d0fcf6ef7c4f3a341d470e52f
Author:     william fitzgerald <wfitzger>
AuthorDate: Fri Aug 4 10:04:39 2017 -0400
Commit:     william fitzgerald <wfitzger>
CommitDate: Mon Aug 7 12:44:37 2017 -0400

    Support TTL (Time To Live) value for services.

    Allow specifying the length of time to allow the check_completed step to run in the generic service state-machine.
    This will allow long running processes to finish.

    UI changes done.
    https://www.pivotaltracker.com/n/projects/1937537/stories/147361947

    https://bugzilla.redhat.com/show_bug.cgi?id=1459735

    Initial PR: https://github.com/ManageIQ/manageiq-content/pull/148

    Moved interval calculations from start to check_completed.

 .../__methods__/check_completed.rb                 | 20 ++++-
 .../__methods__/check_completed_spec.rb            | 95 ++++++++++++++++++++--
 2 files changed, 108 insertions(+), 7 deletions(-)
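The comments above say that the TTL and the state machine max retries are used to calculate ae_retry_interval, but do not spell out the formula. One plausible reading, sketched below, spreads the TTL evenly across the available retries and falls back to the 1-minute default when no TTL is given. The method name, the even division, and the 1-minute floor are all assumptions made for illustration, not the code from the commit.

```ruby
# Illustrative sketch only: the formula and names here are assumptions.
DEFAULT_INTERVAL_MINUTES = 1.0

# ttl_minutes: run time the service designer allows for the playbook.
# max_retries: maximum retries configured on the check_completed state.
def retry_interval_minutes(ttl_minutes, max_retries)
  # Fall back to the default when no usable TTL or retry count is given.
  return DEFAULT_INTERVAL_MINUTES if ttl_minutes.nil? || ttl_minutes <= 0 || max_retries <= 0

  # Spread the TTL across the retries, never dropping below the default.
  [ttl_minutes.to_f / max_retries, DEFAULT_INTERVAL_MINUTES].max
end

puts retry_interval_minutes(120, 60)
puts retry_interval_minutes(nil, 60)
```

Under these assumptions, a 120-minute TTL with 60 retries gives a 2-minute interval, so the 60 retries cover the full two hours rather than roughly one.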
*** This bug has been marked as a duplicate of bug 1492274 ***