Bug 1479407 - Ansible inside Job times out even if the playbook is still running
Status: CLOSED ERRATA
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.8.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.8.2
Assigned To: Tina Fitzgerald
QA Contact: Pavol Kotvan
Keywords: ZStream
Depends On: 1492274 1459735
Reported: 2017-08-08 10:14 EDT by Satoe Imaishi
Modified: 2017-11-01 09:16 EDT
Fixed In Version: 5.8.2.1
Clone Of: 1459735
Last Closed: 2017-10-23 20:36:38 EDT
Cloudforms Team: CFME Core


Comment 2 CFME Bot 2017-08-08 10:18:20 EDT
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/e972d62a545db0908eb7f7b9f81e267c1665249e

commit e972d62a545db0908eb7f7b9f81e267c1665249e
Author:     Greg McCullough <gmccullo@redhat.com>
AuthorDate: Tue Jul 25 16:00:53 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Tue Aug 8 10:14:25 2017 -0400

    Merge pull request #148 from billfitzgerald0120/timetolive
    
    Support TTL (Time To Live) value for services.
    (cherry picked from commit 148e5a681762fe62e32cccd0768b6e97d7e696e1)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1479407

 .../GenericLifecycle.class/__methods__/start.rb    |  18 ++++
 .../__methods__/start_spec.rb                      | 105 ++++++++++++++++++---
 2 files changed, 112 insertions(+), 11 deletions(-)
Comment 3 CFME Bot 2017-08-09 14:58:58 EDT
New commit detected on ManageIQ/manageiq-ui-classic/fine:
https://github.com/ManageIQ/manageiq-ui-classic/commit/e5cc5081db97650206cee04b6f445a7c1d749025

commit e5cc5081db97650206cee04b6f445a7c1d749025
Author:     Martin Povolny <mpovolny@redhat.com>
AuthorDate: Thu Jul 27 09:46:47 2017 +0200
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Wed Aug 9 14:57:42 2017 -0400

    Merge pull request #1742 from h-kataria/allow_max_ttl_for_ansible_items
    
    Added input field for max playbook_ttl value
    (cherry picked from commit 41cef52c5f0c0bf2d4322826387529a04083d810)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1479407

 .../controllers/catalog/catalog_item_form_controller.js    |  9 +++++++++
 app/controllers/catalog_controller.rb                      |  2 ++
 app/views/catalog/_sandt_tree_show.html.haml               | 10 ++++++++++
 .../angular/_ansible_form_options_angular.html.haml        | 14 ++++++++++++++
 spec/controllers/catalog_controller_spec.rb                |  8 ++++++++
 .../catalog/catalog_item_form_controller_spec.js           |  1 +
 6 files changed, 44 insertions(+)
Comment 5 Tina Fitzgerald 2017-08-25 10:19:12 EDT
Hi Satoe,

Yes, thanks for catching that. 
https://github.com/ManageIQ/manageiq-content/pull/162 is required to resolve this issue. I added the fine/yes label.

Thanks,
Tina
Comment 6 CFME Bot 2017-09-19 11:01:13 EDT
New commit detected on ManageIQ/manageiq/fine:
https://github.com/ManageIQ/manageiq/commit/5d113efaab9983c8ab012b68f48c322332e967a4

commit 5d113efaab9983c8ab012b68f48c322332e967a4
Author:     Greg McCullough <gmccullo@redhat.com>
AuthorDate: Thu Jul 6 11:16:45 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Tue Sep 19 10:58:03 2017 -0400

    Merge pull request #46 from tinaafitz/expose_max_retries
    
    Add ae_state_max_retries to root object.
    (cherry picked from commit 03e06cd4cf1e99fb63c6cca467e6a08c2dee7dd0)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1479407

 .../engine/miq_ae_engine/miq_ae_state_machine.rb                  | 6 ++++--
 .../engine/miq_ae_state_machine_steps_spec.rb                     | 8 +++++---
 2 files changed, 9 insertions(+), 5 deletions(-)
Comment 7 CFME Bot 2017-09-19 11:03:24 EDT
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/2f66e51c61b866a28865d208f12b82b3d76a5814

commit 2f66e51c61b866a28865d208f12b82b3d76a5814
Author:     Greg McCullough <gmccullo@redhat.com>
AuthorDate: Mon Sep 18 13:51:39 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Tue Sep 19 11:02:19 2017 -0400

    Merge pull request #186 from tinaafitz/revert_pr148
    
    Revert "Support TTL (Time To Live) value for services."
    (cherry picked from commit 546e1a4ea1919f8bb92020cf136ed087ae67f741)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1479407

 .../GenericLifecycle.class/__methods__/start.rb    |  18 ----
 .../__methods__/start_spec.rb                      | 105 +++------------------
 2 files changed, 11 insertions(+), 112 deletions(-)
Comment 8 CFME Bot 2017-09-19 11:03:30 EDT
New commit detected on ManageIQ/manageiq-content/fine:
https://github.com/ManageIQ/manageiq-content/commit/f4c85d78dff0fec31ccc77cc181fc81e9e885b0c

commit f4c85d78dff0fec31ccc77cc181fc81e9e885b0c
Author:     Greg McCullough <gmccullo@redhat.com>
AuthorDate: Wed Aug 9 10:41:49 2017 -0400
Commit:     Satoe Imaishi <simaishi@redhat.com>
CommitDate: Tue Sep 19 11:00:45 2017 -0400

    Merge pull request #162 from billfitzgerald0120/check_completed
    
    Support TTL (Time To Live) value for services.
    (cherry picked from commit 7d66b4be1b38d42d9571cfa9f99872204d026775)
    
    https://bugzilla.redhat.com/show_bug.cgi?id=1479407

 .../__methods__/check_completed.rb                 | 20 ++++-
 .../__methods__/check_completed_spec.rb            | 95 ++++++++++++++++++++--
 2 files changed, 108 insertions(+), 7 deletions(-)
Comment 10 Tina Fitzgerald 2017-09-27 11:18:48 EDT
Hi Pavol,

Your steps are correct, except for the setting of dialog_param_timeout.
What is the purpose of that dialog parameter?

The retry interval has a minimum value of 1 minute. Your service catalog item had execution_ttl=>"1", so the Automate method code divided the execution TTL (1) by the number of retries (100). Since the result is less than 1 minute, the default of 1 minute was used, as shown in the MiqQueue.put below:

Notice that the log time (minutes:seconds) is 09:32 and the Deliver On time (minutes:seconds) is 10:32, one minute later.

[----] I, [2017-09-27T08:09:32.178777 #4925:65b134]  INFO -- : Q-task_id([service_template_provision_task_1]) MIQ(MiqQueue.put) Message id: [361],  id: [], Zone: [default], Role: [automate], Server: [], Ident: [generic], Target id: [], Instance id: [], Task id: [service_template_provision_task_1], Command: [MiqAeEngine.deliver], Timeout: [3600], Priority: [100], State: [ready], Deliver On: [2017-09-27 12:10:32 UTC], Data: [], Args: [{:object_type=>"ServiceTemplateProvisionTask", :object_id=>1, :namespace=>"Service/Generic/StateMachines", :class_name=>"GenericLifecycle", :instance_name=>"provision", :automate_message=>"create", :attrs=>{"dialog_credential"=>"", "dialog_hosts"=>"localhost", "dialog_param_timeout"=>"360", "request"=>"clone_to_service", :service_action=>"Provision", "Service::Service"=>1}, :user_id=>1, :miq_group_id=>3, :tenant_id=>2, :state=>"check_completed", :ae_fsm_started=>nil, :ae_state_started=>"2017-09-27 12:09:30 UTC", :ae_state_retries=>1, :ae_state_previous=>"---\n\"/ManageIQ/Service/Generic/StateMachines/GenericLifecycle/provision\":\n  ae_state: check_completed\n  ae_state_retries: 1\n  ae_state_started: 2017-09-27 12:09:30 UTC\n"}]

I'd suggest using a max TTL value greater than 100, then searching evm.log for MiqQueue.put messages like the one shown above and comparing the time each message is put on the queue with the time it will be delivered (the Deliver On: value).
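
A minimal sketch of that log check, in case it helps. The helper name queue_delay_seconds and the utc_offset parameter are hypothetical and not part of ManageIQ; the sketch assumes the bracketed log timestamp is appliance local time (UTC-4 in the entry above) while Deliver On is in UTC.

```ruby
require "time"

# Hypothetical helper, not ManageIQ code: returns the number of seconds between
# the time a MiqQueue.put line was logged and its "Deliver On" value.
# utc_offset is an assumption about the appliance's local time zone.
def queue_delay_seconds(line, utc_offset: "-04:00")
  logged  = line[/\[(\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\.\d+ #/, 1]
  deliver = line[/Deliver On: \[([^\]]+)\]/, 1]
  return nil unless logged && deliver

  put_at     = Time.iso8601("#{logged}#{utc_offset}")
  deliver_at = Time.parse(deliver)
  (deliver_at - put_at).round
end

# Abbreviated copy of the log entry above (most fields omitted).
sample = '[----] I, [2017-09-27T08:09:32.178777 #4925:65b134]  INFO -- : ' \
         'MIQ(MiqQueue.put) Message id: [361], Deliver On: [2017-09-27 12:10:32 UTC]'
puts queue_delay_seconds(sample) # prints 60
```

Running this over each MiqQueue.put line for the check_completed state should show the delay growing once the TTL is larger than 100.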

Let me know if you have any questions.

Thanks,
Tina
Comment 12 Tina Fitzgerald 2017-10-02 11:18:40 EDT
Hi Pavol,

TTL is probably not the best name for this attribute. It means how long (in minutes) the playbook is expected to take to complete. The code divides the TTL by the number of retries to come up with a retry interval. For example, if you expect the playbook to take 300 minutes to complete, you'd specify 300 for the TTL. The code divides 300 (ttl) by 100 (max_retries), which results in a retry interval of 3 minutes. So the MiqQueue.put on a retry would specify a Deliver On time 3 minutes after the time it was put on the queue. If you specify a TTL value less than 100, the code will use the 1 minute default.

Let me know if you have any questions.
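
The arithmetic can be sketched as follows. This is an illustration only, not the actual check_completed.rb code: the method name retry_interval_minutes and the constant are hypothetical, and how the real method rounds fractional results is an assumption, since it is not stated here.

```ruby
# Hypothetical sketch of the interval rule described in this comment;
# not taken from the ManageIQ source.
MIN_RETRY_INTERVAL_MINUTES = 1

# ttl_minutes: expected playbook run time in minutes (the "ttl" value).
# max_retries: number of state machine retries (100 in the example).
def retry_interval_minutes(ttl_minutes, max_retries = 100)
  raise ArgumentError, "max_retries must be positive" unless max_retries.positive?

  # Assumption: fractional intervals are kept as-is rather than rounded.
  interval = ttl_minutes.to_f / max_retries
  [interval, MIN_RETRY_INTERVAL_MINUTES].max
end

puts retry_interval_minutes(300) # 3.0 minutes between retries
puts retry_interval_minutes(1)   # below the minimum, so 1 minute is used
```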

Thanks,
Tina
Comment 15 errata-xmlrpc 2017-10-23 20:36:38 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2017:3005
