Bug 1750370 - Ordered Ansible Service is getting timeout and not restarting after 3600 seconds.
Summary: Ordered Ansible Service is getting timeout and not restarting after 3600 seco...
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat CloudForms Management Engine
Classification: Red Hat
Component: Automate
Version: 5.11.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: GA
Target Release: 5.11.0
Assignee: Lucy Fu
QA Contact: Satyajit Bulage
Docs Contact: Red Hat CloudForms Documentation
URL:
Whiteboard:
Depends On:
Blocks: 1660803
Reported: 2019-09-09 13:20 UTC by Satyajit Bulage
Modified: 2019-12-13 14:54 UTC
CC List: 9 users

Fixed In Version: 5.11.0.24
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-12-13 14:54:27 UTC
Category: Bug
Cloudforms Team: CFME Core
Target Upstream Version:
Embargoed:



Description Satyajit Bulage 2019-09-09 13:20:03 UTC
Description of problem: Create or use an Ansible playbook that sleeps for 60 minutes or more. After the service is ordered, the run ends with this error in EVM.log: "restarting worker after [3600] job aborting ..."


Version-Release number of selected component (if applicable): 5.11.0.23.20190904213640_d113674


How reproducible: 100%


Steps to Reproduce:
1. Enable the Ansible Server role.
2. Create or use the attached playbook.
3. Create an Ansible service and order it.
4. Wait for 60 minutes.

Actual results: The job aborts with "job aborting, ansible playbook has been running longer than timeout".


Expected results: The service should not fail. The worker should also be restarted if the timeout is exceeded.


Additional info:
Found this BZ while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1660803

Comment 7 Tina Fitzgerald 2019-09-09 15:21:24 UTC
Billy is going to look into this. We looked at the reproducer environment and max_ttl is not specified.
max_ttl should be specified when playbooks are expected to take some time. max_ttl (maximum time to live) is used to calculate a retry interval for the playbook service state machine. The retry interval defaults to 1 minute when this value is not specified.
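
To make the 1 minute default concrete, here is a rough sketch of how many state machine retries a long-running playbook consumes at a given retry interval. The helper `retries_needed` and the constant `DEFAULT_RETRY_INTERVAL` are hypothetical names used only for illustration; they are not ManageIQ code, and the way the real interval is derived from max_ttl is deliberately not modelled here.

```ruby
# Hypothetical helper, not part of ManageIQ: estimates how many retries a
# state machine needs while it waits for a playbook to finish.
DEFAULT_RETRY_INTERVAL = 60 # seconds; the 1 minute default described above

def retries_needed(playbook_seconds, retry_interval = DEFAULT_RETRY_INTERVAL)
  raise ArgumentError, "retry_interval must be positive" unless retry_interval.positive?

  # Round up: a partially elapsed interval still costs one more retry.
  (playbook_seconds.to_f / retry_interval).ceil
end

puts retries_needed(60 * 60)      # 60 minute playbook, default interval
puts retries_needed(60 * 60, 600) # same playbook, assumed 10 minute interval
```

With the default interval, a playbook that sleeps for an hour needs on the order of 60 retries, so a longer interval keeps the retry count low for playbooks that are expected to run for a long time.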

Comment 8 William Fitzgerald 2019-09-09 19:26:21 UTC
Worked on latest 5.10 but failing on 5.11.0.23.

Job ran for about an hour and then, after 58 retries, this error was encountered:
"job aborting, ansible playbook has been running longer than timeout"

Error is coming from ansible_runner_workflow.rb line 61.

Lucy is looking into this now.

Comment 10 CFME Bot 2019-09-11 16:26:20 UTC
New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/861cc9990fac39fd0edf3841c9accde4710fcf35
commit 861cc9990fac39fd0edf3841c9accde4710fcf35
Author:     Lucy Fu <lufu>
AuthorDate: Tue Sep 10 10:00:17 2019 -0400
Commit:     Lucy Fu <lufu>
CommitDate: Tue Sep 10 10:00:17 2019 -0400

    Set default timeout to 100 minutes for playbook service and playbook method.

    https://bugzilla.redhat.com/show_bug.cgi?id=1750370

 app/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script.rb | 5 +-
 app/models/manageiq/providers/embedded_ansible/automation_manager/playbook_runner.rb | 2 +-
 spec/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script_spec.rb | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)
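
As an illustration of why raising the default helps, the sketch below compares the 3600 second limit reported in this bug with the 100 minute default named in the commit message. It is not the code from the commit: `timed_out?`, both constants, and the assumed 61 minute run time (a 60 minute sleep plus some overhead) are hypothetical.

```ruby
# Hypothetical sketch, not the ManageIQ implementation.
OLD_TIMEOUT = 60 * 60  # 3600 seconds, the limit seen in the log message
NEW_TIMEOUT = 100 * 60 # 100 minutes, per the commit message

# Returns true when a job started at started_at has run longer than
# timeout seconds as of now.
def timed_out?(started_at, now, timeout)
  now - started_at > timeout
end

started  = Time.at(0)
finished = started + 61 * 60 # assumed: 60 minute sleep plus 1 minute overhead

puts timed_out?(started, finished, OLD_TIMEOUT) # job would be aborted
puts timed_out?(started, finished, NEW_TIMEOUT) # job is allowed to finish
```

Under these assumptions the first check is true and the second is false, which matches the reported behaviour: the playbook with a 60 minute sleep was aborted under the old limit and completes under the new default.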

Comment 11 CFME Bot 2019-09-11 16:41:06 UTC
New commit detected on ManageIQ/manageiq/ivanchuk:

https://github.com/ManageIQ/manageiq/commit/55bc9be7150764c2fa69037d781308bedc7c8327
commit 55bc9be7150764c2fa69037d781308bedc7c8327
Author:     Jason Frey <jfrey>
AuthorDate: Wed Sep 11 12:25:56 2019 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Wed Sep 11 12:25:56 2019 -0400

    Merge pull request #19279 from lfu/ansible_runner_timeout_1750370

    Set default playbook service timeout to 100 minutes

    (cherry picked from commit e1e730fb136f2702a02b04f815eb93f94934a426)

    https://bugzilla.redhat.com/show_bug.cgi?id=1750370

 app/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script.rb | 5 +-
 app/models/manageiq/providers/embedded_ansible/automation_manager/playbook_runner.rb | 2 +-
 spec/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script_spec.rb | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)

Comment 12 Satyajit Bulage 2019-09-12 18:19:11 UTC
A playbook with a 60 minute sleep executed without failure. No errors occurred.

Verified Version: 5.11.0.24.20190911182429_55bc9be

