Description of problem:
Create or use an Ansible playbook with a sleep time of 60 minutes or more. After the service is ordered, the run ends with this error in EVM.log -> "restarting worker after [3600] job aborting ..."

Version-Release number of selected component (if applicable):
5.11.0.23.20190904213640_d113674

How reproducible:
100%

Steps to Reproduce:
1. Enable Ansible Server role.
2. Create or use attached playbook
3. Create Ansible service
4. Wait for 60mins

Actual results:
job aborting, ansible playbook has been running longer than timeout

Expected results:
Service should not fail. Also restart the worker if the timeout is exceeded.

Additional info:
Found this BZ while verifying https://bugzilla.redhat.com/show_bug.cgi?id=1660803
Billy is going to look into this. We looked at the reproducer environment and max_ttl is not specified. max_ttl should be specified when playbooks are expected to take some time. The max_ttl (maximum time to live) value is used to calculate a retry interval for the playbook service state machine. The retry interval defaults to 1 minute when this value is not specified.
Worked on latest 5.10 but failing on 5.11.0.23. The job ran for about an hour, and then after 58 retries this error was encountered: "job aborting, ansible playbook has been running longer than timeout". The error is coming from ansible_runner_workflow.rb line 61. Lucy is looking into this now.
https://github.com/ManageIQ/manageiq/pull/19279
New commit detected on ManageIQ/manageiq/master:

https://github.com/ManageIQ/manageiq/commit/861cc9990fac39fd0edf3841c9accde4710fcf35
commit 861cc9990fac39fd0edf3841c9accde4710fcf35
Author:     Lucy Fu <lufu>
AuthorDate: Tue Sep 10 10:00:17 2019 -0400
Commit:     Lucy Fu <lufu>
CommitDate: Tue Sep 10 10:00:17 2019 -0400

    Set default timeout to 100 minutes for playbook service and playbook method.

    https://bugzilla.redhat.com/show_bug.cgi?id=1750370

 app/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script.rb | 5 +-
 app/models/manageiq/providers/embedded_ansible/automation_manager/playbook_runner.rb | 2 +-
 spec/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script_spec.rb | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)
New commit detected on ManageIQ/manageiq/ivanchuk:

https://github.com/ManageIQ/manageiq/commit/55bc9be7150764c2fa69037d781308bedc7c8327
commit 55bc9be7150764c2fa69037d781308bedc7c8327
Author:     Jason Frey <jfrey>
AuthorDate: Wed Sep 11 12:25:56 2019 -0400
Commit:     Jason Frey <jfrey>
CommitDate: Wed Sep 11 12:25:56 2019 -0400

    Merge pull request #19279 from lfu/ansible_runner_timeout_1750370

    Set default playbook service timeout to 100 minutes

    (cherry picked from commit e1e730fb136f2702a02b04f815eb93f94934a426)

    https://bugzilla.redhat.com/show_bug.cgi?id=1750370

 app/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script.rb | 5 +-
 app/models/manageiq/providers/embedded_ansible/automation_manager/playbook_runner.rb | 2 +-
 spec/models/manageiq/providers/embedded_ansible/automation_manager/configuration_script_spec.rb | 2 +-
 3 files changed, 5 insertions(+), 4 deletions(-)
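For anyone following along, here is a rough sketch of why the new default should help. It is not the ManageIQ code: the constant name and the timed_out? helper are hypothetical, and the 60-minute limit used for comparison is only an assumption based on the [3600] in the reported log line. It compares elapsed run time against a timeout, which is the kind of check the "running longer than timeout" message implies.

```ruby
# Hypothetical sketch; these names do not come from the ManageIQ source.
DEFAULT_TIMEOUT_MINUTES = 100 # default described in the commits above

# Returns true when a job that started at started_at has, as of now,
# been running longer than timeout_minutes.
def timed_out?(started_at, now, timeout_minutes = DEFAULT_TIMEOUT_MINUTES)
  (now - started_at) > timeout_minutes * 60
end

started = Time.utc(2019, 9, 10, 10, 0, 0)
after_61_minutes = started + 61 * 60 # a playbook that sleeps for 60 minutes plus overhead

# Assumed 60-minute limit: the job would be aborted.
puts timed_out?(started, after_61_minutes, 60)
# 100-minute default: the job is allowed to finish.
puts timed_out?(started, after_61_minutes)
```

Playbooks expected to run longer than the default would still need an explicit max_ttl, as noted earlier in this BZ.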
A playbook with a 60-minute sleep time executed without any failure. No errors occurred.

Verified Version: 5.11.0.24.20190911182429_55bc9be