Bug 1528960 - Add ability to change maximum timeout for Ansible process executed from engine to finish
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: 4.2.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Ondra Machacek
QA Contact: Pavol Brilla
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-12-25 11:46 UTC by Nelly Credi
Modified: 2018-02-12 11:50 UTC

Fixed In Version: ovirt-engine-4.2.1.1
Doc Type: Enhancement
Doc Text:
The default timeout for an Ansible process executed from the engine has been increased to 30 minutes, because upgrading hosts in particular can take a significant amount of time. If the Ansible process does not finish before this timeout expires, the engine kills the Ansible process and fails the action. If the default 30-minute timeout is still not enough, administrators can increase it further by creating a new configuration file in /etc/ovirt-engine/engine.conf.d (for example 99-ansible-playbook-timeout.conf) with the following content: ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT=NNN where NNN is the number of minutes the engine should wait for the Ansible process to finish.
Clone Of:
Environment:
Last Closed: 2018-02-12 11:50:59 UTC
oVirt Team: Infra
rule-engine: ovirt-4.2+
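
The override file described in the Doc Text above could be created with a small helper like the following. This is a minimal sketch, not part of the fix itself: the function name write_timeout_conf and the 60-minute value are made up for illustration, while the directory, file name and option name are taken from the Doc Text.

```shell
#!/bin/sh
# Hypothetical helper: write a drop-in file that overrides the Ansible
# playbook timeout. Only the option name, directory and file name come
# from the Doc Text; everything else is illustrative.
write_timeout_conf() {
    conf_dir="$1"   # e.g. /etc/ovirt-engine/engine.conf.d
    minutes="$2"    # timeout in minutes

    # Accept only a positive integer number of minutes.
    case "$minutes" in
        ''|*[!0-9]*)
            echo "timeout must be a positive integer" >&2
            return 1
            ;;
    esac
    if [ "$minutes" -le 0 ]; then
        echo "timeout must be a positive integer" >&2
        return 1
    fi

    printf 'ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT=%s\n' "$minutes" \
        > "$conf_dir/99-ansible-playbook-timeout.conf"
}

# Example (as root on the engine machine), 60 minutes being an arbitrary value:
#   write_timeout_conf /etc/ovirt-engine/engine.conf.d 60
```

This report does not say when the engine picks up the new value. It is an assumption here that the ovirt-engine service has to be restarted before the change takes effect.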


Attachments
playbook log (358 bytes, text/plain)
2017-12-25 11:46 UTC, Nelly Credi
engine log (1.04 MB, text/plain)
2017-12-25 11:47 UTC, Nelly Credi


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 86012 0 'None' MERGED core: Add configuration option for ansible timeout 2021-02-02 15:47:21 UTC

Description Nelly Credi 2017-12-25 11:46:44 UTC
Created attachment 1372137
playbook log

Description of problem:
when upgrading the hosts from 4.1 to 4.2, the host gets into 'Install Failed' status

Version-Release number of selected component (if applicable):
on the host:
vdsm-4.20.9.3-1.el7ev.x86_64
ovirt-host-4.2.0-1.el7ev.x86_64

on the engine:
ovirt-engine-4.2.0.2-0.1.el7.noarch
ovirt-host-deploy-1.7.0-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. upgrade engine from 4.1 to 4.2
2. set repos on the hosts
3. upgrade via rest or webui

Actual results:
host in 'Install Failed'

Expected results:
host should get updated

Additional info:
looks like an ansible-playbook child process is defunct:
[root@jenkins-vm-16 host-deploy]# ps -ef | grep ansible
ovirt    26764 25035  0 12:13 ?        00:00:07 /usr/bin/python2 /usr/bin/ansible-playbook -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory1951036924750427419 /usr/share/ovirt-engine/playbooks/ovirt-host-upgrade.yml
ovirt    26775 26764  0 12:13 ?        00:00:00 [ansible-playboo] <defunct>
ovirt    29615 25035  1 13:33 ?        00:00:08 /usr/bin/python2 /usr/bin/ansible-playbook -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory7442594928060297520 /usr/share/ovirt-engine/playbooks/ovirt-host-upgrade.yml

Comment 1 Nelly Credi 2017-12-25 11:47:25 UTC
Created attachment 1372138
engine log

Comment 2 Nelly Credi 2017-12-25 11:55:03 UTC
it is always failing on the first upgrade try, 
but when re-triggering the process it succeeds

Comment 3 Yaniv Kaul 2017-12-26 07:41:51 UTC
What was the source 4.1? Did it have ovirt-host installed?

Comment 4 Nelly Credi 2017-12-26 11:39:49 UTC
build 4.1.8-5

before upgrade:
ovirt-engine-4.1.8.2-0.1.el7.noarch
ovirt-host-deploy-1.6.7-1.el7ev.noarch

ovirt-host is installed only in 4.2 afaik

engine & host after upgrade:
ovirt-host-4.2.0-1.el7ev.x86_64
ovirt-host-dependencies-4.2.0-1.el7ev.x86_64
ovirt-host-deploy-1.7.0-1.el7ev.noarch

Comment 8 Martin Perina 2017-12-26 13:24:43 UTC
Could you please also provide the host-deploy log? You have provided only the Ansible part of the host-deploy log.

Comment 9 Nelly Credi 2017-12-29 10:04:01 UTC
I'm afraid I don't have them,
and I have been unable to reproduce the issue since.

Comment 11 Martin Perina 2018-01-08 12:55:54 UTC
We have increased the default timeout to 30 minutes, and users can now raise it even further.

Comment 12 Pavol Brilla 2018-01-17 08:38:05 UTC
# grep ANSIBLE /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf
# yum list ovirt-engine
ovirt-engine.noarch    4.2.0.2-0.1.el7           @rhv-4.2.x

-----

# yum list ovirt-engine
ovirt-engine.noarch            4.2.1.1-0.1.el7          @rhv-4.2x
# grep ANSIBLE /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf
ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT=30
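
The grep above only shows the shipped default. To also account for an override file, a check along these lines might help. It is a rough sketch under stated assumptions: the function name effective_timeout is hypothetical, and it assumes that files in engine.conf.d are read in sorted order after the defaults file, with later assignments winning. It ignores any other configuration file the engine may read.

```shell
#!/bin/sh
# Hypothetical helper: print the last ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT
# assignment found in the defaults file and then in the *.conf files of the
# override directory. Assumes later files override earlier ones.
effective_timeout() {
    defaults_file="$1"   # e.g. /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf
    conf_dir="$2"        # e.g. /etc/ovirt-engine/engine.conf.d

    # The glob expands in sorted order; a non-matching glob is skipped
    # by the -f check.
    for f in "$defaults_file" "$conf_dir"/*.conf; do
        [ -f "$f" ] || continue
        sed -n 's/^ANSIBLE_PLAYBOOK_EXEC_DEFAULT_TIMEOUT=//p' "$f"
    done | tail -n 1
}

# Example:
#   effective_timeout \
#       /usr/share/ovirt-engine/services/ovirt-engine/ovirt-engine.conf \
#       /etc/ovirt-engine/engine.conf.d
```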

Comment 13 Sandro Bonazzola 2018-02-12 11:50:59 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

