Bug 1765161

Summary: [downstream clone - 4.3.7] upgrade of host fails on timeout after 30 minutes
Product: Red Hat Enterprise Virtualization Manager
Reporter: RHV bug bot <rhv-bugzilla-bot>
Component: ovirt-engine
Assignee: Ondra Machacek <omachace>
Status: CLOSED ERRATA
QA Contact: Petr Matyáš <pmatyas>
Severity: high
Docs Contact:
Priority: high
Version: unspecified
CC: bugs, dagur, lleistne, lsvaty, mperina, nhalevy, omachace, pelauter, Rhev-m-bugs
Target Milestone: ovirt-4.3.7
Keywords: AutomationBlocker, ZStream
Target Release: 4.3.7
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-engine-4.3.7.2
Doc Type: If docs needed, set a value
Doc Text:
The default maximum timeout for an Ansible playbook executed from the engine was 30 minutes. As a result, the upgrade process of the host failed due to the short timeout. In this release the timeout was raised to 120 minutes.
Story Points: ---
Clone Of: 1728617
Environment:
Last Closed: 2019-12-12 10:36:35 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1728617
Bug Blocks:

Description RHV bug bot 2019-10-24 12:45:49 UTC
+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1728617 +++
======================================================================

Created attachment 1589048
engine log

Description of problem:
upgrade of host fails on timeout after 30 minutes

Version-Release number of selected component (if applicable):
ovirt-engine-ui-extensions-1.0.6-1.el7ev.noarch
ovirt-engine-4.3.5.3-0.1.el7.noarch

How reproducible:
33% (1 host out of 3 failed)

Steps to Reproduce:
1. Deploy a 4.2 engine and add 3 hosts
2. Upgrade the engine to 4.3
3. Upgrade the hosts to 4.3 (in our case via the REST API host upgrade)

Actual results:
The host failed with an Ansible timeout error in engine.log:
2019-07-09 13:51:41,781+03 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-3) [hosts_syncAction_7dc517e8-0819-42b8] Ansible playbook execution failed: Timeout occurred while executing Ansible playbook.
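
To check whether other hosts hit the same failure, the log can be scanned for this message. The following is a minimal Python sketch, not part of the engine. The function name `find_ansible_timeouts` and the regular expression are assumptions made for illustration, and the pattern is derived only from the single log line quoted above, so other engine.log line formats may not match.

```python
import re

# Timestamp, level, and logger as they appear in the engine.log line quoted above.
# Assumption: other lines of interest follow the same prefix layout.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+\[(?P<logger>[^\]]+)\]"
)
TIMEOUT_MSG = "Timeout occurred while executing Ansible playbook"


def find_ansible_timeouts(lines):
    """Return (timestamp, logger) pairs for ERROR lines reporting an Ansible playbook timeout."""
    hits = []
    for line in lines:
        if TIMEOUT_MSG not in line:
            continue
        match = LINE_RE.match(line)
        if match and match.group("level") == "ERROR":
            hits.append((match.group("ts"), match.group("logger")))
    return hits


if __name__ == "__main__":
    import sys

    # Usage: python find_timeouts.py /path/to/engine.log  (script name is hypothetical)
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for ts, logger in find_ansible_timeouts(log):
            print(ts, logger)
```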

Expected results:
The timeout should be increased so that the host upgrade can finish without failure.

Additional info:
We previously had the following bug related to cluster update:
https://bugzilla.redhat.com/show_bug.cgi?id=1697301

The timeout is defined in:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.conf.in#L649

(Originally by Kobi Hakimi)
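
The failure mode in the description above, a long-running playbook cut off by a fixed limit expressed in minutes, can be illustrated with a short Python sketch. This is not the engine's AnsibleExecutor code. The helper `run_with_timeout` and its parameters are hypothetical, and the sketch only shows why a task that legitimately needs longer than the limit is reported as failed.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_minutes):
    """Run cmd and return (ok, message).

    subprocess.run kills the child process and raises TimeoutExpired
    when the limit is exceeded, so a slow but healthy task still fails.
    """
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_minutes * 60
        )
    except subprocess.TimeoutExpired:
        return False, "Timeout occurred while executing command"
    return result.returncode == 0, result.stdout


if __name__ == "__main__":
    # A quick command finishes well inside the limit.
    print(run_with_timeout([sys.executable, "-c", "print('done')"], timeout_minutes=1))
    # A command that sleeps longer than the limit is killed and reported as a timeout.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_minutes=0.01))
```

Raising the limit, as requested in this report, trades slower detection of a genuinely hung task for fewer false failures on slow upgrades.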

Comment 1 RHV bug bot 2019-10-24 12:45:52 UTC
Created attachment 1589050 [details]
ansible host deploy log file

(Originally by Kobi Hakimi)

Comment 4 RHV bug bot 2019-10-24 12:45:57 UTC
just to make my upgrade flow more clear:
 - deployed rhv-4.2.10-1 with rhel-7.6
 - upgraded to rhv-4.3.5-5 and to rhel 7.7

(Originally by Kobi Hakimi)

Comment 10 RHV bug bot 2019-10-24 12:46:07 UTC
With ovirt-engine-4.3.7.0-0.1.el7.noarch the timeout is still 30 minutes, which is not enough, and our upgrade failed again.

(Originally by Petr Matyas)

Comment 11 RHV bug bot 2019-11-01 09:32:03 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]

For more info please contact: rhv-devops

Comment 13 Petr Matyáš 2019-11-25 14:59:21 UTC
Verified on ovirt-engine-4.3.7.2-0.1.el7.noarch

Comment 14 Nadav Halevy 2019-12-10 12:27:22 UTC
Hi Ondra,

Please can you review the doc text?


The default maximum timeout for an Ansible playbook executed from the engine was 30 minutes.
As a result, the upgrade process of the host failed due to the short timeout.
In this release the timeout was raised to 120 minutes.

Comment 15 Ondra Machacek 2019-12-10 14:22:50 UTC
It looks good, thank you.

Comment 17 errata-xmlrpc 2019-12-12 10:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4229