Bug 1765161 - [downstream clone - 4.3.7] upgrade of host fails on timeout after 30 minutes
Summary: [downstream clone - 4.3.7] upgrade of host fails on timeout after 30 minutes
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.3.7
Target Release: 4.3.7
Assignee: Ondra Machacek
QA Contact: Petr Matyáš
URL:
Whiteboard:
Depends On: 1728617
Blocks:
Reported: 2019-10-24 12:45 UTC by RHV bug bot
Modified: 2019-12-12 10:36 UTC (History)

Fixed In Version: ovirt-engine-4.3.7.2
Doc Type: If docs needed, set a value
Doc Text:
The default maximum timeout for an Ansible playbook executed from the engine was 30 minutes. As a result, the upgrade process of the host failed due to the short timeout. In this release the timeout was raised to 120 minutes.
Clone Of: 1728617
Environment:
Last Closed: 2019-12-12 10:36:35 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2019:4229 0 None None None 2019-12-12 10:36:54 UTC
oVirt gerrit 102940 0 'None' MERGED Add timeout to host upgrade 2020-05-05 17:47:35 UTC
oVirt gerrit 102972 0 'None' MERGED Add timeout to host upgrade 2020-05-05 17:47:35 UTC
oVirt gerrit 102976 0 'None' MERGED restapi: Add timeout to host upgrade 2020-05-05 17:47:35 UTC
oVirt gerrit 102980 0 'None' MERGED restapi: Update to model 4.4.7 2020-05-05 17:47:35 UTC
oVirt gerrit 102985 0 'None' MERGED restapi: Update to model 4.3.29 2020-05-05 17:47:35 UTC
oVirt gerrit 103295 0 'None' MERGED restapi: Add timeout to host upgrade 2020-05-05 17:47:36 UTC
oVirt gerrit 104214 0 'None' MERGED ansible: Increase default execution of playbooks 2020-05-05 17:47:36 UTC
oVirt gerrit 104225 0 'None' MERGED ansible: Increase default execution of playbooks 2020-05-05 17:47:36 UTC

Description RHV bug bot 2019-10-24 12:45:49 UTC
+++ This bug is an upstream to downstream clone. The original bug is: +++
+++   bug 1728617 +++
======================================================================

Created attachment 1589048 [details]
engine log

Description of problem:
upgrade of host fails on timeout after 30 minutes

Version-Release number of selected component (if applicable):
ovirt-engine-ui-extensions-1.0.6-1.el7ev.noarch
ovirt-engine-4.3.5.3-0.1.el7.noarch

How reproducible:
33% (1 host out of 3 failed)

Steps to Reproduce:
1. Deploy a 4.2 engine and add 3 hosts.
2. Upgrade the engine to 4.3.
3. Upgrade the hosts to 4.3 (in our case via the REST API host upgrade).
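
For readers reproducing step 3 over the REST API, the following is a minimal sketch of how an upgrade request carrying an explicit timeout might be assembled. It is not taken from this report: the `/ovirt-engine/api/hosts/{id}/upgrade` path and the `timeout` element (in minutes) are assumptions based on the linked "restapi: Add timeout to host upgrade" patches, and `build_upgrade_request` and the host ID are hypothetical names used only for illustration. Nothing is sent over the network.

```python
import xml.etree.ElementTree as ET


def build_upgrade_request(host_id, timeout_minutes):
    """Return (path, xml_body) for a host upgrade action.

    Assumption: the upgrade action accepts a <timeout> element expressed
    in minutes; verify against the API model of your engine version.
    """
    path = f"/ovirt-engine/api/hosts/{host_id}/upgrade"
    action = ET.Element("action")
    ET.SubElement(action, "timeout").text = str(timeout_minutes)
    return path, ET.tostring(action, encoding="unicode")


if __name__ == "__main__":
    # "123" is a placeholder host ID, not a value from this bug.
    path, body = build_upgrade_request("123", 120)
    print("POST", path)
    print(body)
```

The body would be sent as an authenticated POST with an XML content type; the HTTP client is left out so that the sketch stays self-contained.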

Actual results:
host failed with ansible timeout error in engine.log: 
2019-07-09 13:51:41,781+03 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (EE-ManagedThreadFactory-commandCoordinator-Thread-3) [hosts_syncAction_7dc517e8-0819-42b8] Ansible playbook execution failed: Timeout occurred while executing Ansible playbook.
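
To check whether other hosts hit the same failure, the engine log can be scanned for this message. The helper below is a small sketch that assumes the log line layout shown above (date, time, level, then the message); `find_playbook_timeouts` is a hypothetical name, not part of any oVirt tool.

```python
TIMEOUT_MARKER = "Timeout occurred while executing Ansible playbook"


def find_playbook_timeouts(lines):
    """Return the timestamps of ERROR lines reporting a playbook timeout."""
    hits = []
    for line in lines:
        if " ERROR " in line and TIMEOUT_MARKER in line:
            # The first two space-separated fields are the date and time.
            date, time_of_day = line.split(" ", 2)[:2]
            hits.append(f"{date} {time_of_day}")
    return hits


if __name__ == "__main__":
    sample = [
        "2019-07-09 13:51:41,781+03 ERROR "
        "[org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] "
        "(EE-ManagedThreadFactory-commandCoordinator-Thread-3) "
        "[hosts_syncAction_7dc517e8-0819-42b8] Ansible playbook execution "
        "failed: Timeout occurred while executing Ansible playbook.",
    ]
    print(find_playbook_timeouts(sample))
```

In practice the lines would come from reading the engine log file; its location depends on the installation.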

Expected results:
The timeout should be increased so that the host upgrade can finish without failing.

Additional info:
We previously had the following bug related to cluster update:
https://bugzilla.redhat.com/show_bug.cgi?id=1697301

The timeout is defined in:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.conf.in#L649
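
To illustrate how a fixed execution limit of this kind produces the failure, here is a minimal sketch of running an external command under a timeout given in minutes. It is not the engine's implementation (the engine's AnsibleExecutor is Java code); `run_with_timeout` is a hypothetical helper, and the slow child process merely stands in for a long-running playbook.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_minutes):
    """Run cmd and give up after timeout_minutes; return (ok, message)."""
    try:
        result = subprocess.run(
            cmd,
            capture_output=True,
            text=True,
            timeout=timeout_minutes * 60,  # subprocess expects seconds
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising this exception.
        return False, "Timeout occurred while executing command."
    return result.returncode == 0, result.stdout.strip()


if __name__ == "__main__":
    # Stand-in for a playbook that needs longer than the limit allows.
    slow = [sys.executable, "-c", "import time; time.sleep(5)"]
    print(run_with_timeout(slow, timeout_minutes=0.01))
```

A command that would otherwise succeed is reported as failed purely because the limit is shorter than its run time, which is the situation described in this bug.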

(Originally by Kobi Hakimi)

Comment 1 RHV bug bot 2019-10-24 12:45:52 UTC
Created attachment 1589050 [details]
ansible host deploy log file

(Originally by Kobi Hakimi)

Comment 4 RHV bug bot 2019-10-24 12:45:57 UTC
Just to make my upgrade flow clearer:
 - deployed rhv-4.2.10-1 with rhel-7.6
 - upgraded to rhv-4.3.5-5 and to rhel-7.7

(Originally by Kobi Hakimi)

Comment 10 RHV bug bot 2019-10-24 12:46:07 UTC
With ovirt-engine-4.3.7.0-0.1.el7.noarch, the timeout is still 30 minutes, which is not enough, and our upgrade failed again.

(Originally by Petr Matyas)

Comment 11 RHV bug bot 2019-11-01 09:32:03 UTC
WARN: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[Found non-acked flags: '{'rhevm-4.3.z': '?'}', ]

For more info please contact: rhv-devops

Comment 13 Petr Matyáš 2019-11-25 14:59:21 UTC
Verified on ovirt-engine-4.3.7.2-0.1.el7.noarch

Comment 14 Nadav Halevy 2019-12-10 12:27:22 UTC
Hi Ondra,

Please can you review the doc text?


The default maximum timeout for an Ansible playbook executed from the engine was 30 minutes.
As a result, the upgrade process of the host failed due to the short timeout.
In this release the timeout was raised to 120 minutes.

Comment 15 Ondra Machacek 2019-12-10 14:22:50 UTC
It looks good, thank you.

Comment 17 errata-xmlrpc 2019-12-12 10:36:35 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:4229

