Bug 1697301
Summary: | cluster upgrade fails on timeout after 30 minutes
---|---
Product: | [oVirt] ovirt-engine
Component: | Frontend.Core
Version: | 4.3.2.1
Status: | CLOSED CURRENTRELEASE
Severity: | high
Priority: | unspecified
Reporter: | Petr Kubica <pkubica>
Assignee: | Sharon Gratch <sgratch>
QA Contact: | Petr Kubica <pkubica>
CC: | bugs, lleistne, michal.skrivanek, mperina, omachace, sgratch
Target Milestone: | ovirt-4.3.4
Target Release: | 4.3.4.1
Flags: | pm-rhel: ovirt-4.3+, mperina: blocker?, lleistne: testing_ack+
Hardware: | Unspecified
OS: | Unspecified
Fixed In Version: | ovirt-engine-4.3.4.1, ovirt-engine-ui-extensions-1.0.5-1
Doc Type: | If docs needed, set a value
Story Points: | ---
Last Closed: | 2019-06-11 06:24:11 UTC
Type: | Bug
Regression: | ---
Mount Type: | ---
Documentation: | ---
Category: | ---
oVirt Team: | UX
Cloudforms Team: | ---
Description
Petr Kubica
2019-04-08 10:30:15 UTC
Created attachment 1553530 [details]
cluster-upgrade log

Description of problem:
Based on the behavior, it seems that the playbook with the cluster-upgrade role is killed after 30 minutes (see engine.log and the attached log from the cluster-upgrade role). I started the cluster upgrade from the UI with the default value of 60 minutes (this variable should be per host).

The role upgraded 3 hosts, then the role was killed.

engine.log:
2019-04-08 11:15:35,782+02 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (default task-4) [] Executing Ansible command: /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --extra-vars=engine_insecure="true" --extra-vars=engine_url="https://brq-setup.rhev.lab.eng.brq.redhat.com:443/ovirt-engine/api" --extra-vars=engine_token="BdZKh1WGnyxLLkFFvY1REim3VjzMhRf4eYZfrbJWNBMrMvZSLMAyZsHX_UD-jF0w2gt7SBVQkfiGMhNI1Ms7ww" --extra-vars=@/tmp/ansible-variables2580215303052505607 /usr/share/ovirt-engine/playbooks/ovirt-cluster-upgrade.yml [Logfile: /var/log/ovirt-engine/ansible/ansible-20190408111535-ovirt-cluster-upgrade_yml.log]
...
2019-04-08 11:45:35,796+02 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (default task-4) [] Ansible playbook execution failed: Timeout occurred while executing Ansible playbook.
2019-04-08 11:45:35,797+02 ERROR [org.ovirt.engine.core.services.AnsibleServlet] (default task-4) [] Error while executing ansible-playbook command.

Comment 1
Martin Perina
(In reply to Petr Kubica from comment #0)
Ondro, where do we have this timeout? I think we only have the 60-minute timeout for upgrading a host, right? Or is there some other timeout?
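As a quick check of the 30-minute figure, the two engine.log timestamps quoted in the description can be subtracted. This is only a sketch: the helper name `parse_engine_timestamp` is made up for illustration, and it assumes the log always writes the UTC offset as a bare two-digit hour such as `+02`.

```python
from datetime import datetime


def parse_engine_timestamp(stamp: str) -> datetime:
    """Parse an engine.log timestamp such as '2019-04-08 11:15:35,782+02'.

    Assumption: the offset is always a bare two-digit hour ("+02").
    strptime's %z wants hours and minutes, so "00" is appended first.
    """
    return datetime.strptime(stamp + "00", "%Y-%m-%d %H:%M:%S,%f%z")


# Timestamps copied from the engine.log excerpt in the description.
started = parse_engine_timestamp("2019-04-08 11:15:35,782+02")
failed = parse_engine_timestamp("2019-04-08 11:45:35,796+02")

elapsed = failed - started
print(elapsed)  # 0:30:00.014000
```

The gap is 30 minutes plus 14 milliseconds, which is what a fixed 30-minute limit on the whole playbook run would produce, rather than the 60-minute per-host value chosen in the UI.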
(In reply to Martin Perina from comment #1)
Ahh, I found it: we have a 30-minute default timeout for playbook execution in the engine:
https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.conf.in#L644
This is the option which kills the playbook, right?

Correct. We may simply override it. If the UI sends a meaningful timeout parameter for the specific playbook, we can pass it to the executor and override the default.

I've sent a patch for it.

Just a note: the new timeout should be the number of hosts being upgraded multiplied by the per-host upgrade timeout (which the user already provides).

Moving back to POST, we still need to modify the UI part.

The fix is not in ovirt-engine, so just moving this back.

Fixed by https://github.com/oVirt/ovirt-engine-ui-extensions/commit/5347430228c898c6683ff3ac83daba08a9b054cb

Verified in:
ovirt-engine-ui-extensions-1.0.5-1.el7ev.noarch
ovirt-engine-4.3.4.2-0.1.el7.noarch

This bugzilla is included in the oVirt 4.3.4 release, published on June 11th 2019.

Since the problem described in this bug report should be resolved in the oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.
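For readers who want the arithmetic from the note in this thread spelled out (overall playbook timeout = number of hosts multiplied by the per-host upgrade timeout), here is a minimal sketch. It is not the code from the linked fix; the function name, parameter names, and the host count of 4 are hypothetical and chosen only for illustration.

```python
def playbook_timeout_minutes(host_count: int, per_host_timeout_minutes: int) -> int:
    """Overall timeout for a cluster upgrade that handles hosts one after another.

    Hypothetical helper: names and validation are illustrative and are not
    taken from ovirt-engine or ovirt-engine-ui-extensions.
    """
    if host_count < 1:
        raise ValueError("host_count must be at least 1")
    if per_host_timeout_minutes < 1:
        raise ValueError("per_host_timeout_minutes must be at least 1")
    return host_count * per_host_timeout_minutes


# Illustrative values: 4 hosts (made up) with the 60-minute UI default per host.
total = playbook_timeout_minutes(host_count=4, per_host_timeout_minutes=60)
print(total)  # 240
```

With these values the playbook would be allowed 240 minutes instead of the fixed 30-minute default, so the per-host limit chosen in the UI stays the effective bound for each host.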