Created attachment 1553530 [details]
cluster-upgrade log

Description of problem:
Based on the behavior, it seems that the playbook with the cluster-upgrade role is killed after 30 minutes (see engine.log and the attached log from the cluster-upgrade role).
I started the cluster upgrade from the UI with the default value of 60 minutes (this variable should be per host).

The role upgraded 3 hosts, then it was killed.

engine.log:
2019-04-08 11:15:35,782+02 INFO [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (default task-4) [] Executing Ansible command: /usr/bin/ansible-playbook --ssh-common-args=-F /var/lib/ovirt-engine/.ssh/config -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --extra-vars=engine_insecure="true" --extra-vars=engine_url="https://brq-setup.rhev.lab.eng.brq.redhat.com:443/ovirt-engine/api" --extra-vars=engine_token="BdZKh1WGnyxLLkFFvY1REim3VjzMhRf4eYZfrbJWNBMrMvZSLMAyZsHX_UD-jF0w2gt7SBVQkfiGMhNI1Ms7ww" --extra-vars=@/tmp/ansible-variables2580215303052505607 /usr/share/ovirt-engine/playbooks/ovirt-cluster-upgrade.yml [Logfile: /var/log/ovirt-engine/ansible/ansible-20190408111535-ovirt-cluster-upgrade_yml.log]
...
2019-04-08 11:45:35,796+02 ERROR [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (default task-4) [] Ansible playbook execution failed: Timeout occurred while executing Ansible playbook.
2019-04-08 11:45:35,797+02 ERROR [org.ovirt.engine.core.services.AnsibleServlet] (default task-4) [] Error while executing ansible-playbook command.

Version-Release number of selected component (if applicable):
ovirt-engine-ui-extensions-1.0.4-1.el7ev.noarch

How reproducible:
always

Steps to Reproduce:
1. Run cluster-upgrade from the UI in a large environment (multiple physical hosts; the cluster upgrade should take more than 30 minutes)

Actual results:
Fails on timeout

Expected results:
The role has its own timeout; as long as Ansible is still running, it shouldn't be killed.

Additional info:
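The 30-minute figure can be checked directly from the two engine.log timestamps quoted in the description. The following is a minimal sketch in Python; the variable names are illustrative, and the `+02` offset is dropped from the strings because both entries share it.

```python
from datetime import datetime

# Timestamps copied from the engine.log excerpt in the description.
# Both carry the same +02 offset, so it is omitted here.
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f"

started = datetime.strptime("2019-04-08 11:15:35,782", LOG_TIME_FORMAT)
failed = datetime.strptime("2019-04-08 11:45:35,796", LOG_TIME_FORMAT)

# Time between "Executing Ansible command" and the timeout error.
elapsed_seconds = (failed - started).total_seconds()
elapsed_minutes = elapsed_seconds / 60
```

The gap is almost exactly 30 minutes, which points to a fixed playbook-level timeout rather than the 60-minute per-host value chosen in the UI.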
(In reply to Petr Kubica from comment #0)
> 2019-04-08 11:45:35,796+02 ERROR
> [org.ovirt.engine.core.common.utils.ansible.AnsibleExecutor] (default
> task-4) [] Ansible playbook execution failed: Timeout occurred while
> executing Ansible playbook.

Ondro, where do we have this timeout? I think we have only a 60-minute timeout to upgrade a host, right? Or is there some other timeout?
(In reply to Martin Perina from comment #1)
> Ondro, where do have this timeout? I think we have only 60 minutes timeout
> to upgrade a host, right? Or is there some other timeout?
Ahh, I found it: we have a 30-minute default timeout for playbook execution in the engine: https://github.com/oVirt/ovirt-engine/blob/master/packaging/services/ovirt-engine/ovirt-engine.conf.in#L644 This is the option which kills the playbook, right?
Correct. We can simply override it. If the UI sends a meaningful timeout parameter for the specific playbook, we can pass it to the executor and use it instead of the default. I've sent a patch for it.
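The override described here can be illustrated with a small Python analogue. This is only a sketch under stated assumptions, not the engine's actual (Java) executor code: `run_with_timeout`, `DEFAULT_TIMEOUT_S` and `slow_job` are hypothetical names, and the default is shrunk to a fraction of a second so the behavior is quick to observe.

```python
import subprocess
import sys

# Hypothetical stand-in for the engine's 30-minute default,
# shortened so the example finishes quickly.
DEFAULT_TIMEOUT_S = 0.2


def run_with_timeout(cmd, timeout_s=None):
    """Run cmd; return True if it finished, False if it was killed on timeout.

    When the caller passes no timeout, the default applies.
    """
    effective = DEFAULT_TIMEOUT_S if timeout_s is None else timeout_s
    try:
        subprocess.run(cmd, timeout=effective, check=True)
        return True
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising TimeoutExpired.
        return False


# A job that outlives the default timeout, like a long cluster upgrade.
slow_job = [sys.executable, "-c", "import time; time.sleep(1)"]
```

Calling `run_with_timeout(slow_job)` hits the default and the job is killed, while `run_with_timeout(slow_job, timeout_s=10)` lets it finish, which is the behavior the caller-supplied parameter is meant to enable.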
Just a note: the new timeout should be the number of hosts being upgraded multiplied by the per-host upgrade timeout (which the user already provides).
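A minimal sketch of that calculation, assuming both inputs are positive whole numbers. The function name and the host count in the usage note are hypothetical and are not taken from the actual patch.

```python
def playbook_timeout_minutes(host_count: int, per_host_timeout_minutes: int) -> int:
    """Overall playbook timeout: number of hosts times the per-host timeout."""
    if host_count < 1 or per_host_timeout_minutes < 1:
        raise ValueError("host_count and per_host_timeout_minutes must be positive")
    return host_count * per_host_timeout_minutes
```

For example, with the UI default of 60 minutes per host and a hypothetical cluster of 5 hosts, `playbook_timeout_minutes(5, 60)` gives 300 minutes, well above the 30-minute default that killed the playbook.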
Moving back to POST; we still need to modify the UI part.
The fix is not in ovirt-engine, so just moving this back. Fixed by https://github.com/oVirt/ovirt-engine-ui-extensions/commit/5347430228c898c6683ff3ac83daba08a9b054cb
Verified in:
ovirt-engine-ui-extensions-1.0.5-1.el7ev.noarch
ovirt-engine-4.3.4.2-0.1.el7.noarch
This bugzilla is included in oVirt 4.3.4 release, published on June 11th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.4 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.