Description of problem:
During host deployment, the ansible playbook is getting killed, which leaves the playbook stuck on the ssh connection and the host not running any tasks.
This was tested on rhel7.5. I will update later with 7.4 results.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.5 Beta
ovirt-host-deploy-1.7.1-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add host to engine
2. Check the playbook results; the playbook gets stuck

Actual results:
The playbook gets stuck on a random metrics-related task, and host deploy times out after 30 minutes, e.g.
2018-01-26 11:25:06,578 p=18633 u=ovirt | TASK [oVirt.ovirt-fluentd/fluentd-setup : Enable fluentd service] **************

Expected results:
Either an error message about the hanging ssh connection in the logs, or a successful installation.

Additional info:
The same result for a rhel7.4 host.

engine ~ # ps aux | grep ansible
ovirt 21987 5.9 1.1 329252 46148 ? Sl 12:04 0:06 /usr/bin/python2 /usr/bin/ansible-playbook -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory7202342346104270989 --extra-vars=host_deploy_cluster_version=4.2 --extra-vars=host_deploy_cluster_name=Default --extra-vars=host_deploy_gluster_enabled=false --extra-vars=host_deploy_virt_enabled=true --extra-vars=host_deploy_vdsm_port=54321 --extra-vars=host_deploy_override_firewall=true --extra-vars=host_deploy_firewall_type=FIREWALLD --extra-vars=ansible_port=22 --extra-vars=host_deploy_post_tasks=/etc/ovirt-engine/ansible/ovirt-host-deploy-post-tasks.yml --extra-vars=host_deploy_ovn_tunneling_interface=x.x.x.x.x --extra-vars=host_deploy_ovn_central=x.x.x.x /usr/share/ovirt-engine/playbooks/ovirt-host-deploy.yml
ovirt 22011 0.2 0.0 179024 1892 ? Ss 12:04 0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt 22427 0.1 0.0 0 0 ? Z 12:04 0:00 [ansible-playboo] <defunct>
^^^^^
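For anyone trying to confirm the same symptom on their engine, the check done by eye in the ps output above can be scripted. This is only a rough sketch, assuming the standard 11-column `ps aux` layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); the helper name `find_defunct` is made up for illustration and is not part of oVirt or Ansible.

```python
def find_defunct(ps_output):
    """Return (pid, command) for every process whose STAT column starts with 'Z'.

    Assumes `ps aux` style lines: the STAT field is the 8th column and the
    command is everything from the 11th column onward.
    """
    zombies = []
    for line in ps_output.splitlines():
        fields = line.split(None, 10)  # keep the command (with spaces) in one piece
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skips the header line and anything malformed
        pid, stat, command = int(fields[1]), fields[7], fields[10]
        if stat.startswith("Z"):
            zombies.append((pid, command))
    return zombies


# Two of the lines from the report, copied as posted.
SAMPLE_PS = """\
ovirt 22011 0.2 0.0 179024 1892 ? Ss 12:04 0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt 22427 0.1 0.0 0 0 ? Z 12:04 0:00 [ansible-playboo] <defunct>
"""

if __name__ == "__main__":
    for pid, command in find_defunct(SAMPLE_PS):
        print(pid, command)
```

On a live engine the input could come from something like `subprocess.check_output(["ps", "aux"])` decoded to text. A defunct ansible-playbook child next to a still-running parent and ssh mux process matches what is described in this report.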
Also worth mentioning: the engine already has ovirt-metrics configured, so adding a host might try to configure it on the host as well.
Shirly, could you please take a look at what could cause the metrics role to get stuck? Lukasi, could you please attach all engine logs to the bug?
Better, I can share the full environment, which Ondra was already investigating. Will post credentials in the mail.
Same issue: during host deployment, the ansible playbook is getting killed and the deployment waits for the 30-minute timeout. Metrics were not involved in this case. The environment was provided to the developer.
Just a small addition: the host was added to a virt+gluster cluster.
2018-01-29 08:38:02,541 p=735 u=ovirt | TASK [ovirt-host-deploy-firewalld : Include firewalld rules] *******************
2018-01-29 08:38:02,566 p=735 u=ovirt | skipping: [10.37.137.139] => {
    "changed": false,
    "skip_reason": "Conditional result was False",
    "skipped": true
}
2018-01-29 08:38:02,580 p=735 u=ovirt | TASK [ovirt-host-deploy-firewalld : Enable firewalld rules] ********************
2018-01-29 08:38:02,626 p=735 u=ovirt | skipping: [10.37.137.139] => (item={u'service': u'ctdb'}) => {
    "changed": false,
    "item": {
        "service": "ctdb"
    },
    "skip_reason": "Conditional result was False",
    "skipped": true
}
2018-01-29 08:38:02,634 p=735 u=ovirt | skipping: [10.37.137.139] => (item={u'service': u'glusterfs'}) => {
    "changed": false,
    "item": {
        "service": "glusterfs"
    },
    "skip_reason": "Conditional result was False",
    "skipped": true
}
2018-01-29 08:38:02,641 p=735 u=ovirt | skipping: [10.37.137.139] => (item={u'service': u'nfs'}) => {
    "changed": false,
    "item": {
        "service": "nfs"
    },
    "skip_reason": "Conditional result was False",
    "skipped": true
}
2018-01-29 08:38:02,653 p=735 u=ovirt | skipping: [10.37.137.139] => (item={u'service': u'nrpe'}) => {
    "changed": false,
    "item": {
        "service": "nrpe"
    },
    "skip_reason": "Conditional result was False",
    "skipped": true
}

Host deploy failed 30 minutes later
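Since the playbook stalls at a different task each time, it can help to pull the last TASK header out of the ansible log instead of scrolling through the skipped items. A minimal sketch, assuming log lines in the `<timestamp> p=<pid> u=<user> | TASK [<name>] ***` shape shown above; the function name `last_task` is hypothetical and only for illustration.

```python
import re

# Matches the TASK header lines as they appear in the log excerpts in this bug.
TASK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) p=\d+ u=\S+ \|\s+TASK \[(.+?)\]",
    re.MULTILINE,
)


def last_task(log_text):
    """Return (timestamp, task_name) of the last TASK header, or None if there is none."""
    matches = TASK_RE.findall(log_text)
    return matches[-1] if matches else None


# The two TASK headers from the excerpt above, copied as posted.
SAMPLE_LOG = """\
2018-01-29 08:38:02,541 p=735 u=ovirt | TASK [ovirt-host-deploy-firewalld : Include firewalld rules] *******************
2018-01-29 08:38:02,580 p=735 u=ovirt | TASK [ovirt-host-deploy-firewalld : Enable firewalld rules] ********************
"""

if __name__ == "__main__":
    print(last_task(SAMPLE_LOG))
```

Comparing the returned timestamp with the time host deploy finally failed shows how long the playbook sat idle after starting its last task.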
Verified in ovirt-engine-4.2.1.4-0.1.el7.noarch
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018. Since the problem described in this bug report should be resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.