Bug 1538998

Summary: Ansible playbooks of host deployed getting stuck
Product: [oVirt] ovirt-engine Reporter: Lukas Svaty <lsvaty>
Component: Host-Deploy Assignee: Ondra Machacek <omachace>
Status: CLOSED CURRENTRELEASE QA Contact: Lukas Svaty <lsvaty>
Severity: high Docs Contact:
Priority: unspecified    
Version: --- CC: bugs, lsvaty, lveyde, mperina, omachace, pbrilla, sradco
Target Milestone: ovirt-4.2.1 Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
lsvaty: testing_ack+
Target Release: 4.2.1.4   
Hardware: All   
OS: All   
Whiteboard:
Fixed In Version: ovirt-engine-4.2.1.4 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-02-12 11:57:47 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Infra RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1514927    

Description Lukas Svaty 2018-01-26 11:07:30 UTC
Description of problem:
During host deployment, the ansible playbook process gets killed, which leaves the playbook stuck on the ssh connection and the host not running any tasks.
This was tested on RHEL 7.5. I will update later with RHEL 7.4 results.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.5 Beta
ovirt-host-deploy-1.7.1-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add a host to the engine
2. Check the playbook results; the playbook gets stuck

Actual results:
The playbook gets stuck on a random metrics-related task, and host deploy times out after 30 minutes.

e.g.
2018-01-26 11:25:06,578 p=18633 u=ovirt |  TASK [oVirt.ovirt-fluentd/fluentd-setup : Enable fluentd service] **************


Expected results:
Either an error message about the hanging ssh connection in the logs, or a successful installation.
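
To illustrate the first expected outcome (an explicit error instead of a silent hang), here is a minimal Python sketch of running a command under a deadline. This is not the engine's actual mechanism; the function name run_with_deadline and its return shape are made up for illustration, and only the standard subprocess module is used.

```python
import subprocess


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return (ok, message) instead of blocking indefinitely.

    Hypothetical helper for illustration only; not taken from ovirt-engine.
    """
    try:
        # subprocess.run kills the child and raises TimeoutExpired once
        # timeout_s seconds have passed without the command finishing.
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return False, "command timed out after %s s: %s" % (timeout_s, " ".join(cmd))
    if result.returncode != 0:
        return False, "command failed with exit code %d" % result.returncode
    return True, "ok"
```

With a wrapper like this the caller gets a message it can log as soon as the deadline passes. Note that this only covers the direct child process; a lingering ssh multiplexing process would need separate handling.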

Additional info:
The same result on a RHEL 7.4 host:
engine ~ # ps aux | grep ansible
ovirt    21987  5.9  1.1 329252 46148 ?        Sl   12:04   0:06 /usr/bin/python2 /usr/bin/ansible-playbook -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory7202342346104270989 --extra-vars=host_deploy_cluster_version=4.2 --extra-vars=host_deploy_cluster_name=Default --extra-vars=host_deploy_gluster_enabled=false --extra-vars=host_deploy_virt_enabled=true --extra-vars=host_deploy_vdsm_port=54321 --extra-vars=host_deploy_override_firewall=true --extra-vars=host_deploy_firewall_type=FIREWALLD --extra-vars=ansible_port=22 --extra-vars=host_deploy_post_tasks=/etc/ovirt-engine/ansible/ovirt-host-deploy-post-tasks.yml --extra-vars=host_deploy_ovn_tunneling_interface=x.x.x.x.x --extra-vars=host_deploy_ovn_central=x.x.x.x /usr/share/ovirt-engine/playbooks/ovirt-host-deploy.yml
ovirt    22011  0.2  0.0 179024  1892 ?        Ss   12:04   0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt    22427  0.1  0.0      0     0 ?        Z    12:04   0:00 [ansible-playboo] <defunct>
^^^^^
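
The defunct process marked above can also be picked out of the ps aux output programmatically. The following is a minimal Python sketch, assuming the default ps aux column layout (USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND); find_zombies is a hypothetical helper name, and the sample lines are copied from the output above.

```python
def find_zombies(ps_output):
    """Return (pid, command) pairs for processes whose STAT starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the command keeps its spaces.
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header row and malformed lines
        if fields[7].startswith("Z"):
            zombies.append((int(fields[1]), fields[10]))
    return zombies


sample = """\
ovirt    22011  0.2  0.0 179024  1892 ?        Ss   12:04   0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt    22427  0.1  0.0      0     0 ?        Z    12:04   0:00 [ansible-playboo] <defunct>
"""
print(find_zombies(sample))
```

For the sample above only PID 22427 is reported; the ssh [mux] process is in state Ss and is therefore not listed.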

Comment 1 Lukas Svaty 2018-01-26 13:55:20 UTC
Also worth mentioning that the engine already has ovirt-metrics configured, so adding a host might try to configure it as well.

Comment 2 Martin Perina 2018-01-26 19:14:44 UTC
Shirly, could you please take a look at what could cause the metrics role to get stuck?

Lukasi, could you please attach all engine logs to the bug?

Comment 3 Lukas Svaty 2018-01-27 15:48:31 UTC
Better, I can share the full environment, which Ondra was already investigating. I will send the credentials by mail.

Comment 4 Pavol Brilla 2018-01-29 08:15:39 UTC
Same issue:
During host deployment, the ansible playbook process gets killed and the deployment waits for the 30-minute timeout. Metrics were not involved in this case.

Environment provided to developer

Comment 5 Pavol Brilla 2018-01-29 08:18:19 UTC
Just a small addition: the host was added to a virt+gluster cluster.

Comment 6 Pavol Brilla 2018-01-29 11:10:24 UTC
2018-01-29 08:38:02,541 p=735 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Include firewalld rules] *******************
2018-01-29 08:38:02,566 p=735 u=ovirt |  skipping: [10.37.137.139] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,580 p=735 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Enable firewalld rules] ********************
2018-01-29 08:38:02,626 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'ctdb'})  => {
    "changed": false, 
    "item": {
        "service": "ctdb"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,634 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'glusterfs'})  => {
    "changed": false, 
    "item": {
        "service": "glusterfs"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,641 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'nfs'})  => {
    "changed": false, 
    "item": {
        "service": "nfs"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,653 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'nrpe'})  => {
    "changed": false, 
    "item": {
        "service": "nrpe"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}



Host deploy failed 30 minutes later
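
To see where a playbook stalled, it helps to pull the last TASK header and its timestamp out of a log like the one above. A minimal Python sketch follows, assuming the line format shown in this bug (timestamp, p=, u=, then the TASK banner); last_task and TASK_RE are hypothetical names, not part of ansible or ovirt-engine.

```python
import re
from datetime import datetime

# Matches e.g.:
# 2018-01-29 08:38:02,580 p=735 u=ovirt |  TASK [role : name] *******
TASK_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}),(\d{3}) "
    r"p=\d+ u=\S+ \|\s+TASK \[(.+?)\] \*+\s*$"
)


def last_task(log_text):
    """Return (timestamp, task_name) of the last TASK header, or None."""
    found = None
    for line in log_text.splitlines():
        match = TASK_RE.match(line)
        if match:
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S")
            # The log gives milliseconds after the comma.
            stamp = stamp.replace(microsecond=int(match.group(2)) * 1000)
            found = (stamp, match.group(3))
    return found
```

Comparing the returned timestamp with the time of the host deploy failure shows how long the playbook sat on that task.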

Comment 7 Lukas Svaty 2018-01-30 14:11:31 UTC
Verified in ovirt-engine-4.2.1.4-0.1.el7.noarch

Comment 8 Sandro Bonazzola 2018-02-12 11:57:47 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.