Bug 1538998 - Ansible playbooks of host deployed getting stuck
Summary: Ansible playbooks of host deployed getting stuck
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-engine
Classification: oVirt
Component: Host-Deploy
Version: ---
Hardware: All
OS: All
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.1
Target Release: 4.2.1.4
Assignee: Ondra Machacek
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks: 1514927
Reported: 2018-01-26 11:07 UTC by Lukas Svaty
Modified: 2018-02-12 11:57 UTC (History)

Fixed In Version: ovirt-engine-4.2.1.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-12 11:57:47 UTC
oVirt Team: Infra
Embargoed:
rule-engine: ovirt-4.2+
rule-engine: blocker+
lsvaty: testing_ack+


Links
System | ID | Private | Priority | Status | Summary | Last Updated
oVirt gerrit | 86873 | 0 | 'None' | 'MERGED' | 'core: Redirect ansible procces output to /dev/null' | 2019-12-05 07:38:55 UTC

Description Lukas Svaty 2018-01-26 11:07:30 UTC
Description of problem:
During host deployment, the ansible playbook gets killed, which leaves the playbook stuck on the ssh connection and the host not running any tasks.
This was tested on RHEL 7.5. I will update later with 7.4 results.

Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux Server release 7.5 Beta
ovirt-host-deploy-1.7.1-1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Add host to engine
2. Check the playbook results; the playbook gets stuck

Actual results:
The playbook gets stuck on a random metrics-related task, and host deploy times out after 30 minutes.

e.g.
2018-01-26 11:25:06,578 p=18633 u=ovirt |  TASK [oVirt.ovirt-fluentd/fluentd-setup : Enable fluentd service] **************


Expected results:
Either an error message about the hanging ssh connection in the logs, or a successful installation.

Additional info:
The same result on a RHEL 7.4 host:
engine ~ # ps aux | grep ansible
ovirt    21987  5.9  1.1 329252 46148 ?        Sl   12:04   0:06 /usr/bin/python2 /usr/bin/ansible-playbook -v --private-key=/etc/pki/ovirt-engine/keys/engine_id_rsa --inventory=/tmp/ansible-inventory7202342346104270989 --extra-vars=host_deploy_cluster_version=4.2 --extra-vars=host_deploy_cluster_name=Default --extra-vars=host_deploy_gluster_enabled=false --extra-vars=host_deploy_virt_enabled=true --extra-vars=host_deploy_vdsm_port=54321 --extra-vars=host_deploy_override_firewall=true --extra-vars=host_deploy_firewall_type=FIREWALLD --extra-vars=ansible_port=22 --extra-vars=host_deploy_post_tasks=/etc/ovirt-engine/ansible/ovirt-host-deploy-post-tasks.yml --extra-vars=host_deploy_ovn_tunneling_interface=x.x.x.x.x --extra-vars=host_deploy_ovn_central=x.x.x.x /usr/share/ovirt-engine/playbooks/ovirt-host-deploy.yml
ovirt    22011  0.2  0.0 179024  1892 ?        Ss   12:04   0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt    22427  0.1  0.0      0     0 ?        Z    12:04   0:00 [ansible-playboo] <defunct>
^^^^^
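
The `<defunct>` entry marked above can also be picked out of `ps aux` output programmatically. The following is a minimal sketch, assuming the standard 11-column `ps aux` layout (USER, PID, %CPU, %MEM, VSZ, RSS, TTY, STAT, START, TIME, COMMAND); the function name `find_defunct` is hypothetical, and the sample text reuses two of the lines above.

```python
# Two lines copied from the ps output in this report.
SAMPLE = """\
ovirt    22011  0.2  0.0 179024  1892 ?        Ss   12:04   0:00 ssh: /var/lib/ovirt-engine/.ansible/cp/0b222cb912 [mux]
ovirt    22427  0.1  0.0      0     0 ?        Z    12:04   0:00 [ansible-playboo] <defunct>
"""


def find_defunct(ps_output):
    """Return (user, pid, command) for every process whose STAT starts with 'Z'."""
    zombies = []
    for line in ps_output.splitlines():
        # Split into at most 11 fields so the COMMAND column keeps its spaces.
        fields = line.split(None, 10)
        # Skip the header row and anything that is not a full ps line.
        if len(fields) < 11 or not fields[1].isdigit():
            continue
        user, pid, stat, command = fields[0], int(fields[1]), fields[7], fields[10]
        if stat.startswith("Z"):
            zombies.append((user, pid, command))
    return zombies


if __name__ == "__main__":
    for user, pid, command in find_defunct(SAMPLE):
        print(user, pid, command)
```

A zombie entry only shows that a child exited without being reaped by its parent; it does not by itself say why the parent is still waiting.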

Comment 1 Lukas Svaty 2018-01-26 13:55:20 UTC
Also worth mentioning: the engine already has ovirt-metrics configured, so adding a host might try to configure it on the host as well.

Comment 2 Martin Perina 2018-01-26 19:14:44 UTC
Shirly, could you please take a look at what could cause the metrics role to get stuck?

Lukasi, could you please attach all engine logs to the bug?

Comment 3 Lukas Svaty 2018-01-27 15:48:31 UTC
Better, I can share the full environment, which Ondra was already investigating. Will post credentials in the mail.

Comment 4 Pavol Brilla 2018-01-29 08:15:39 UTC
Same issue:
During host deployment, the ansible playbook gets killed and host deploy waits for the 30-minute timeout. Metrics were not involved in this case.

Environment provided to developer

Comment 5 Pavol Brilla 2018-01-29 08:18:19 UTC
Just a small addition: the host was added to a virt+gluster cluster.

Comment 6 Pavol Brilla 2018-01-29 11:10:24 UTC
2018-01-29 08:38:02,541 p=735 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Include firewalld rules] *******************
2018-01-29 08:38:02,566 p=735 u=ovirt |  skipping: [10.37.137.139] => {
    "changed": false, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,580 p=735 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Enable firewalld rules] ********************
2018-01-29 08:38:02,626 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'ctdb'})  => {
    "changed": false, 
    "item": {
        "service": "ctdb"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,634 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'glusterfs'})  => {
    "changed": false, 
    "item": {
        "service": "glusterfs"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,641 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'nfs'})  => {
    "changed": false, 
    "item": {
        "service": "nfs"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}
2018-01-29 08:38:02,653 p=735 u=ovirt |  skipping: [10.37.137.139] => (item={u'service': u'nrpe'})  => {
    "changed": false, 
    "item": {
        "service": "nrpe"
    }, 
    "skip_reason": "Conditional result was False", 
    "skipped": true
}



Host deploy failed 30 minutes later

Comment 7 Lukas Svaty 2018-01-30 14:11:31 UTC
verified in ovirt-engine-4.2.1.4-0.1.el7.noarch
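
The fix linked in this bug (oVirt gerrit 86873) is titled 'core: Redirect ansible procces output to /dev/null'. The report does not spell out the root cause, so the following is only a sketch of the general technique that title names, not the actual patch. It assumes the usual failure mode where a child process blocks once nobody drains its output pipe; the helper name `run_quietly` is hypothetical.

```python
import subprocess
import sys


def run_quietly(argv, timeout=None):
    """Run a child process with its output discarded and return its exit code.

    Sending stdout/stderr to the null device means the child can never block
    on a full pipe, however much it prints. The trade-off is that the output
    is lost, so the child should write its own log file if the output matters.
    """
    completed = subprocess.run(
        argv,
        stdin=subprocess.DEVNULL,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
        timeout=timeout,  # raises subprocess.TimeoutExpired if exceeded
    )
    return completed.returncode


if __name__ == "__main__":
    # Illustrative child that writes 1 MiB to stdout, more than a typical
    # pipe buffer holds; with the redirect in place it still exits promptly.
    chatty = [sys.executable, "-c", "import sys; sys.stdout.write('x' * (1 << 20))"]
    print(run_quietly(chatty, timeout=60))
```

The alternative to discarding output is to keep the pipe and read from it continuously until the child exits; that preserves the output but needs a reader that cannot itself stall.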

Comment 8 Sandro Bonazzola 2018-02-12 11:57:47 UTC
This bugzilla is included in oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

