Bug 1618410 - hosted engine deploy (v2.2.25) failed at task "wait for the host to be up"
Summary: hosted engine deploy (v2.2.25) failed at task "wait for the host to be up"
Keywords:
Status: CLOSED DUPLICATE of bug 1608467
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Build
Version: 2.2.24
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Lukas Svaty
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2018-08-16 14:57 UTC by Douglas Duckworth
Modified: 2018-08-20 20:31 UTC
CC List: 2 users

Fixed In Version:
Clone Of:
Environment:
Last Closed: 2018-08-20 15:27:35 UTC
oVirt Team: Integration
Embargoed:


Attachments
hosted engine log (291.61 KB, text/plain) - 2018-08-16 14:57 UTC, Douglas Duckworth
hosted engine log failure (286.37 KB, text/plain) - 2018-08-16 19:42 UTC, Douglas Duckworth
latest hosted engine log failure (168.74 KB, text/plain) - 2018-08-16 19:43 UTC, Douglas Duckworth
updated hosted engine log (168.74 KB, text/plain) - 2018-08-16 19:44 UTC, Douglas Duckworth
ovirt logs (78.80 KB, application/x-gzip) - 2018-08-20 14:38 UTC, Douglas Duckworth
correct ovirt logs (523.02 KB, application/x-gzip) - 2018-08-20 15:01 UTC, Douglas Duckworth

Description Douglas Duckworth 2018-08-16 14:57:41 UTC
Created attachment 1476439 [details]
hosted engine log

Description of problem:

hosted-engine --deploy (ovirt-hosted-engine-setup 2.2.25) consistently fails at the Ansible task "Wait for the host to be up"; the host is added to the engine but ends up in "install_failed" status.

Version-Release number of selected component (if applicable):

CentOS 7.5.1804
3.10.0-862.9.1.el7.x86_64
ovirt-hosted-engine-setup-2.2.25-1.el7.noarch


How reproducible:

Always

Steps to Reproduce:
1. hosted-engine --deploy
2. hosted-engine --deploy --noansible

Actual results:

I cannot get past this task in "/usr/share/ovirt-hosted-engine-setup/ansible/bootstrap_local_vm.yml"

      - name: Add host
        ovirt_hosts:
          # TODO: add to the first cluster of the datacenter
          # where we set the vlan id
          name: "{{ HOST_NAME }}"
          state: present
          public_key: true
          address: "{{ HOST_ADDRESS }}"
          auth: "{{ ovirt_auth }}"
        async: 1
        poll: 0
      - name: Wait for the host to be up
        ovirt_hosts_facts:
          pattern: name={{ HOST_NAME }}
          auth: "{{ ovirt_auth }}"
        register: host_result_up_check
        until: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and (host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'up' or host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational')
        retries: 120
        delay: 5
      - debug: var=host_result_up_check
      - name: Check host status
        fail:
          msg: >
            The host has been set in non_operational status,
            please check engine logs,
            fix accordingly and re-deploy.
        when: host_result_up_check is succeeded and host_result_up_check.ansible_facts.ovirt_hosts|length >= 1 and host_result_up_check.ansible_facts.ovirt_hosts[0].status == 'non_operational'
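
Note that the "until" condition above only treats 'up' or 'non_operational' as terminal states; any other status (such as the "install_failed" shown in the output below) just keeps the loop retrying until all 120 attempts are spent. As a rough manual check of what that loop polls, something like the following should work from the host, assuming the admin@internal password chosen during deploy:

# Query the engine REST API for the host's status (-k because of the self-signed cert).
# Replace ENGINE_ADMIN_PASSWORD with the admin@internal password set during deploy.
curl -k -u 'admin@internal:ENGINE_ADMIN_PASSWORD' \
  'https://ovirt-engine.pbtech/ovirt-engine/api/hosts?search=name%3Dovirt-hv1.pbtech' \
  | grep -o '<status>[^<]*</status>'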

The error:

[ INFO  ] TASK [Wait for the host to be up]
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts": {"ovirt_hosts": [{"address": "ovirt-hv1.pbtech", "affinity_labels": [], "auto_numa_status": "unknown", "certificate": {"organization": "pbtech", "subject": "O=pbtech,CN=ovirt-hv1.
pbtech"}, "cluster": {"href": "/ovirt-engine/api/clusters/a4b6cd02-a0ef-11e8-a347-00163e54fb7f", "id": "a4b6cd02-a0ef-11e8-a347-00163e54fb7f"}, "comment": "", "cpu": {"speed": 0.0, "topology": {}}, "device_passthrough": {"enabled": false}
, "devices": [], "external_network_provider_configurations": [], "external_status": "ok", "hardware_information": {"supported_rng_sources": []}, "hooks": [], "href": "/ovirt-engine/api/hosts/609e7eba-8b85-4830-9a5f-99e561bb503a", "id": "6
09e7eba-8b85-4830-9a5f-99e561bb503a", "katello_errata": [], "kdump_status": "unknown", "ksm": {"enabled": false}, "max_scheduling_memory": 0, "memory": 0, "name": "ovirt-hv1.pbtech", "network_attachments": [], "nics": [], "numa_nodes": []
, "numa_supported": false, "os": {"custom_kernel_cmdline": ""}, "permissions": [], "port": 54321, "power_management": {"automatic_pm_enabled": true, "enabled": false, "kdump_detection": true, "pm_proxies": []}, "protocol": "stomp", "se_li
nux": {}, "spm": {"priority": 5, "status": "none"}, "ssh": {"fingerprint": "SHA256:X+3GNzNZ09Ct7xt6T3sEgVGecyG3QjG71h+D6RnYZU8", "port": 22}, "statistics": [], "status": "install_failed", "storage_connection_extensions": [], "summary": {"
total": 0}, "tags": [], "transparent_huge_pages": {"enabled": false}, "type": "rhel", "unmanaged_networks": [], "update_available": false}]}, "attempts": 120, "changed": false} 
[ INFO  ] TASK [Fetch logs from the engine VM]                

Though the VM's up:

[root@ovirt-hv1 tmp]# ping ovirt-engine.pbtech 
PING ovirt-engine.pbtech (192.168.122.69) 56(84) bytes of data.
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=1 ttl=64 time=0.186 ms
64 bytes from ovirt-engine.pbtech (192.168.122.69): icmp_seq=2 ttl=64 time=0.153 ms

[root@ovirt-hv1 tmp]# wget --no-check-certificate https://ovirt-engine.pbtech/ovirt-engine/api     
--2018-08-16 07:44:36--  https://ovirt-engine.pbtech/ovirt-engine/api 
Resolving ovirt-engine.pbtech (ovirt-engine.pbtech)... 192.168.122.69         
Connecting to ovirt-engine.pbtech (ovirt-engine.pbtech)|192.168.122.69|:443... connected.           
WARNING: cannot verify ovirt-engine.pbtech's certificate, issued by ‘/C=US/O=pbtech/CN=ovirt-engine.pbtech.84693’:   
  Self-signed certificate encountered.     
HTTP request sent, awaiting response... 401 Unauthorized
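
The 401 here is expected, since no credentials were passed; it only proves the web server answers. A quick liveness check that needs no credentials is the engine health servlet, roughly:

# Unauthenticated engine liveness probe; -k because of the self-signed certificate.
curl -k https://ovirt-engine.pbtech/ovirt-engine/services/health
# A healthy engine answers with something like "DB Up!Welcome to Health Status!"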

I am running oVirt 4.2.3-1 and have reinstalled several times.  Skipping the above Ansible task isn't a viable workaround.

Here are the networks on the host.  Note that em1 carries the ovirtmgmt bridge, whereas ib0 provides the NFS storage domain.

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: em1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master ovirtmgmt state UP group default qlen 1000
    link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
3: em2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 50:9a:4c:89:c6:be brd ff:ff:ff:ff:ff:ff
4: p1p1: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:96:91:13:ee:68 brd ff:ff:ff:ff:ff:ff
5: p1p2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether b4:96:91:13:ee:6a brd ff:ff:ff:ff:ff:ff
6: idrac: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN group default qlen 1000
    link/ether 50:9a:4c:89:c6:c0 brd ff:ff:ff:ff:ff:ff
    inet 169.254.0.2/16 brd 169.254.255.255 scope global idrac
       valid_lft forever preferred_lft forever
7: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband a0:00:02:08:fe:80:00:00:00:00:00:00:ec:0d:9a:03:00:1d:19:e1 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 172.16.0.204/24 brd 172.16.0.255 scope global ib0
       valid_lft forever preferred_lft forever
8: virbr0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
    inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
       valid_lft forever preferred_lft forever
9: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
    link/ether 52:54:00:78:d1:c5 brd ff:ff:ff:ff:ff:ff
41: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 50:9a:4c:89:c6:bd brd ff:ff:ff:ff:ff:ff
    inet 10.0.0.176/16 brd 10.0.255.255 scope global ovirtmgmt
       valid_lft forever preferred_lft forever
42: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 5e:ac:28:79:c9:0e brd ff:ff:ff:ff:ff:ff
43: ovs-system: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether 62:a8:d5:20:26:88 brd ff:ff:ff:ff:ff:ff
44: br-int: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
    link/ether ea:41:13:ce:b6:4e brd ff:ff:ff:ff:ff:ff
48: vnet0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master virbr0 state UNKNOWN group default qlen 1000
    link/ether fe:16:3e:54:fb:7f brd ff:ff:ff:ff:ff:ff

default via 10.0.0.52 dev ovirtmgmt 
10.0.0.0/16 dev ovirtmgmt proto kernel scope link src 10.0.0.176 
169.254.0.0/16 dev idrac proto kernel scope link src 169.254.0.2 
169.254.0.0/16 dev ib0 scope link metric 1007 
169.254.0.0/16 dev ovirtmgmt scope link metric 1041 
172.16.0.0/24 dev ib0 proto kernel scope link src 172.16.0.204 
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1


Expected results:

Hosted Engine VM works fine

Additional info:

https://lists.ovirt.org/archives/list/users@ovirt.org/thread/BNZFT42T6PKYHNDFLOZYHYWBEF34DV5Y/

Comment 1 Douglas Duckworth 2018-08-16 19:42:26 UTC
Created attachment 1476506 [details]
hosted engine log failure

Comment 2 Douglas Duckworth 2018-08-16 19:43:59 UTC
Created attachment 1476507 [details]
latest hosted engine log failure

Comment 3 Douglas Duckworth 2018-08-16 19:44:50 UTC
Created attachment 1476509 [details]
updated hosted engine log

Comment 4 Simone Tiraboschi 2018-08-20 10:24:59 UTC
2018-08-16 11:48:56,309-04 ERROR [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-1) [593fbf12] EVENT_ID: VDS_INSTALL_FAILED(505), Host ovirt-hv1.pbtech installation failed. Failed to execute Ansible host-deploy role. Please check logs for more details: /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20180816114840-ovirt-hv1.pbtech-593fbf12.log.

Can you please double check that?

host-deploy logs are available under /var/log/ovirt-engine/host-deploy/ on the engine VM and are copied back to the host under /var/log/ovirt-hosted-engine-setup/engine-{...}/host-deploy.
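
A quick way to pull the failing step out of that log, using the path from the event above (adjust the timestamp part to the actual run), could be:

# Show the failing task plus some context from the host-deploy Ansible log.
grep -iE -B2 -A10 'fatal|failed' \
  /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-ansible-20180816114840-ovirt-hv1.pbtech-593fbf12.log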

Comment 5 Douglas Duckworth 2018-08-20 14:38:30 UTC
Sorry, they're now attached!

Comment 6 Douglas Duckworth 2018-08-20 14:38:54 UTC
Created attachment 1477241 [details]
ovirt logs

Comment 7 Simone Tiraboschi 2018-08-20 14:47:56 UTC
Hi,
still not enough, sorry.
Please look for the host-deploy directory.

Comment 8 Douglas Duckworth 2018-08-20 15:00:41 UTC
Found them!

Comment 10 Douglas Duckworth 2018-08-20 15:01:06 UTC
Created attachment 1477254 [details]
correct ovirt logs

Comment 11 Simone Tiraboschi 2018-08-20 15:12:34 UTC
The issue happened here:

2018-08-16 11:48:54,735 p=26046 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Check if firewalld is installed] ***********
2018-08-16 11:48:55,513 p=26046 u=ovirt |  ok: [ovirt-hv1.pbtech] => {
    "changed": false, 
    "rc": 0, 
    "results": [
        "firewalld-0.4.4.4-14.el7.noarch providing firewalld is already installed"
    ]
}
2018-08-16 11:48:55,561 p=26046 u=ovirt |  TASK [ovirt-host-deploy-firewalld : Check if firewalld is runnning] ************
2018-08-16 11:48:56,228 p=26046 u=ovirt |  fatal: [ovirt-hv1.pbtech]: FAILED! => {
    "changed": false
}

MSG:

Unable to enable service firewalld: Failed to execute operation: Cannot send after transport endpoint shutdown

It seems that checking(?)/restarting firewalld on the host killed the SSH connection.
Was firewalld enabled on the host? Was it running? Does its configuration allow SSH?
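
For reference, these commands on the host should answer those three questions (assuming stock CentOS 7 service names):

# Is firewalld enabled and running, and does it allow ssh?
systemctl is-enabled firewalld
systemctl is-active firewalld
firewall-cmd --list-services    # only works while firewalld is running; "ssh" should be listed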

Comment 12 Douglas Duckworth 2018-08-20 15:17:10 UTC
Hi Simone

This is a CentOS 7 box configured to mask firewalld and enable iptables.  We do not like firewalld, so that's our standard way of deploying CentOS.

Does oVirt require the use of firewalld?

Comment 13 Simone Tiraboschi 2018-08-20 15:27:35 UTC
(In reply to Douglas Duckworth from comment #12)
> Does oVirt require the use of firewalld?

Yes, it is now required, at least when deploying HE from scratch.
That is exactly where the issue is.
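
Before re-running the deploy on a host configured this way, firewalld has to be unmasked and started and the custom iptables service stopped. A rough sequence, assuming the stock CentOS 7 unit names, would be:

# Switch the host back from iptables to firewalld before re-deploying.
systemctl stop iptables
systemctl disable iptables      # iptables-services unit, if installed
systemctl unmask firewalld
systemctl enable firewalld
systemctl start firewalld
firewall-cmd --state            # should print "running"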

*** This bug has been marked as a duplicate of bug 1608467 ***

Comment 14 Douglas Duckworth 2018-08-20 20:31:43 UTC
Thanks Simone

oVirt cluster now up!

