Bug 1816002 - No clear message if "Wait for the host to be up" fails
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-ansible-collection
Classification: oVirt
Component: hosted-engine-setup
Version: unspecified
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: low
Target Milestone: ovirt-4.4.2
Target Release: ---
Assignee: Asaf Rachmani
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-23 06:59 UTC by Yedidyah Bar David
Modified: 2020-09-18 07:12 UTC (History)

Fixed In Version: ovirt-ansible-hosted-engine-setup-1.1.7
Clone Of:
Environment:
Last Closed: 2020-09-18 07:12:35 UTC
oVirt Team: Integration
Embargoed:
sbonazzo: ovirt-4.4?
mtessun: planning_ack+
sbonazzo: devel_ack+
mavital: testing_ack+


Links
System: Github
ID: oVirt ovirt-ansible-hosted-engine-setup pull 332
Private: 0
Priority: None
Status: closed
Summary: Add a clear error message when host is not up
Last Updated: 2021-02-04 12:32:49 UTC

Description Yedidyah Bar David 2020-03-23 06:59:40 UTC
Description of problem:

One of the tasks in hosted-engine deploy is to add the host to the engine, and then wait until it shows as 'Up' in the engine.

If after several attempts it's not up, we fail the deploy.

If the host's state is "non_operational", we try to get error messages from the engine vm and present these.

Otherwise, we do not emit anything concrete.

We always log the result of the check. Before the fix for bug 1787267, it looked like this:

2019-12-31 14:37:23,095+0900 DEBUG var changed: host "localhost" var "host_result_up_check" type "<type 'dict'>" value: "{
    "ansible_facts": {
        "ovirt_hosts": []
    }, 
    "attempts": 120, 
    "changed": false, 
    "deprecations": [
        {
            "msg": "The 'ovirt_host_facts' module has been renamed to 'ovirt_host_info', and the renamed one no longer returns ansible_facts", 
            "version": "2.13"
        }
    ], 
    "failed": true
}"
2019-12-31 14:37:23,096+0900 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u'Wait for the host to be up', 'ansible_result': u'type: <type \'dict\'>\nstr: {u\'deprecations\': [{u\'msg\': u"The \'ovirt_host_facts\' module has been renamed to \'ovirt_host_info\', and the renamed one no longer returns ansible_facts", u\'version\': u\'2.13\'}], \'_ansible_no_log\': False, u\'changed\': False, \'attempts\': 120, u\'invocation\': {u\'module_args\': {u\'all_content\': False, u\'patt', 'task_duration': 671, 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml'}

However, the note about ovirt_host_facts is just a deprecation warning, not the reason for the failure. From the point of view of this part of the code, the reason is simply that the host did not come up. Why it did not come up can usually be diagnosed from logs inside the engine VM, which the script tries to fetch in the next task:

2019-12-31 14:37:23,729+0900 INFO ansible task start {'status': 'OK', 'ansible_task': u'ovirt.hosted_engine_setup : Fetch logs from the engine VM', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/trigger_role.yml', 'ansible_type': 'task'}

So we should probably emit something like:

ERROR: host is not up, please check logs, perhaps also on the engine machine
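
A minimal sketch of the behaviour being asked for, written in Python rather than as the role's actual Ansible tasks: poll the host status, stop early on "non_operational", and fail with an explicit message instead of a bare task result. The names wait_for_host_up, HostNotUpError and get_status are hypothetical, as is the 5-second delay; the default of 120 retries only mirrors the "attempts" value in the log above.

```python
import time
from typing import Callable


class HostNotUpError(Exception):
    """Raised when the host never reaches the 'up' state (hypothetical name)."""


def wait_for_host_up(
    get_status: Callable[[], str],
    retries: int = 120,          # mirrors "attempts": 120 in the log above
    delay: float = 5.0,          # assumption: not taken from the report
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll get_status() until it returns 'up'; return the attempts used.

    get_status is a stand-in for whatever queries the engine for the
    host state; it is not a real oVirt API call.
    """
    last = "unknown"
    attempt = 0
    for attempt in range(1, retries + 1):
        last = get_status()
        if last == "up":
            return attempt
        if last == "non_operational":
            # No point in waiting further; the caller can go and
            # collect error messages from the engine VM.
            break
        if attempt < retries:
            sleep(delay)
    # The key point of this bug: always say clearly what went wrong.
    raise HostNotUpError(
        f"Host is not up (last status: {last!r} after {attempt} attempts), "
        "please check logs, perhaps also on the engine machine"
    )
```

With this shape, the last line the user sees names the actual problem and the last known host status, so the deprecation warning cannot be mistaken for the cause.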

Comment 1 Gilboa Davara 2020-06-23 14:34:59 UTC
Adding my 2 cents worth (after spending half an hour debugging a broken installation due to me being careless, see below).

1. If the user's DNS server cannot resolve the host address and the user (read: me) was stupid enough to miss the "Add lines... to /etc/hosts on the engine VM?" question (or simply answered no), the error message (see the error message below) gives little indication of why the deployment failed (unless you are driven enough to connect to the semi-dead hosted engine and check what's broken).
2. The "add lines" question should default to "Yes" if the ansible script fails to resolve the host address. (You only get a short warning that the host can only be resolved locally.)
3. If the user (read: me) was stupid enough to select "No" in the previous question and the engine VM fails to resolve the host address, it should show a big red sign saying "You have a broken DNS setup. Are you really, really, really sure you want to continue trying to deploy? 'Cause if it breaks, and it will, you'll get to keep all the pieces..."
4. As above, the resulting error message is not very descriptive (to say the least).
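
To illustrate point 2, here is a rough Python sketch of choosing the default answer from a DNS lookup. It is not how the setup script does it: resolve and hosts_prompt_default are hypothetical names, and the resolver is passed in as a parameter only so the logic can be exercised without a real DNS server.

```python
import socket
from typing import Callable, List


def resolve(hostname: str) -> List[str]:
    """Return the addresses the system resolver gives for hostname, or []."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr);
    # sockaddr[0] is the address string.
    return sorted({info[4][0] for info in infos})


def hosts_prompt_default(
    hostname: str,
    resolver: Callable[[str], List[str]] = resolve,
) -> str:
    """Suggest a default for the 'add lines to /etc/hosts' question.

    Defaults to "Yes" when the host address cannot be resolved,
    which is the behaviour proposed in point 2 (hypothetical helper).
    """
    return "No" if resolver(hostname) else "Yes"
```

For example, hosts_prompt_default("host.example", resolver=lambda h: []) gives "Yes", so a user who just presses Enter ends up with a resolvable host.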

Error message:
[ ERROR ] fatal: [localhost]: FAILED! => {"ansible_facts":
{"ovirt_vms": [{"affinity_labels": [], "applications": [], "bios":
{"boot_menu": {"enabled": false}, "type": "cluster_default"},
"cdroms": [], "cluster": {"href":
"/ovirt-engine/api/clusters/1ac7525a-b3d1-11ea-9c7a-00163e57d088",
"id": "1ac7525a-b3d1-11ea-9c7a-00163e57d088"}, "comment": "", "cpu":
{"architecture": "x86_64", "topology": {"cores": 1, "sockets": 4,
"threads": 1}}, "cpu_profile": {"href":
"/ovirt-engine/api/cpuprofiles/58ca604e-01a7-003f-01de-000000000250",
"id": "58ca604e-01a7-003f-01de-000000000250"}, "cpu_shares": 0,
"creation_time": "2020-06-21 11:15:08.207000-04:00",
"delete_protected": false, "description": "", "disk_attachments": [],
"display": {"address": "127.0.0.1", "allow_override": false,
"certificate": {"content": "-----BEGIN
CERTIFICATE-----\nMIID3jCCAsagAwIBAgICEAAwDQYJKoZIhvcNAQELBQAwUTELMAkGA1UEBhMCVVMxFDASBgNVBAoM\nC2xvY2FsZG9tYWluMSwwKgYDVQQDDCNnaWxib2Etd3gtdm1vdmlydC5sb2NhbGRvbWFpbi40MTE5\nMTAeFw0yMDA2MjAxNTA3MTFaFw0zMDA2MTkxNTA3MTFaMFExCzAJBgNVBAYTAlVTMRQwEgYDVQQK\nDAtsb2NhbGRvbWFpbjEsMCoGA1UEAwwjZ2lsYm9hLXd4LXZtb3ZpcnQubG9jYWxkb21haW4uNDEx\nOTEwggEiMA0GCSqGSIb3DQEBAQUAA4IBDwAwggEKAoIBAQCUNgcCn28BMlMcadFZPR9JAWjOWyh0\nWMQffOSKUlr7H+6K02IdjCR5K9bR9moAlMA4dNzF/NJa12BlCmDkwOSsgZl+NK/Ut3kqfPp4CqMl\nU3jkJzqRnh0rqOFnQ4Q1tsejziH1MSiH5/eb4A3g2s0awXF6K+JRMp2MB9wYQx//tZrvhTLprK+Y\n9jXdQFZby8j+/9pqIdN7uoYbuqESRNcfIJ0WigJ10/IOAwloT0MASwyVtCRTCCXNE4PRN+Lexlcc\nxXq2QZ0zG8u3leLT6/J87PCP/OEj976fZ19q83stWjygu4+UiWS+QStlrzc1U+aGVxa+sO+9mv3f\n6CwT0clvAgMBAAGjgb8wgbwwHQYDVR0OBBYEFOiEmL8+rz3I4j5rmL+ws47Jv5KiMHoGA1UdIwRz\nMHGAFOiEmL8+rz3I4j5rmL+ws47Jv5KioVWkUzBRMQswCQYDVQQGEwJVUzEUMBIGA1UECgwLbG9j\nYWxkb21haW4xLDAqBgNVBAMMI2dpbGJvYS13eC12bW92aXJ0LmxvY2FsZG9tYWluLjQxMTkxggIQ\nADAPBgNVHRMBAf8EBTADAQH/MA4GA1UdDwEB/wQEAwIBBjANBgkqhkiG9w0BAQsFAAOCAQEAStVI\nhHRrw5aa3YUNcwYh+kQfS47Es12nNRFeVVzbXj9CLS/TloYjyXEyZvFmYyyjNvuj4/3WcQDfeaG6\nTUGoFJ1sleOMT04WYWNJGyvsOfokT+I7yrBsVMg/7vip8UQV0ttmVoY/kMhZufwAUNlsZyh6F2o2\nNpAAcdLoguHo3UCGyaL8pF4G0NOAR/eV1rpl4VikqehUsXZ1sYzYZfK98xXrmepI42Lt3B2L6f9t\ngzYJ99jsrOGFhgvgV0H+PclviIdz79Jj3ZpPhezHkNQyrp0GOM0rqW+9xy50tlCQJ4rjdrRxnr21\nGpD3ZaQ2KSwGU79pnnRT6m7MSQ8irci3/A==\n-----END
CERTIFICATE-----\n", "organization": "localdomain", "subject":
"O=localdomain,CN=gilboa-wx-ovirt.localdomain"}, "copy_paste_enabled":
true, "disconnect_action": "LOCK_SCREEN", "file_transfer_enabled":
true, "monitors": 1, "port": 5900, "single_qxl_pci": false,
"smartcard_enabled": false, "type": "vnc"}, "fqdn":
"gilboa-wx-vmovirt.localdomain", "graphics_consoles": [],
"guest_operating_system": {"architecture": "x86_64", "codename": "",
"distribution": "CentOS Linux", "family": "Linux", "kernel":
{"version": {"build": 0, "full_version":
"4.18.0-147.8.1.el8_1.x86_64", "major": 4, "minor": 18, "revision":
147}}, "version": {"full_version": "8", "major": 8}},
"guest_time_zone": {"name": "EDT", "utc_offset": "-04:00"},
"high_availability": {"enabled": false, "priority": 0}, "host":
{"href": "/ovirt-engine/api/hosts/5ca55132-6d20-4a7f-81a8-717095ba8f78",
"id": "5ca55132-6d20-4a7f-81a8-717095ba8f78"}, "host_devices": [],
"href": "/ovirt-engine/api/vms/60ba9f1a-cdb1-406e-810d-187dbdd7775c",
"id": "60ba9f1a-cdb1-406e-810d-187dbdd7775c", "io": {"threads": 1},
"katello_errata": [], "large_icon": {"href":
"/ovirt-engine/api/icons/a753f77a-89a4-4b57-9c23-d23bd61ebdaf", "id":
"a753f77a-89a4-4b57-9c23-d23bd61ebdaf"}, "memory": 8589934592,
"memory_policy": {"guaranteed": 8589934592, "max": 8589934592},
"migration": {"auto_converge": "inherit", "compressed": "inherit",
"encrypted": "inherit"}, "migration_downtime": -1,
"multi_queues_enabled": true, "name": "external-HostedEngineLocal",
"next_run_configuration_exists": false, "nics": [], "numa_nodes": [],
"numa_tune_mode": "interleave", "origin": "external",
"original_template": {"href":
"/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000",
"id": "00000000-0000-0000-0000-000000000000"}, "os": {"boot":
{"devices": ["hd"]}, "type": "other"}, "permissions": [],
"placement_policy": {"affinity": "migratable"}, "quota": {"id":
"27d40902-b3d1-11ea-80f7-00163e57d088"}, "reported_devices": [],
"run_once": false, "sessions": [], "small_icon": {"href":
"/ovirt-engine/api/icons/0676b521-5b2b-4474-9394-8e9e8e3b426f", "id":
"0676b521-5b2b-4474-9394-8e9e8e3b426f"}, "snapshots": [], "sso":
{"methods": [{"id": "guest_agent"}]}, "start_paused": false,
"stateless": false, "statistics": [], "status": "unknown",
"storage_error_resume_behaviour": "auto_resume", "tags": [],
"template": {"href":
"/ovirt-engine/api/templates/00000000-0000-0000-0000-000000000000",
"id": "00000000-0000-0000-0000-000000000000"}, "time_zone": {"name":
"Etc/GMT"}, "type": "server", "usb": {"enabled": false}, "watchdogs":
[]}]}, "attempts": 24, "changed": false, "deprecations": [{"msg": "The
'ovirt_vm_facts' module has been renamed to 'ovirt_vm_info', and the
renamed one no longer returns ansible_facts", "version": "2.13"}]}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing
ansible-playbook

Comment 2 Nikolai Sednev 2020-08-13 06:07:49 UTC
HE deployment now fails with the message "Host is not up, please check logs, perhaps also on the engine machine":
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Host is not up, please check logs, perhaps also on the engine machine"}

ovirt-ansible-hosted-engine-setup-1.1.7-1.el8ev.noarch
rhvm-appliance-4.4-20200722.0.el8ev.x86_64
ovirt-hosted-engine-ha-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.6-1.el8ev.noarch
Linux 4.18.0-193.14.3.el8_2.x86_64 #1 SMP Mon Jul 20 15:02:29 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Comment 3 Sandro Bonazzola 2020-09-18 07:12:35 UTC
This bugzilla is included in oVirt 4.4.2 release, published on September 17th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.2 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

