Created attachment 1475889 [details]
logs from alma03

Description of problem:
[RFE] Errors should be clear and informative when deployment fails because the engine is not reachable by its FQDN.

When deploying Hosted-Engine with a DHCP configuration, if the engine VM receives an IP address that doesn't resolve to its FQDN (due to incorrect user configuration), the deployment fails with an unreadable, uninformative error such as:

[ INFO ] TASK [Start ovirt-ha-agent service on the host]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.388280", "end": "2018-08-13 15:34:51.947498", "rc": 0, "start": "2018-08-13 15:34:51.559218", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"5eb81e6e\", \"local_conf_timestamp\": 3281, \"host-ts\": 3281}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"5eb81e6e\", \"local_conf_timestamp\": 3281, \"host-ts\": 3281}, \"global_maintenance\": false}"]}
[ INFO ] TASK [Check VM status at virt level]

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
rhvm-appliance-4.2-20180813.0.el7.noarch
Linux 3.10.0-862.11.6.el7.x86_64 #1 SMP Fri Aug 10 16:55:11 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

How reproducible:
100%

Steps to Reproduce:
1. Deploy SHE with a wrong MAC address, so that the engine's IP address won't resolve to its FQDN.

Actual results:
An unreadable, meaningless error is printed.

Expected results:
Errors should appear in an informative and readable form.

Additional info:
Logs from the deployment attached.
The real issue here is the error reported by the engine health check. Please make that error clearer, so that customers can understand its meaning.
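For illustration, the raw `hosted-engine --vm-status --json` output shown above already carries the interesting fields (`engine-status.health` and `engine-status.reason`); a minimal sketch of how that JSON could be condensed into one readable line per host (the `summarize` helper is hypothetical, not part of the actual tool):

```python
import json

# Sample stdout of `hosted-engine --vm-status --json`, trimmed from the log above.
raw = '''{"1": {"hostname": "alma03.qa.lab.tlv.redhat.com",
                "engine-status": {"reason": "failed liveliness check",
                                  "health": "bad", "vm": "up", "detail": "Up"},
                "score": 3400},
          "global_maintenance": false}'''

def summarize(status_json):
    """Turn the raw --vm-status JSON into one readable line per host."""
    data = json.loads(status_json)
    lines = []
    for host_id, host in data.items():
        if host_id == "global_maintenance":  # top-level flag, not a host entry
            continue
        es = host["engine-status"]
        lines.append("Host %s (id %s): VM is %s, engine health is %s (%s)"
                     % (host["hostname"], host_id, es["vm"], es["health"],
                        es.get("reason", "no reason reported")))
    return lines

for line in summarize(raw):
    print(line)
```

For the log excerpt above this prints a single line reporting that the VM is up but engine health is bad because of a failed liveliness check, which is far easier to act on than the raw dump.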
missed 4.4 feature freeze, moving to 4.5
Actually, we already added an informative error message for this issue. From the "ovirt-hosted-engine-setup" log file:

2018-08-13 15:35:25,849+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}

The "Check engine VM health" task tells us whether the engine is up; if it fails during HE deployment, we check the reason and show an informative error message right after it:

2018-08-13 15:23:15,661+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Check engine VM health]
2018-08-13 15:34:52,135+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'_ansible_parsed': True, u'stderr_lines': [], u'cmd': [u'hosted-engine', u'--vm-status', u'--json'], u'end': u'2018-08-13 15:34:51.947498', u'_ansible_no_log': False, u'stdout': u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n", "hostname": "alma03.qa.lab.tlv.redhat.com", "host-id": 1, "engine-status": {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "5eb81e6e", "local_conf_timestamp": 3281, "host-ts": 3281}, "global_maintenance": false}', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u'hosted-engine --vm-status --json', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, u'start': u'2018-08-13 15:34:51.559218', u'attempts': 120, u'stderr': u'', u'rc': 0, u'delta': u'0:00:00.388280', u'stdout_lines': [u'{"1": {"conf_on_shared_storage": true, "live-data": true, "extra": "metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n", "hostname": "alma03.qa.lab.tlv.redhat.com", "host-id": 1, "engine-status": {"reason": "failed liveliness check", "health": "bad", "vm": "up", "detail": "Up"}, "score": 3400, "stopped": false, "maintenance": false, "crc32": "5eb81e6e", "local_conf_timestamp": 3281, "host-ts": 3281}, "global_maintenance": false}']}
2018-08-13 15:34:52,236+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.388280", "end": "2018-08-13 15:34:51.947498", "rc": 0, "start": "2018-08-13 15:34:51.559218", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"5eb81e6e\", \"local_conf_timestamp\": 3281, \"host-ts\": 3281}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=3281 (Mon Aug 13 15:34:40 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=3281 (Mon Aug 13 15:34:40 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"5eb81e6e\", \"local_conf_timestamp\": 3281, \"host-ts\": 3281}, \"global_maintenance\": false}"]}
2018-08-13 15:34:53,942+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Check VM status at virt level]
2018-08-13 15:34:56,146+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 changed: [localhost]
2018-08-13 15:34:58,050+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 TASK [debug]
2018-08-13 15:34:59,753+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 vm_status_virsh: {'stderr_lines': [], u'changed': True, u'end': u'2018-08-13 15:34:55.988876', u'stdout': u' 2 HostedEngine running', u'cmd': u'virsh -r list | grep HostedEngine | grep running', 'failed': False, u'delta': u'0:00:00.126384', u'stderr': u'', u'rc': 0, 'stdout_lines': [u' 2 HostedEngine running'], u'start': u'2018-08-13 15:34:55.862492'}
2018-08-13 15:35:01,456+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Fail if engine VM is not running]
2018-08-13 15:35:03,289+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 skipping: [localhost]
2018-08-13 15:35:05,093+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get target engine VM IPv4 address]
2018-08-13 15:35:07,200+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 changed: [localhost]
2018-08-13 15:35:09,013+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get VDSM's target engine VM stats]
2018-08-13 15:35:11,619+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 changed: [localhost]
2018-08-13 15:35:13,523+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Convert stats to JSON format]
2018-08-13 15:35:15,227+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-08-13 15:35:17,031+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Get target engine VM IPv4 address from VDSM stats]
2018-08-13 15:35:18,735+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 ok: [localhost]
2018-08-13 15:35:20,538+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 TASK [debug]
2018-08-13 15:35:22,241+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 engine_vm_ip_vdsm: 10.35.92.55
2018-08-13 15:35:24,044+0300 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:100 TASK [Fail if Engine IP is different from engine's FQDN resolved IP]
2018-08-13 15:35:25,748+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 {u'msg': u"Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration", u'changed': False, u'_ansible_no_log': False}
2018-08-13 15:35:25,849+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
2018-08-13 15:35:27,452+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 PLAY RECAP [localhost] : ok: 82 changed: 31 unreachable: 0 skipped: 5 failed: 2
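The check behind the "Fail if Engine IP is different from engine's FQDN resolved IP" task boils down to resolving the engine FQDN and comparing the result with the VM's actual IP. A minimal sketch of that logic (the `check_engine_fqdn` helper is illustrative, not the actual Ansible implementation):

```python
import socket

def check_engine_fqdn(fqdn, vm_ip):
    """Compare the engine VM's actual IP with what its FQDN resolves to.

    Returns None on success, or a human-readable error message shaped like
    the one ovirt-hosted-engine-setup logs above.
    """
    try:
        resolved = socket.gethostbyname(fqdn)  # IPv4 forward lookup
    except socket.gaierror:
        return "The engine FQDN %s does not resolve at all" % fqdn
    if resolved != vm_ip:
        return ("Engine VM IP address is %s while the engine's FQDN %s "
                "resolves to %s. If you are using DHCP, check your DHCP "
                "reservation configuration" % (vm_ip, fqdn, resolved))
    return None

# e.g. check_engine_fqdn("localhost", "127.0.0.1") returns None (addresses match),
# while a mismatching vm_ip yields the DHCP-reservation hint seen in the log.
```

Doing this comparison explicitly, instead of waiting for the liveliness check to time out after 120 attempts, is what lets the setup print the actionable message quoted above.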