Red Hat Bugzilla – Bug 1590266
[RFE] Deployment should fail if the engine is not reachable by its FQDN
Last modified: 2018-09-04 02:24:25 EDT
Description of problem:
When deploying hosted-engine with a DHCP configuration, if the engine VM receives an IP address that differs from the one its FQDN resolves to (due to a user misconfiguration), the deployment fails only at the end (because the engine is not reachable by the host), while it should fail much sooner, as soon as the problem can be detected.

How reproducible: 100%

Steps to Reproduce:
1. Make a DHCP reservation (by MAC address) for the engine VM.
2. Deploy hosted-engine with default answers, keeping the engine VM's MAC address different from the one in the DHCP reservation.

Actual results: Deployment fails at the end.
Expected results: Deployment should fail in the middle.
Additional info:
(In reply to Ido Rosenzwig from comment #0)
> while it should fail much sooner when it can be detected.

Not sure we can fail much earlier: by the time the engine VM reboots from the shared storage with its final DHCP address, we are almost at the end. But at least we can fail with a clear error message. Right now it simply complains that the engine doesn't come up, while we could check and explain what happened.
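To make the proposed check concrete, here is a minimal Python sketch of the comparison: resolve the engine FQDN and compare the result with the address the engine VM actually received. This is only an illustration, not the setup tool's code; the function name `check_engine_address` and the injectable `resolve` parameter are hypothetical, and how the VM address is obtained is left out.

```python
import socket


def check_engine_address(vm_ip, fqdn, resolve=socket.gethostbyname):
    """Return None when vm_ip matches what fqdn resolves to, else an error string.

    `resolve` is injectable so the check can be exercised without real DNS.
    """
    try:
        resolved_ip = resolve(fqdn)
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        return "Engine FQDN {} does not resolve: {}".format(fqdn, exc)

    if not vm_ip:
        # Report a missing address explicitly instead of printing an empty value.
        return (
            "Engine VM has no IP address while the engine's FQDN {} "
            "resolves to {}".format(fqdn, resolved_ip)
        )

    if vm_ip != resolved_ip:
        return (
            "Engine VM IP address is {} while the engine's FQDN {} resolves "
            "to {}. If you are using DHCP, check your DHCP reservation "
            "configuration".format(vm_ip, fqdn, resolved_ip)
        )
    return None
```

A caller would fail the deployment whenever the function returns a message, and print that message instead of a generic "engine did not come up" error.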
Further to Ido's request and my latest findings, I'm returning the bug back to assigned.
Target release should be set once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Created attachment 1457275 [details] log from host alma04 after failed deployment
2018-07-08 15:10:14,080+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
2018-07-08 15:10:15,282+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 PLAY RECAP [localhost] : ok: 81 changed: 31 unreachable: 0 skipped: 5 failed: 2
Re-targeting to 4.2.6, since the next build is blockers only and this is not considered a blocker for 4.2.5.
In which component version has the fix been merged?
[ INFO ] TASK [Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.380209", "end": "2018-08-09 17:09:14.401218", "rc": 0, "start": "2018-08-09 17:09:14.021009", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug 9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug 9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug 9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug 9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}"]}
[ INFO ] TASK [Check VM status at virt level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Fail if engine VM is not running]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Get target engine VM IPv4 address]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Get VDSM's target engine VM stats]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Convert stats to JSON format]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Get target engine VM IPv4 address from VDSM stats]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fail if Engine IP is different from engine's FQDN resolved IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Copy engine logs]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180809171046.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180809162238-hz2chf.log
You have new mail in /var/spool/mail/root
[root@alma03 ~]#

Tested on:
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
rhvm-appliance-4.2-20180808.0.el7.noarch
Linux 3.10.0-862.11.4.el7.x86_64 #1 SMP Tue Aug 7 07:30:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

1. The expected result from the initial description is not met, i.e. "Expected results: Deployment should fail at the middle."
2. Too many unclear error messages: the "Engine VM IP address is" message does not show the engine VM's IP address.

Returning this bug back to assigned.
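For anyone triaging a similar run, the health information is buried in the JSON printed by `hosted-engine --vm-status --json` in the console output above. Below is a rough Python sketch that pulls out the per-host engine status. It assumes the layout seen in this log (host entries keyed by host id, each with an `engine-status` object, plus top-level flags such as `global_maintenance`); the function name `summarize_vm_status` is hypothetical and the sample is a trimmed copy of the logged output.

```python
import json


def summarize_vm_status(raw_json):
    """Return one (host_id, hostname, vm, health, reason) tuple per host entry."""
    status = json.loads(raw_json)
    rows = []
    for key, entry in status.items():
        if not isinstance(entry, dict):
            continue  # skip top-level flags such as "global_maintenance"
        engine = entry.get("engine-status", {})
        if not isinstance(engine, dict):
            engine = {}  # assumption: tolerate layouts where this is not an object
        rows.append(
            (
                key,
                entry.get("hostname"),
                engine.get("vm"),
                engine.get("health"),
                engine.get("reason"),
            )
        )
    return rows


# Trimmed copy of the output logged in this comment.
sample = (
    '{"1": {"hostname": "alma03.qa.lab.tlv.redhat.com", '
    '"engine-status": {"reason": "failed liveliness check", "health": "bad", '
    '"vm": "up", "detail": "Up"}, "score": 3400}, "global_maintenance": false}'
)

for row in summarize_vm_status(sample):
    print(row)
```

In this run the VM is "up" at the virt level while the health is "bad" with reason "failed liveliness check", which is consistent with an engine that is running but not reachable at the address its FQDN resolves to.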
Hi Nikolai,
1. We agreed that the deployment will fail at the end, despite what is written in the description.
2. Please attach the logs of this test, or of a similar one in which the engine gets no IP address.
Created attachment 1475574 [details] sosreport from alma03
The sosreport contains just the final_clean.log from /var/log/ovirt-hosted-engine-setup, which doesn't help to understand the problem.

Please provide an attachment that contains all the logs, like this:

engine-logs-2018-08-13T13:07:52Z
ovirt-hosted-engine-setup-20180813082621-hs2cji.log
ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log
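One way to gather everything for an attachment is to pack the whole setup log directory into a single archive. The Python sketch below is only a suggestion: the function name `bundle_setup_logs` and the output file name are hypothetical, and the default directory is the /var/log/ovirt-hosted-engine-setup path mentioned above.

```python
import glob
import os
import tarfile


def bundle_setup_logs(log_dir="/var/log/ovirt-hosted-engine-setup",
                      out_path="he-setup-logs.tar.gz"):
    """Pack every entry under log_dir into a gzip tarball; return the names added."""
    entries = sorted(glob.glob(os.path.join(log_dir, "*")))
    with tarfile.open(out_path, "w:gz") as archive:
        for path in entries:
            # Directories (e.g. an engine-logs-* folder) are added recursively.
            archive.add(path, arcname=os.path.basename(path))
    return [os.path.basename(path) for path in entries]
```

Running `bundle_setup_logs()` as root on the host after a failed deployment should produce one file containing the main setup log and all the per-playbook ansible logs.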
(In reply to Ido Rosenzwig from comment #16)
> The sosreport contains just the final_clean.log on
> /var/log/ovirt-hosted-engine-setup which doesn't help to understand the
> problem.
>
> please provide an attachment that contains all the logs like this:
>
> engine-logs-2018-08-13T13:07:52Z
> ovirt-hosted-engine-setup-20180813082621-hs2cji.log
> ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
> ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
> ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
> ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
> ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
> ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log

Engine logs can't be attached, since the engine is killed during deployment.
Created attachment 1475584 [details] logs from alma03
In the logs I see the error written as expected:

2018-08-13 15:35:27,308+0300 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u"Fail if Engine IP is different from engine's FQDN resolved IP", 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"Engine VM IP address is 10.35.92.55 while the engine\'s FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration", \'changed\': False, \'_ansible_no_log\': False}', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml'}

The message:
Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration

Moving back to ON_QA.
Following a conversation with Ido, I'm moving the bug to verified.

"[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook"

The IP now appears in the error message as expected.