Bug 1590266 - [RFE] Deployment should fail if the engine is not reachable by its FQDN
Status: CLOSED CURRENTRELEASE
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: General
Version: 2.2.22
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.6
Target Release: ---
Assigned To: Ido Rosenzwig
QA Contact: Nikolai Sednev
Keywords: FutureFeature, Triaged
Depends On:
Blocks:
Reported: 2018-06-12 06:32 EDT by Ido Rosenzwig
Modified: 2018-09-04 02:24 EDT

See Also:
Fixed In Version: ovirt-hosted-engine-setup-2.2.25
Doc Type: Enhancement
Doc Text:
Previously, self-hosted engine deployment failed with an unclear message if the Manager virtual machine could not be reached by its IP address. Now, a clear message is provided when deployment fails for this reason.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-03 11:09:24 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
rule-engine: ovirt‑4.2+
ylavi: planning_ack+
rule-engine: devel_ack+
mavital: testing_ack+


Attachments
log from host alma04 after failed deployment (492.84 KB, text/plain)
2018-07-08 09:51 EDT, Nikolai Sednev
sosreport from alma03 (9.61 MB, application/x-xz)
2018-08-13 08:46 EDT, Nikolai Sednev
logs from alma03 (165.85 KB, application/x-gzip)
2018-08-13 09:55 EDT, Nikolai Sednev


External Trackers
Tracker | ID | Priority | Status | Summary | Last Updated
oVirt gerrit | 92172 | master | MERGED | Ansible: Add checks when engine vm is not reachable by the host | 2018-09-03 05:32 EDT
oVirt gerrit | 92270 | ovirt-hosted-engine-setup-2.2 | MERGED | Ansible: Add checks when engine vm is not reachable by the host | 2018-06-14 10:51 EDT
oVirt gerrit | 92894 | master | MERGED | Ansible: Fail if the engine VM doesn't have IP address | 2018-09-03 05:32 EDT
oVirt gerrit | 92895 | ovirt-hosted-engine-setup-2.2 | MERGED | Ansible: Fail if the engine VM doesn't have IP address | 2018-07-23 04:19 EDT

Description Ido Rosenzwig 2018-06-12 06:32:26 EDT
Description of problem:

When deploying hosted-engine with a DHCP configuration, if the engine VM receives an IP address that does not match the address its FQDN resolves to (due to a user misconfiguration), the deployment fails only at the end, because the engine is not reachable by the host.
It should fail much sooner, as soon as the problem can be detected.


How reproducible:
100%

Steps to Reproduce:
1. Make a DHCP reservation (by MAC address) for the engine VM.
2. Deploy hosted-engine with the default answers, keeping the engine VM's MAC address different from the one in the DHCP reservation.

Actual results:
Deployment fails at the end.

Expected results:
Deployment should fail in the middle.

Additional info:
Comment 1 Simone Tiraboschi 2018-06-12 06:44:54 EDT
(In reply to Ido Rosenzwig from comment #0)
> while it should fail much sooner when it can be detected.

I'm not sure we can fail much earlier: by the time the engine VM reboots from the shared storage with its final DHCP address, we are almost at the end. But at least we can fail with a clear error message.
Right now it simply complains that the engine doesn't come up, while we could check and explain what happened.
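
To illustrate the kind of check discussed here, the following is a minimal Python sketch, not the actual Ansible task from the fix. The function names `resolve_ipv4` and `check_engine_reachable_by_fqdn` are hypothetical; the error text mirrors the message reported later in this bug. The resolver is injectable so the logic can be exercised without DNS access.

```python
import socket


def resolve_ipv4(fqdn):
    """Return the sorted IPv4 addresses that fqdn resolves to (empty list if none)."""
    try:
        infos = socket.getaddrinfo(fqdn, None, socket.AF_INET)
    except socket.gaierror:
        return []
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return sorted({info[4][0] for info in infos})


def check_engine_reachable_by_fqdn(vm_ip, fqdn, resolver=resolve_ipv4):
    """Return None if vm_ip is among the addresses fqdn resolves to, else an error string.

    Hypothetical helper: it only compares addresses, it does not probe the network.
    """
    resolved = resolver(fqdn)
    if vm_ip in resolved:
        return None
    return (
        "Engine VM IP address is {ip} while the engine's FQDN {fqdn} "
        "resolves to {resolved}. If you are using DHCP, check your DHCP "
        "reservation configuration"
    ).format(ip=vm_ip, fqdn=fqdn, resolved=", ".join(resolved) or "nothing")
```

A caller would pass the address reported for the engine VM and the FQDN given at deployment time, and abort with the returned message when it is not None.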
Comment 2 Nikolai Sednev 2018-07-08 09:18:50 EDT
Following Ido's request and my latest findings, I am returning the bug to ASSIGNED.
Comment 3 Red Hat Bugzilla Rules Engine 2018-07-08 09:18:55 EDT
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.
Comment 4 Nikolai Sednev 2018-07-08 09:51 EDT
Created attachment 1457275 [details]
log from host alma04 after failed deployment
Comment 5 Nikolai Sednev 2018-07-08 09:51:59 EDT
2018-07-08 15:10:14,080+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is  while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
2018-07-08 15:10:15,282+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 PLAY RECAP [localhost] : ok: 81 changed: 31 unreachable: 0 skipped: 5 failed: 2
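
The message above shows an empty engine VM IP address. A minimal Python sketch of a separate guard for that case follows; it is an illustration only, not the merged Ansible change, and `validate_engine_vm_ip` and its error wording are hypothetical.

```python
import ipaddress


def validate_engine_vm_ip(raw_ip):
    """Return a normalized IPv4 string, or raise ValueError with a clear message.

    Hypothetical helper: it treats an empty or whitespace-only value as
    "the VM has no address yet" and reports it separately from a mismatch.
    """
    candidate = (raw_ip or "").strip()
    if not candidate:
        raise ValueError(
            "The engine VM did not report an IP address. "
            "If you are using DHCP, check your DHCP reservation configuration"
        )
    try:
        return str(ipaddress.IPv4Address(candidate))
    except ipaddress.AddressValueError:
        raise ValueError("The engine VM reported an invalid IPv4 address: %r" % candidate)
```

Running this guard before the FQDN comparison would let the deployment report a missing address explicitly instead of printing a blank value.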
Comment 6 Sandro Bonazzola 2018-07-16 09:57:23 EDT
Re-targeting to 4.2.6, since the next build accepts blockers only and this is not considered a blocker for 4.2.5.
Comment 7 Nikolai Sednev 2018-08-09 07:03:17 EDT
In which component version was the fix merged?
Comment 8 Nikolai Sednev 2018-08-09 10:14:16 EDT
[ INFO  ] TASK [Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.380209", "end": "2018-08-09 17:09:14.401218", "rc": 0, "start": "2018-08-09 17:09:14.021009", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug  9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug  9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug  9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug  9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}"]}
[ INFO  ] TASK [Check VM status at virt level]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Fail if engine VM is not running]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Get target engine VM IPv4 address]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Get VDSM's target engine VM stats]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Convert stats to JSON format]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get target engine VM IPv4 address from VDSM stats]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fail if Engine IP is different from engine's FQDN resolved IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is  while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180809171046.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180809162238-hz2chf.log
You have new mail in /var/spool/mail/root
[root@alma03 ~]# 

Tested on:
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
rhvm-appliance-4.2-20180808.0.el7.noarch
Linux 3.10.0-862.11.4.el7.x86_64 #1 SMP Tue Aug 7 07:30:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

1. The expected result from the initial description is not met: the deployment should fail midway, not at the end.
2. There are too many unclear error messages.

Returning this bug to ASSIGNED.
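
For readers trying to interpret the long "Check engine VM health" failure above, here is a small Python sketch that summarizes the output of `hosted-engine --vm-status --json`. It assumes the JSON layout seen in this log (per-host entries keyed by host id, plus a top-level `global_maintenance` flag); `summarize_vm_status` is a hypothetical helper and `SAMPLE` is trimmed from the output pasted in this comment.

```python
import json

# Trimmed from the stdout shown above; only the fields used below are kept.
SAMPLE = (
    '{"1": {"hostname": "alma03.qa.lab.tlv.redhat.com", "host-id": 1, '
    '"engine-status": {"reason": "failed liveliness check", "health": "bad", '
    '"vm": "up", "detail": "Up"}, "score": 3400}, "global_maintenance": false}'
)


def summarize_vm_status(stdout):
    """Map each hostname to the engine VM state, health and failure reason."""
    status = json.loads(stdout)
    summary = {}
    for host_id, host in status.items():
        if not isinstance(host, dict):
            continue  # skips non-host keys such as "global_maintenance"
        engine = host.get("engine-status", {})
        summary[host.get("hostname", host_id)] = {
            "vm": engine.get("vm"),
            "health": engine.get("health"),
            "reason": engine.get("reason"),
        }
    return summary
```

In the sample, the VM is up but its health is bad, which matches a running engine VM that the host cannot reach.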
Comment 9 Ido Rosenzwig 2018-08-13 03:23:21 EDT
Hi Nikolai,

1. We agreed that the deployment will fail at the end, despite what is written in the description.
2. Please attach the logs of this test, or of a similar one in which the engine gets no IP address.
Comment 14 Nikolai Sednev 2018-08-13 08:46 EDT
Created attachment 1475574 [details]
sosreport from alma03
Comment 16 Ido Rosenzwig 2018-08-13 09:29:20 EDT
The sosreport contains only the final_clean.log from /var/log/ovirt-hosted-engine-setup, which is not enough to understand the problem.

Please provide an attachment that contains all the logs, like this:

engine-logs-2018-08-13T13:07:52Z
ovirt-hosted-engine-setup-20180813082621-hs2cji.log
ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log
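
As a rough way to check an attachment before uploading it, the following Python sketch reports which per-stage Ansible logs are missing from a list of file names. The stage names and the file name pattern are inferred from the listing above and are assumptions; `missing_stage_logs` is a hypothetical helper.

```python
import re

# Pattern inferred from the file names above:
# ovirt-hosted-engine-setup-ansible-<stage>-<14-digit timestamp>-<suffix>.log
STAGE_LOG = re.compile(
    r"^ovirt-hosted-engine-setup-ansible-(?P<stage>[a-z_]+)-\d{14}-[0-9a-z]+\.log$"
)

# Stages taken from the listing above, in alphabetical order.
EXPECTED_STAGES = (
    "bootstrap_local_vm",
    "create_storage_domain",
    "create_target_vm",
    "final_clean",
    "get_network_interfaces",
    "initial_clean",
)


def missing_stage_logs(filenames):
    """Return the expected stages that have no matching log in filenames."""
    found = set()
    for name in filenames:
        match = STAGE_LOG.match(name)
        if match:
            found.add(match.group("stage"))
    return [stage for stage in EXPECTED_STAGES if stage not in found]
```

On a host, the input could come from `os.listdir("/var/log/ovirt-hosted-engine-setup")`; a sosreport containing only the final_clean log would report the other five stages as missing.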
Comment 17 Nikolai Sednev 2018-08-13 09:52:30 EDT
(In reply to Ido Rosenzwig from comment #16)
> The sosreport contains just the final_clean.log on
> /var/log/ovirt-hosted-engine-setup which doesn't help to understand the
> problem.
> 
> please provide an attachment that contains all the logs like this:
> 
> engine-logs-2018-08-13T13:07:52Z
> ovirt-hosted-engine-setup-20180813082621-hs2cji.log
> ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
> ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
> ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
> ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
> ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
> ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log

Engine logs can't be attached because the engine is killed during deployment.
Comment 18 Nikolai Sednev 2018-08-13 09:55 EDT
Created attachment 1475584 [details]
logs from alma03
Comment 19 Ido Rosenzwig 2018-08-14 08:15:31 EDT
In the logs I see the error written as expected:

2018-08-13 15:35:27,308+0300 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u"Fail if Engine IP is different from engine's FQDN resolved IP", 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"Engine VM IP address is 10.35.92.55 while the engine\'s FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration", \'changed\': False, \'_ansible_no_log\': False}', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml'}

The message:
Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration
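
When scanning setup logs for this failure, the two addresses can be pulled out of the message with a regular expression. This is a minimal Python sketch assuming the exact wording quoted above; `parse_ip_mismatch` is a hypothetical helper, and it expects the plain message text (without the backslash-escaped apostrophe seen in the raw log line).

```python
import re

# The VM address may be empty, as in the earlier failed run, hence \S* for vm_ip.
MISMATCH = re.compile(
    r"Engine VM IP address is (?P<vm_ip>\S*) while the engine's FQDN "
    r"(?P<fqdn>\S+) resolves to (?P<resolved_ip>\S+?)\. "
)


def parse_ip_mismatch(message):
    """Return a dict with vm_ip, fqdn and resolved_ip, or None if not found."""
    match = MISMATCH.search(message)
    return match.groupdict() if match else None
```

An empty `vm_ip` in the result indicates that the engine VM reported no address at all, rather than a wrong one.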

moving back to ON_QA
Comment 20 Nikolai Sednev 2018-08-14 10:34:55 EDT
Following a conversation with Ido, I'm moving the bug to VERIFIED.
"[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
"
The IP now appears as expected.
