
Bug 1590266

Summary: [RFE] Deployment should fail if the engine is not reachable by its FQDN
Product: [oVirt] ovirt-hosted-engine-setup
Reporter: Ido Rosenzwig <irosenzw>
Component: General
Assignee: Ido Rosenzwig <irosenzw>
Status: CLOSED CURRENTRELEASE
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.2.22
CC: bugs, irosenzw, mavital, nsednev, stirabos, trichard, ylavi
Target Milestone: ovirt-4.2.6
Keywords: FutureFeature, Triaged
Target Release: ---
Flags: rule-engine: ovirt-4.2+, ylavi: planning_ack+, rule-engine: devel_ack+, mavital: testing_ack+
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.25
Doc Type: Enhancement
Doc Text: Previously, self-hosted engine deployment failed with an unclear message if the Manager virtual machine could not be reached by its IP address. Now, a clear message is provided when deployment fails for this reason.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-03 15:09:24 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
log from host alma04 after failed deployment (flags: none)
sosreport from alma03 (flags: none)
logs from alma03 (flags: none)

Description Ido Rosenzwig 2018-06-12 10:32:26 UTC
Description of problem:

When deploying hosted-engine with a DHCP configuration, if the engine VM receives an IP address other than the one its FQDN resolves to (due to user misconfiguration), the deployment fails only at the end (because the engine is not reachable by the host),
while it should fail much sooner when it can be detected.


How reproducible:
100%

Steps to Reproduce:
1. Make a DHCP reservation (by MAC address) for the engine VM.
2. Deploy hosted-engine with default answers, keeping the engine VM's MAC address different from the one in the DHCP reservation.

Actual results:
Deployment fails at the end.

Expected results:
Deployment should fail in the middle.

Additional info:

Comment 1 Simone Tiraboschi 2018-06-12 10:44:54 UTC
(In reply to Ido Rosenzwig from comment #0)
> while it should fail much sooner when it can be detected.

Not sure if we can fail much earlier, since by the time the engine VM reboots from the shared storage with its final DHCP address we are almost at the end, but at least we can fail with a clear error message.
Right now it simply complains that the engine doesn't come up, while we could check and explain what happened.
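
The kind of check suggested here can be sketched in Python as follows. This is only an illustration, assuming the engine VM's address is already known to the caller; `check_engine_ip` and its `resolver` parameter are hypothetical names, not part of ovirt-hosted-engine-setup (the logs later in this bug show the real check running as an Ansible task). The message text is the one that appears in those logs.

```python
import socket
from typing import Callable, Optional


def check_engine_ip(
    vm_ip: str,
    fqdn: str,
    resolver: Callable[[str], str] = socket.gethostbyname,
) -> Optional[str]:
    """Return an error message if vm_ip differs from what fqdn resolves to.

    Hypothetical helper for illustration; returns None when the addresses match.
    """
    resolved_ip = resolver(fqdn)
    if vm_ip == resolved_ip:
        return None
    return (
        f"Engine VM IP address is {vm_ip} while the engine's FQDN {fqdn} "
        f"resolves to {resolved_ip}. If you are using DHCP, check your DHCP "
        "reservation configuration"
    )


if __name__ == "__main__":
    # Fake resolver so the example runs without DNS; addresses are from this bug.
    fqdn = "nsednev-he-1.qa.lab.tlv.redhat.com"
    fake_dns = {fqdn: "10.35.92.51"}
    print(check_engine_ip("10.35.92.55", fqdn, fake_dns.__getitem__))
```

Passing the resolver in as a parameter keeps the comparison testable without a live DNS server; by default it falls back to `socket.gethostbyname`.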

Comment 2 Nikolai Sednev 2018-07-08 13:18:50 UTC
Following Ido's request and my latest findings, returning the bug to ASSIGNED.

Comment 3 Red Hat Bugzilla Rules Engine 2018-07-08 13:18:55 UTC
Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use target milestone to plan a fix for an oVirt release.

Comment 4 Nikolai Sednev 2018-07-08 13:51:14 UTC
Created attachment 1457275 [details]
log from host alma04 after failed deployment

Comment 5 Nikolai Sednev 2018-07-08 13:51:59 UTC
2018-07-08 15:10:14,080+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is  while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
2018-07-08 15:10:15,282+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 PLAY RECAP [localhost] : ok: 81 changed: 31 unreachable: 0 skipped: 5 failed: 2

Comment 6 Sandro Bonazzola 2018-07-16 13:57:23 UTC
Re-targeting to 4.2.6, since the next build is for blockers only and this is not considered a blocker for 4.2.5.

Comment 7 Nikolai Sednev 2018-08-09 11:03:17 UTC
In which component version has the fix been merged?

Comment 8 Nikolai Sednev 2018-08-09 14:14:16 UTC
[ INFO  ] TASK [Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.380209", "end": "2018-08-09 17:09:14.401218", "rc": 0, "start": "2018-08-09 17:09:14.021009", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug  9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug  9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug  9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug  9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}"]}
[ INFO  ] TASK [Check VM status at virt level]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Fail if engine VM is not running]
[ INFO  ] skipping: [localhost]
[ INFO  ] TASK [Get target engine VM IPv4 address]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Get VDSM's target engine VM stats]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [Convert stats to JSON format]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Get target engine VM IPv4 address from VDSM stats]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fail if Engine IP is different from engine's FQDN resolved IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is  while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO  ] Stage: Clean up
[ INFO  ] Cleaning temporary resources
[ INFO  ] TASK [Gathering Facts]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Fetch logs from the engine VM]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set destination directory path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Create destination directory]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Find the local appliance image]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Set local_vm_disk_path]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Give the vm time to flush dirty buffers]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Copy engine logs]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [localhost]
[ INFO  ] TASK [Remove local vm dir]
[ INFO  ] changed: [localhost]
[ INFO  ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180809171046.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180809162238-hz2chf.log
You have new mail in /var/spool/mail/root
[root@alma03 ~]# 

Tested on:
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
rhvm-appliance-4.2-20180808.0.el7.noarch
Linux 3.10.0-862.11.4.el7.x86_64 #1 SMP Tue Aug 7 07:30:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)

1. The expected result from the initial description (deployment failing in the middle) is not met.
2. Too many unclear error messages.

Returning this bug back to assigned.
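
For reference, the "Check engine VM health" output above is the JSON printed by `hosted-engine --vm-status --json`. A minimal Python sketch of reading it, assuming the layout seen in this log (per-host objects keyed by host ID next to a `global_maintenance` flag); `summarize_vm_status` is a hypothetical helper name:

```python
import json
from typing import Dict


def summarize_vm_status(stdout: str) -> Dict[str, dict]:
    """Map each hostname to its engine-status dict.

    Assumes the layout seen in this bug's log: host entries are JSON objects
    keyed by host ID, next to a boolean "global_maintenance" key.
    """
    status = json.loads(stdout)
    return {
        entry["hostname"]: entry["engine-status"]
        for entry in status.values()
        if isinstance(entry, dict)  # skips the global_maintenance flag
    }


if __name__ == "__main__":
    # Trimmed copy of the output logged above.
    sample = (
        '{"1": {"hostname": "alma03.qa.lab.tlv.redhat.com", "host-id": 1, '
        '"engine-status": {"reason": "failed liveliness check", '
        '"health": "bad", "vm": "up", "detail": "Up"}, "score": 3400}, '
        '"global_maintenance": false}'
    )
    for host, engine in summarize_vm_status(sample).items():
        print(host, engine["vm"], engine["health"], engine.get("reason", ""))
```

In this run the VM is reported as up while its health is bad with reason "failed liveliness check", which is consistent with an engine that is running but not reachable at its FQDN.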

Comment 9 Ido Rosenzwig 2018-08-13 07:23:21 UTC
Hi Nikolai,

1. We agreed that the deployment will fail at the end, despite what is written in the description.
2. Please attach the logs of this test, or of a similar one in which the engine gets no IP address.

Comment 14 Nikolai Sednev 2018-08-13 12:46:22 UTC
Created attachment 1475574 [details]
sosreport from alma03

Comment 16 Ido Rosenzwig 2018-08-13 13:29:20 UTC
The sosreport contains just the final_clean.log on /var/log/ovirt-hosted-engine-setup which doesn't help to understand the problem.

please provide an attachment that contains all the logs like this:

engine-logs-2018-08-13T13:07:52Z
ovirt-hosted-engine-setup-20180813082621-hs2cji.log
ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log
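
To see at a glance which of these logs an archive actually contains, the file names can be grouped by playbook. A rough Python sketch, assuming the `ovirt-hosted-engine-setup-ansible-<playbook>-<timestamp>-<suffix>.log` naming pattern inferred from the list above; `playbooks_in`, `ANSIBLE_LOG` and `EXPECTED` are hypothetical names:

```python
import re
from typing import Iterable, Set

# Naming pattern inferred from the file names listed in this comment.
ANSIBLE_LOG = re.compile(
    r"^ovirt-hosted-engine-setup-ansible-(?P<playbook>[a-z_]+)-\d{14}-\w{6}\.log$"
)

# Playbook names taken from the list above; other versions may differ.
EXPECTED = {
    "get_network_interfaces",
    "initial_clean",
    "bootstrap_local_vm",
    "create_storage_domain",
    "create_target_vm",
    "final_clean",
}


def playbooks_in(filenames: Iterable[str]) -> Set[str]:
    """Return the playbook names found among the given log file names."""
    found = set()
    for name in filenames:
        match = ANSIBLE_LOG.match(name)
        if match:
            found.add(match.group("playbook"))
    return found


if __name__ == "__main__":
    # An archive holding only the final_clean log, as described for the sosreport.
    present = playbooks_in(
        ["ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log"]
    )
    print("missing:", sorted(EXPECTED - present))
```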

Comment 17 Nikolai Sednev 2018-08-13 13:52:30 UTC
(In reply to Ido Rosenzwig from comment #16)
> The sosreport contains just the final_clean.log on
> /var/log/ovirt-hosted-engine-setup which doesn't help to understand the
> problem.
> 
> please provide an attachment that contains all the logs like this:
> 
> engine-logs-2018-08-13T13:07:52Z
> ovirt-hosted-engine-setup-20180813082621-hs2cji.log
> ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
> ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
> ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
> ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
> ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
> ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log

Engine logs can't be attached, since the engine is killed during deployment.

Comment 18 Nikolai Sednev 2018-08-13 13:55:12 UTC
Created attachment 1475584 [details]
logs from alma03

Comment 19 Ido Rosenzwig 2018-08-14 12:15:31 UTC
In the logs I see the error written as expected:

2018-08-13 15:35:27,308+0300 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u"Fail if Engine IP is different from engine's FQDN resolved IP", 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"Engine VM IP address is 10.35.92.55 while the engine\'s FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration", \'changed\': False, \'_ansible_no_log\': False}', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml'}

The message:
Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration

Moving back to ON_QA.

Comment 20 Nikolai Sednev 2018-08-14 14:34:55 UTC
Following a conversation with Ido, I'm moving the bug to VERIFIED.
"[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
"
The IP now appears as expected.
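
The difference between this message and the one in comment 5 (where the VM address was empty) can be checked mechanically. A small Python sketch, assuming the message format quoted in this bug; `parse_mismatch` and `MISMATCH` are hypothetical names, not part of the setup tool:

```python
import re
from typing import Optional, Tuple

# Format taken from the error messages quoted in this bug.
MISMATCH = re.compile(
    r"Engine VM IP address is (?P<vm_ip>\S*) while the engine's FQDN "
    r"(?P<fqdn>\S+) resolves to (?P<resolved>\d+(?:\.\d+){3})\."
)


def parse_mismatch(message: str) -> Optional[Tuple[str, str, str]]:
    """Return (vm_ip, fqdn, resolved_ip), or None if the text does not match.

    vm_ip is an empty string when the message carries no VM address.
    """
    match = MISMATCH.search(message)
    if match is None:
        return None
    return match.group("vm_ip"), match.group("fqdn"), match.group("resolved")


if __name__ == "__main__":
    fqdn = "nsednev-he-1.qa.lab.tlv.redhat.com"
    tail = f" while the engine's FQDN {fqdn} resolves to 10.35.92.51. If you are using DHCP"
    # First message: empty VM address (comment 5); second: address present (this comment).
    for text in ("Engine VM IP address is " + tail, "Engine VM IP address is 10.35.92.55" + tail):
        vm_ip, _, resolved = parse_mismatch(text)
        print(repr(vm_ip), "vs", resolved)
```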