Bug 1590266
| Summary: | [RFE] Deployment should fail if the engine is not reachable by its FQDN | | |
|---|---|---|---|
| Product: | [oVirt] ovirt-hosted-engine-setup | Reporter: | Ido Rosenzwig <irosenzw> |
| Component: | General | Assignee: | Ido Rosenzwig <irosenzw> |
| Status: | CLOSED CURRENTRELEASE | QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | medium | Docs Contact: | |
| Priority: | medium | | |
| Version: | 2.2.22 | CC: | bugs, irosenzw, mavital, nsednev, stirabos, trichard, ylavi |
| Target Milestone: | ovirt-4.2.6 | Keywords: | FutureFeature, Triaged |
| Target Release: | --- | Flags: | rule-engine: ovirt-4.2+, ylavi: planning_ack+, rule-engine: devel_ack+, mavital: testing_ack+ |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | ovirt-hosted-engine-setup-2.2.25 | Doc Type: | Enhancement |
| Doc Text: | Previously, self-hosted engine deployment failed with an unclear message if the Manager virtual machine could not be reached by its IP address. Now, a clear message is provided when deployment fails for this reason. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-09-03 15:09:24 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Description
Ido Rosenzwig
2018-06-12 10:32:26 UTC
(In reply to Ido Rosenzwig from comment #0)
> while it should fail much sooner when it can be detected.

Not sure we can fail much earlier: by the time the engine VM reboots from the shared storage with its final DHCP address, we are almost at the end of the deployment. But at least we can fail with a clear error message. Currently it simply complains that the engine doesn't come up, while we could check and explain what actually happened.

Further to Ido's request and my latest findings, returning the bug back to ASSIGNED.

Target release should be placed once a package build is known to fix an issue. Since this bug is not modified, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.

Created attachment 1457275 [details]
log from host alma04 after failed deployment
2018-07-08 15:10:14,080+0300 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
2018-07-08 15:10:15,282+0300 DEBUG otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:94 PLAY RECAP [localhost] : ok: 81 changed: 31 unreachable: 0 skipped: 5 failed: 2
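The failing task above compares the IP the engine VM actually got against what the engine FQDN resolves to in DNS. A minimal Python sketch of that comparison (a hypothetical re-implementation for illustration, not the actual Ansible task — the function name `verify_engine_address` is invented here):

```python
import socket


def verify_engine_address(engine_fqdn, vm_ip):
    """Return an error message if the engine VM's IP does not match
    what its FQDN resolves to, or None if they agree."""
    # Resolve the FQDN to a single IPv4 address, as the deployment check does.
    resolved = socket.gethostbyname(engine_fqdn)
    if vm_ip != resolved:
        return (
            "Engine VM IP address is {0} while the engine's FQDN {1} "
            "resolves to {2}. If you are using DHCP, check your DHCP "
            "reservation configuration".format(vm_ip, engine_fqdn, resolved)
        )
    return None
```

When the VM boots from the shared storage with a DHCP lease that differs from the DNS record (as in the log above), the mismatch is detected and the deployment can fail with this explicit message instead of a generic liveliness timeout.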
Re-targeting to 4.2.6, since the next build is for blockers only and this is not considered a blocker for 4.2.5.

In which component version has the fix been merged?

[ INFO ] TASK [Check engine VM health]
[ ERROR ] fatal: [localhost]: FAILED! => {"attempts": 120, "changed": true, "cmd": ["hosted-engine", "--vm-status", "--json"], "delta": "0:00:00.380209", "end": "2018-08-09 17:09:14.401218", "rc": 0, "start": "2018-08-09 17:09:14.021009", "stderr": "", "stderr_lines": [], "stdout": "{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug 9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug 9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}", "stdout_lines": ["{\"1\": {\"conf_on_shared_storage\": true, \"live-data\": true, \"extra\": \"metadata_parse_version=1\\nmetadata_feature_version=1\\ntimestamp=2940 (Thu Aug 9 17:09:11 2018)\\nhost-id=1\\nscore=3400\\nvm_conf_refresh_time=2940 (Thu Aug 9 17:09:11 2018)\\nconf_on_shared_storage=True\\nmaintenance=False\\nstate=EngineStarting\\nstopped=False\\n\", \"hostname\": \"alma03.qa.lab.tlv.redhat.com\", \"host-id\": 1, \"engine-status\": {\"reason\": \"failed liveliness check\", \"health\": \"bad\", \"vm\": \"up\", \"detail\": \"Up\"}, \"score\": 3400, \"stopped\": false, \"maintenance\": false, \"crc32\": \"c62bfdad\", \"local_conf_timestamp\": 2940, \"host-ts\": 2940}, \"global_maintenance\": false}"]}
[ INFO ] TASK [Check VM status at virt level]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Fail if engine VM is not running]
[ INFO ] skipping: [localhost]
[ INFO ] TASK [Get target engine VM IPv4 address]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Get VDSM's target engine VM stats]
[ INFO ] changed: [localhost]
[ INFO ] TASK [Convert stats to JSON format]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Get target engine VM IPv4 address from VDSM stats]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fail if Engine IP is different from engine's FQDN resolved IP]
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ INFO ] Stage: Clean up
[ INFO ] Cleaning temporary resources
[ INFO ] TASK [Gathering Facts]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Fetch logs from the engine VM]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set destination directory path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Create destination directory]
[ INFO ] changed: [localhost]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Find the local appliance image]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Set local_vm_disk_path]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Give the vm time to flush dirty buffers]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Copy engine logs]
[ INFO ] TASK [include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [Remove local vm dir]
[ INFO ] changed: [localhost]
[ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20180809171046.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20180809162238-hz2chf.log
You have new mail in /var/spool/mail/root
[root@alma03 ~]#
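The "Check engine VM health" task in the transcript above retries `hosted-engine --vm-status --json` and fails after 120 attempts because `engine-status` reports `"health": "bad"` with reason "failed liveliness check". A sketch of how that JSON can be inspected (assumed logic for illustration; the helper `engine_is_healthy` is hypothetical, and the sample is trimmed from the log output above):

```python
import json

# Trimmed sample of `hosted-engine --vm-status --json` output, keeping
# only the fields the health check needs.
SAMPLE_STATUS = json.dumps({
    "1": {
        "hostname": "alma03.qa.lab.tlv.redhat.com",
        "engine-status": {"reason": "failed liveliness check",
                          "health": "bad", "vm": "up", "detail": "Up"},
        "score": 3400,
    },
    "global_maintenance": False,
})


def engine_is_healthy(raw_status):
    """Return True if any host entry reports the engine as healthy.
    Top-level keys are host IDs plus flags like 'global_maintenance',
    so non-dict values are skipped."""
    status = json.loads(raw_status)
    return any(
        isinstance(host, dict)
        and host.get("engine-status", {}).get("health") == "good"
        for host in status.values()
    )
```

With the sample above, `engine_is_healthy` returns False: the VM is up at the virt level but the engine never answers its liveliness probe, which is exactly the state in which the new FQDN/IP check then produces the clear error message.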
Tested on:
ovirt-hosted-engine-setup-2.2.25-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.16-1.el7ev.noarch
rhvm-appliance-4.2-20180808.0.el7.noarch
Linux 3.10.0-862.11.4.el7.x86_64 #1 SMP Tue Aug 7 07:30:07 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.5 (Maipo)
1. The expected requirements from the initial description were not met, e.g. "Expected results: Deployment should fail at the middle."
2. Too many unclear error messages.

Returning this bug back to ASSIGNED.
Hi Nikolai,

1. We agreed that the deployment will fail at the end, despite what is written in the description.
2. Please attach the logs of this test, or of a similar one where the engine gets no IP address.

Created attachment 1475574 [details]
sosreport from alma03
The sosreport contains just final_clean.log under /var/log/ovirt-hosted-engine-setup, which doesn't help to understand the problem. Please provide an attachment that contains all the logs, like this:

engine-logs-2018-08-13T13:07:52Z
ovirt-hosted-engine-setup-20180813082621-hs2cji.log
ovirt-hosted-engine-setup-ansible-bootstrap_local_vm-20180813082745-0epanp.log
ovirt-hosted-engine-setup-ansible-create_storage_domain-20180813084817-fjrei6.log
ovirt-hosted-engine-setup-ansible-create_target_vm-20180813084937-15ay3v.log
ovirt-hosted-engine-setup-ansible-final_clean-20180813090750-2qn1gx.log
ovirt-hosted-engine-setup-ansible-get_network_interfaces-20180813082645-aiqpyy.log
ovirt-hosted-engine-setup-ansible-initial_clean-20180813082721-ssnqhk.log

(In reply to Ido Rosenzwig from comment #16)
> please provide an attachment that contains all the logs like this: [...]

Engine logs can't be attached; the engine was killed during the deployment.

Created attachment 1475584 [details]
logs from alma03
In the logs I see the error written as expected:
2018-08-13 15:35:27,308+0300 ERROR ansible failed {'status': 'FAILED', 'ansible_type': 'task', 'ansible_task': u"Fail if Engine IP is different from engine's FQDN resolved IP", 'ansible_result': u'type: <type \'dict\'>\nstr: {\'msg\': u"Engine VM IP address is 10.35.92.55 while the engine\'s FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration", \'changed\': False, \'_ansible_no_log\': False}', 'ansible_host': u'localhost', 'ansible_playbook': u'/usr/share/ovirt-hosted-engine-setup/ansible/create_target_vm.yml'}
The message:
Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration
Moving back to ON_QA.
Further to my conversation with Ido, I'm moving the bug to VERIFIED.
"[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Engine VM IP address is 10.35.92.55 while the engine's FQDN nsednev-he-1.qa.lab.tlv.redhat.com resolves to 10.35.92.51. If you are using DHCP, check your DHCP reservation configuration"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
"
The IP now appears as expected.