Bug 1953029
Summary: | HE deployment fails on "Add lines to answerfile" | ||
---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | amashah |
Component: | ovirt-ansible-collection | Assignee: | Yedidyah Bar David <didi> |
Status: | CLOSED ERRATA | QA Contact: | Nikolai Sednev <nsednev> |
Severity: | medium | Docs Contact: | |
Priority: | high | ||
Version: | 4.4.5 | CC: | didi, lsurette, mavital, nsurati, sbonazzo |
Target Milestone: | ovirt-4.4.7 | Keywords: | Triaged, ZStream |
Target Release: | 4.4.7 | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | | Doc Type: | No Doc Update
Doc Text: | | Story Points: | ---
Clone Of: | | Environment: |
Last Closed: | 2021-07-22 15:26:02 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | ---
oVirt Team: | Integration | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
amashah
2021-04-23 18:38:06 UTC
(In reply to amashah from comment #0)
> [ ERROR ] {'msg': 'Destination /root/ovirt-engine-answers does not exist !',

Can you please check why /root/ovirt-engine-answers is missing? It is part of the appliance image. Most likely something or someone removed it, or there is some corruption/failure/etc. I'd rather not apply the workaround you suggest as a permanent "fix", because it will likely mask some real problem somewhere that is better addressed directly.

Investigation of the attached logs suggests that the failure was a result of:

1. On the host deploying the hosted engine, having a line in /etc/hosts pointing the engine's FQDN at a wrong IP address.
2. Having a VM listening on ssh at that IP address, with the same root password.
3. The code that adds a local entry as the first line of /etc/hosts not being effective, perhaps due to an update of ansible or some other infra/library package, caching of some kind, etc.

The result seems to have been that ansible connected to that address and successfully completed some tasks there - all those before 'Add lines to answerfile' in [1] - and then failed in 'Add lines to answerfile' because the file did not exist.

How to continue?

To prevent/work around:

1. Do not have a wrong line in /etc/hosts. Generally speaking, oVirt/RHV, like most networked software, is very sensitive to correct name resolution. Double-check that forward and reverse resolution work as expected before trying to deploy.
2. Use different root passwords on different machines. This would likely have caused ansible to fail earlier, perhaps making the bug slightly easier to spot.
3. Use separate networks as applicable, to prevent wrong access.

To fix, I'll try to:

1. Patch add_engine_as_ansible_host.yml to add 'ansible_host: "{{ local_vm_ip.stdout_lines[0] }}"'.
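A rough sketch of what such a patch might look like (hypothetical, not the actual merged change; the `local_vm_ip` variable is taken from the comment above, while the task name and the `he_fqdn` variable are assumptions for illustration):

```yaml
# Hypothetical sketch: register the bootstrap engine VM with ansible_host
# pinned to the local VM's IP, so ansible cannot be misdirected by a stale
# /etc/hosts entry that maps the engine FQDN to a wrong address.
- name: Add the engine VM as an ansible host
  ansible.builtin.add_host:
    name: "{{ he_fqdn }}"                               # engine FQDN (assumed variable name)
    groups: engine
    ansible_host: "{{ local_vm_ip.stdout_lines[0] }}"   # force connections to the local VM address
```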
2. Perhaps patch [1] to check that the machine has a local address as above.

[1] https://github.com/oVirt/ovirt-ansible-collection/blob/master/roles/hosted_engine_setup/tasks/bootstrap_local_vm/03_engine_initial_tasks.yml

I suppose that verification should be general backup and restore on latest 4.4.7.3-0.3.el8ev?

(In reply to Nikolai Sednev from comment #12)
> I suppose that verification should be general backup and restore on latest
> 4.4.7.3-0.3.el8ev?

Generally speaking, HE deploy, new setup or restore, should be enough for sanity testing. See also comment 9.

For the record: the linked patch only handles a theoretical flow I guessed had happened based on the provided logs, even though I failed to reproduce it. It might also be related to customization of name resolution - /etc/resolv.conf, /etc/nsswitch.conf, use of nscd, libnss_db, etc.

Backup and restore from ovirt-engine-setup-base-4.4.7.3-0.3.el8ev.noarch to ovirt-engine-setup-4.4.7.4-0.9.el8ev.noarch, from NFS to NFS:

I ran "hosted-engine --deploy --restore-from-file=/root/nsednev_from_alma03_rhevm_4_4_7"

Pause the execution after adding this host to the engine? You will be able to connect to the restored engine in order to manually review and remediate its configuration. This is normally not required when restoring an up to date and coherent backup.
Pause after adding the host?
(Yes, No)[No]: yes

Got to the part:

[ INFO ] You can now connect to https://alma03.qa.lab.tlv.redhat.com:6900/ovirt-engine/ and check the status of this host and eventually remediate it, please continue only when the host is listed as 'up'
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : include_tasks]
[ INFO ] ok: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Create temporary lock file]
[ INFO ] changed: [localhost]
[ INFO ] TASK [ovirt.ovirt.hosted_engine_setup : Pause execution until /tmp/ansible.2yhnku1o_he_setup_lock is removed, delete it once ready to proceed]

Then upgraded the engine to the latest bits:

ovirt-engine-setup-4.4.7.4-0.9.el8ev.noarch
ansible-2.9.21-1.el8ae.noarch
ovirt-ansible-collection-1.5.1-1.el8ev.noarch
python3-ansible-runner-1.4.6-2.el8ar.noarch
ansible-runner-service-1.0.7-1.el8ev.noarch
Linux nsednev-he-1.qa.lab.tlv.redhat.com 4.18.0-305.7.1.el8_4.x86_64 #1 SMP Mon Jun 14 17:25:42 EDT 2021 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.4 (Ootpa)

Then I ran "rm -rf /tmp/ansible.2yhnku1o_he_setup_lock" and continued with the restore to a different NFS share.

The new ovirt-ansible-collection-1.5.1-1.el8ev.noarch did not cause me any issues during the restore, although I performed it from an engine running ovirt-ansible-collection-1.5.0-1.el8ev.noarch and then restored to ovirt-ansible-collection-1.5.1-1.el8ev.noarch.

[ INFO ] Hosted Engine successfully deployed
[ INFO ] Other hosted-engine hosts have to be reinstalled in order to update their storage configuration. From the engine, host by host, please set maintenance mode and then click on reinstall button ensuring you choose DEPLOY in hosted engine tab.
[ INFO ] Please note that the engine VM ssh keys have changed. Please remove the engine VM entry in ssh known_hosts on your clients.

Moving to verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory (Low: RHV Engine and Host Common Packages security update [ovirt-4.4.7]), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:2866

Due to QE capacity, we are not going to cover this issue in our automation.
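The prevention advice given earlier in this bug ("double-check that forward and reverse resolution work as expected before trying to deploy") can be sketched as a small shell check. This is a hypothetical pre-deploy sanity script, not part of hosted-engine itself; engine.example.com is a placeholder FQDN.

```shell
#!/bin/sh
# Hypothetical pre-deploy name-resolution check (not part of hosted-engine).
# Replace engine.example.com with your engine's FQDN.
FQDN=engine.example.com

# Forward resolution: FQDN -> IP. getent honors /etc/nsswitch.conf, so a
# stale /etc/hosts entry (as in this bug) will show up here.
IP=$(getent hosts "$FQDN" | awk '{print $1}')
echo "forward: $FQDN -> ${IP:-UNRESOLVED}"

# Reverse resolution: IP -> name.
[ -n "$IP" ] && getent hosts "$IP"

# Flag any /etc/hosts line mentioning the FQDN, so a wrong entry is easy
# to spot before deployment.
grep -n "$FQDN" /etc/hosts || echo "no /etc/hosts entry for $FQDN"
```

Comparing the getent output against the grep output makes it obvious when /etc/hosts is overriding DNS for the engine's name.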