Bug 1686575
| Summary: | hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in backup file. |
|---|---|
| Product: | Red Hat Enterprise Virtualization Manager |
| Reporter: | Ameya Charekar <achareka> |
| Component: | ovirt-hosted-engine-setup |
| Assignee: | Simone Tiraboschi <stirabos> |
| Status: | CLOSED ERRATA |
| QA Contact: | Nikolai Sednev <nsednev> |
| Severity: | high |
| Docs Contact: | |
| Priority: | medium |
| Version: | 4.2.8-3 |
| CC: | emarcus, fkust, lleistne, lsurette, mgoldboi, mmartinv, nashok, nsednev, rdlugyhe, sirao, stirabos |
| Target Milestone: | ovirt-4.4.0 |
| Keywords: | Triaged, ZStream |
| Target Release: | 4.4.0 |
| Flags: | emarcus: needinfo-, lsvaty: testing_plan_complete- |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | ovirt-hosted-engine-setup-2.3.9-1.el7ev |
| Doc Type: | Bug Fix |
| Doc Text: | Previously, the self-hosted engine high availability host's management network was configured during deployment. VDSM took over from NetworkManager and configured the selected network interface during initial deployment, while NetworkManager remained disabled. During restore, there was no option to attach additional (non-default) networks, and the restore process failed because the high-availability host had no connectivity to networks previously configured by the user that were listed in the backup file. In this release, the user can pause the restore process, manually add the required networks, and resume the restore process to completion. |
| Story Points: | --- |
| Clone Of: | |
| : | 1712667 (view as bug list) |
| Environment: | |
| Last Closed: | 2020-08-04 13:26:25 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | Integration |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Embargoed: | |
| Bug Depends On: | 1795672, 1811734 |
| Bug Blocks: | 1712667 |
Setting the severity of this bug to high, as it is a showstopper when a user is recovering the hosted engine after data loss or corruption and cannot go back to the old setup to mark the network as not required. This is a production-down scenario.

A workaround is to use the enginevm_after_engine_setup hook to set the networks as not required before the host is added:
/usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/fix_network.yml
~~~
- include_tasks: auth_sso.yml

- name: Wait for the engine to reach a stable condition
  wait_for: timeout=300

- name: fix network
  ovirt_network:
    auth: "{{ ovirt_auth }}"
    name: "{{ item }}"
    data_center: Default
    clusters:
      - name: Default
        required: False
  with_items:
    # Replace with the names of the logical networks marked as required in the backup.
    - require_network_1
    - require_network_2
~~~
The solution posted here is to give the user time to fix the "non operational" host. However, the Manager will be on a NAT network during this time, so the user won't be able to access the RHV-M portal GUI from an outside client system and will have to depend on the API (curl/SDK/Ansible) to fix the non-operational host, which may not be easy for every user. Is there any other way to get the GUI during this time?

Got this one: https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/commit/900b39f1f7fb0a8277ccf5d6c8b37ce77d30b5ab. Clearing the needinfo.

(In reply to nijin ashok from comment #4)
> Is there any other way to get the GUI during this time?

It should not be a problem; I did this from a different network already and had no issues with network connectivity. We are temporarily exposing the engine UI over the host via SSH port forwarding exactly for that reason.

Did we push the patches from https://bugzilla.redhat.com/show_bug.cgi?id=1712667 to 4.4.0?

Yes, we always start from the newest branch and backport to older branches if needed.

Works for me, moving to verified.
Tested on:
rhvm-4.4.0-0.31.master.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Nothing new from my side since comment #14.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246
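As a side note to the point above about having to fall back on the API (curl/SDK/Ansible) while the GUI is unreachable: a minimal, hypothetical sketch of flipping the previously required networks to not-required with the oVirt Python SDK (ovirtsdk4) is shown below. The engine URL, credentials, cluster name, and the assumption that the management network is named 'ovirtmgmt' are placeholders, not values from this report, and the exact service methods should be checked against the SDK version in use.

~~~
#!/usr/bin/python3
# Sketch only: mark non-management cluster networks as not required so the
# restored host can become operational. All names/credentials are placeholders.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='password',                                 # placeholder password
    insecure=True,                                       # lab use only
)

clusters_service = connection.system_service().clusters_service()
cluster = clusters_service.list(search='name=Default')[0]  # assumed cluster name
cluster_networks = clusters_service.cluster_service(cluster.id).networks_service()

for net in cluster_networks.list():
    # Leave the management network alone; flip everything else to not required.
    if net.required and net.name != 'ovirtmgmt':
        cluster_networks.network_service(net.id).update(
            types.Network(required=False)
        )

connection.close()
~~~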
Description of problem:
hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in the backup file, as the host is marked as non-operational due to the missing required network.

Errors from deployment logs:
~~~
2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
2019-03-07 20:35:00,862+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
~~~

Errors from engine logs:
~~~
2019-03-07 20:33:42,342+05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] Host '<hostname>' is set to Non-Operational, it is missing the following networks: 'test'
2019-03-07 20:33:42,397+05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host <hostname> does not comply with the cluster Default networks, the following networks are missing on host: 'test'
~~~

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a backup file that contains required non-management logical networks.
2. Run hosted-engine --deploy --restore-from-file=backup/file_name

Actual results:
Deployment fails.

Expected results:
It should work even with required non-management logical networks.

Additional info:
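To make step 1 above concrete, the source environment needs a non-management logical network assigned to the cluster as required before the backup is taken. A rough sketch with the oVirt Python SDK (ovirtsdk4) follows; the engine URL, credentials, and data center/cluster names are placeholders, while the network name 'test' matches the network named in the engine log above.

~~~
#!/usr/bin/python3
# Sketch only: create a logical network named 'test' and attach it to the
# Default cluster as a required network, reproducing the precondition of step 1.
import ovirtsdk4 as sdk
import ovirtsdk4.types as types

connection = sdk.Connection(
    url='https://engine.example.com/ovirt-engine/api',  # placeholder engine URL
    username='admin@internal',
    password='password',                                 # placeholder password
    insecure=True,
)

system_service = connection.system_service()

# Create the logical network in the (assumed) Default data center.
network = system_service.networks_service().add(
    types.Network(
        name='test',
        data_center=types.DataCenter(name='Default'),
    )
)

# Assign it to the (assumed) Default cluster and mark it as required.
clusters_service = system_service.clusters_service()
cluster = clusters_service.list(search='name=Default')[0]
clusters_service.cluster_service(cluster.id).networks_service().add(
    types.Network(id=network.id, required=True)
)

connection.close()
~~~

Taking a backup of such an environment with engine-backup and then running the restore command from step 2 should reproduce the failure shown in the deployment and engine logs above on the affected 4.2.8 builds.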