Description of problem:

hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in the backup file, because the host is set to non-operational due to the missing required network.

Errors from deployment logs:
~~~
2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}
2019-03-07 20:35:00,862+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
~~~

Errors from engine logs:
~~~
2019-03-07 20:33:42,342+05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] Host '<hostname>' is set to Non-Operational, it is missing the following networks: 'test'
2019-03-07 20:33:42,397+05 WARN [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host <hostname> does not comply with the cluster Default networks, the following networks are missing on host: 'test'
~~~

Version-Release number of selected component (if applicable):
ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
Always

Steps to Reproduce:
1. Have a backup file with required non-management logical networks.
2. hosted-engine --deploy --restore-from-file=backup/file_name

Actual results:
Deployment fails.

Expected results:
Deployment should work even with required non-management logical networks.

Additional info:
Setting the severity of this bug to high, as this will be a showstopper when a user is recovering the hosted engine after data loss or corruption: the user cannot go back to the old setup and mark the network as not required, so this becomes a production-down scenario.

A workaround is to use the enginevm_after_engine_setup hook to set the networks as not required before the host is added:

/usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/fix_network.yml
~~~
- include_tasks: auth_sso.yml

- name: Wait for the engine to reach a stable condition
  wait_for: timeout=300

- name: fix network
  ovirt_network:
    auth: "{{ ovirt_auth }}"
    name: "{{ item }}"
    data_center: Default
    clusters:
      - name: Default
        required: False
  with_items:
    - require_network_1
    - require_network_2
~~~
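For completeness, with the hook file in place the deployment is then run as usual; the backup path below is just the placeholder from the description:
~~~
# drop the playbook into the hook directory on the deployment host
cp fix_network.yml /usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/
# then run the restore deployment as in the steps above
hosted-engine --deploy --restore-from-file=backup/file_name
~~~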
The solution posted here is to give the user time to fix the "non operational" host. However, the manager will be on a NAT-ed network during this time, so the user won't be able to access the RHV-M portal GUI from an outside client system. The user therefore has to depend on the API, using curl/SDK/Ansible, to fix the non-operational host, which may not be easy for every user.

Is there any other way to get the GUI during this time?
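For what it's worth, the Ansible route could be a one-off playbook run from any client that can reach the engine API; this is only a sketch based on the hook above, and the engine URL, password variable, and network/cluster names ('test', Default, taken from the engine logs) are placeholders:
~~~
# hypothetical standalone playbook; URL, credentials and names are placeholders
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Obtain an SSO token for the engine API
      ovirt_auth:
        url: https://<engine-fqdn>/ovirt-engine/api
        username: admin@internal
        password: "{{ engine_password }}"
        insecure: true

    - name: Mark the missing network as not required in the cluster
      ovirt_network:
        auth: "{{ ovirt_auth }}"
        name: test
        data_center: Default
        clusters:
          - name: Default
            required: False
~~~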
Got this one https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/commit/900b39f1f7fb0a8277ccf5d6c8b37ce77d30b5ab. Clearing the needinfo.
(In reply to nijin ashok from comment #4)
> The solution posted here is to give the user time to fix the "non
> operational" host. However, the manager will be on a NAT-ed network during
> this time, so the user won't be able to access the RHV-M portal GUI from an
> outside client system. The user therefore has to depend on the API, using
> curl/SDK/Ansible, to fix the non-operational host, which may not be easy
> for every user.
>
> Is there any other way to get the GUI during this time?

It should not be a problem; I already did this from a different network and had no issues with network connectivity.
We are temporarily exposing the engine UI over the host via ssh port forwarding exactly for that reason.
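For illustration, the kind of forwarding involved looks roughly like this; the addresses are placeholders, not the exact mechanism the setup uses, and since the engine UI insists on being reached by its FQDN, a temporary hosts entry on the client is usually needed too:
~~~
# on the client: map the engine FQDN to localhost, then forward local port 443
# through the host to the engine VM (binding port 443 locally needs root)
echo "127.0.0.1 <engine-fqdn>" >> /etc/hosts
sudo ssh -L 443:<engine-fqdn>:443 root@<host-address>
# then browse to https://<engine-fqdn>/ovirt-engine
~~~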
Did we push the patches from https://bugzilla.redhat.com/show_bug.cgi?id=1712667 to 4.4.0?
Yes, we always start from the newest branch and backport to older branches if needed.
Works for me, moving to verified.

Tested on:
rhvm-4.4.0-0.31.master.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Nothing new from my side since comment #14.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2020:3246