Bug 1686575 - hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in backup file.
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 4.2.8-3
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: ovirt-4.4.0
Target Release: 4.4.0
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On: 1795672 1811734
Blocks: 1712667
 
Reported: 2019-03-07 18:04 UTC by Ameya Charekar
Modified: 2023-09-07 19:48 UTC
CC List: 11 users

Fixed In Version: ovirt-hosted-engine-setup-2.3.9-1.el7ev
Doc Type: Bug Fix
Doc Text:
Previously, only the self-hosted engine high-availability host's management network was configured during deployment: VDSM took over the selected network interface from NetworkManager, which remained disabled. During restore there was no option to attach additional (non-default) networks, so the restore process failed because the high-availability host had no connectivity to the user-configured networks listed in the backup file. In this release, the user can pause the restore process, manually add the required networks, and then resume the restore process to completion.
Clone Of:
Clones: 1712667
Environment:
Last Closed: 2020-08-04 13:26:25 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:
emarcus: needinfo-
lsvaty: testing_plan_complete-


Links
Github oVirt ovirt-ansible-hosted-engine-setup pull 183 (closed): Let the user pause execution to interactively bring up the host - 2021-01-26 12:51:03 UTC
Red Hat Knowledge Base (Solution) 4088711: [RHV] The hosted-engine deploy (restore-from-file) fails if any non-management logical network is defined as a required ... - 2019-05-02 17:36:55 UTC
Red Hat Product Errata RHEA-2020:3246 - 2020-08-04 13:26:51 UTC
oVirt gerrit 100269 (MERGED): Let the user pause the execution on restore - 2021-01-26 12:51:01 UTC
oVirt gerrit 100275 (MERGED): Let the user pause the execution on restore - 2021-01-26 12:51:02 UTC

Description Ameya Charekar 2019-03-07 18:04:44 UTC
Description of problem:
hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in the backup file, because the host is set to Non-Operational due to the missing required network.

Errors from deployment logs:-
~~~
2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}

2019-03-07 20:35:00,862+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
~~~

Errors from engine logs:-
~~~
2019-03-07 20:33:42,342+05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] Host '<hostname>' is set to Non-Operational, it is missing the following networks: 'test'
2019-03-07 20:33:42,397+05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host <hostname> does not comply with the cluster Default networks, the following networks are missing on host: 'test'
~~~


Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
Always


Steps to Reproduce:
1. Have a backup file that contains non-management logical networks marked as required.
2. Run hosted-engine --deploy --restore-from-file=backup/file_name

Actual results:
Deployment fails.

Expected results:
Deployment should succeed even when the backup file contains required non-management logical networks.

Additional info:

Comment 2 nijin ashok 2019-05-02 15:35:45 UTC
Setting the severity of this bug to high, as it is a showstopper when a user is recovering the hosted engine after data loss or corruption: the user cannot go back to the old setup and mark the network as not required. This is a production-down scenario.

A workaround is to use the enginevm_after_engine_setup hook to mark the networks as not required before the host is added:


/usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/fix_network.yml

# Log in to the engine API; auth_sso.yml is shipped with the hosted-engine
# setup role and populates the ovirt_auth variable used below.
- include_tasks: auth_sso.yml

- name: Wait for the engine to reach a stable condition
  wait_for: timeout=300

# Mark each listed logical network as not required in the Default cluster so
# the host can become operational without it.
- name: fix network
  ovirt_network:
    auth: "{{ ovirt_auth }}"
    name: "{{ item }}"
    data_center: Default
    clusters:
      - name: Default
        required: False
  with_items:
    # Replace with the required non-management network names from the backup.
    - require_network_1
    - require_network_2

Comment 4 nijin ashok 2019-06-06 02:49:02 UTC
The solution posted here is to give the user time to fix the "non operational" host. However, the engine is still on a NAT network at that point, so the user cannot reach the RHV-M Administration Portal from an outside client system. The user therefore has to rely on the API (curl, the SDK, or Ansible) to fix the non-operational host, which may not be easy for every user.
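
For illustration, a minimal standalone sketch of that API-based fix, following the same ovirt_network usage as the hook in comment 2 (the engine URL, credentials, and file name below are placeholders, not values from this report; the network name is the one from the engine log in the description):

---
# fix_required_network.yml - hypothetical standalone playbook, run from a
# machine that can reach the engine API while the host is non-operational.
- hosts: localhost
  connection: local
  gather_facts: false
  tasks:
    - name: Log in to the engine API
      ovirt_auth:
        url: https://engine.example.com/ovirt-engine/api   # placeholder FQDN
        username: admin@internal
        password: "{{ engine_password }}"                  # placeholder secret
        insecure: true

    - name: Mark the missing logical network as not required in the Default cluster
      ovirt_network:
        auth: "{{ ovirt_auth }}"
        name: test                 # network name taken from the engine log above
        data_center: Default
        clusters:
          - name: Default
            required: False

    - name: Log out of the engine API
      ovirt_auth:
        state: absent
        ovirt_auth: "{{ ovirt_auth }}"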

Is there any other way to get the GUI during this time?

Comment 6 Nikolai Sednev 2019-06-06 06:45:02 UTC
(In reply to nijin ashok from comment #4)
> The solution posted here is to give the user time to fix the "non
> operational" host. However, the engine is still on a NAT network at that
> point, so the user cannot reach the RHV-M Administration Portal from an
> outside client system. The user therefore has to rely on the API (curl, the
> SDK, or Ansible) to fix the non-operational host, which may not be easy for
> every user.
> 
> Is there any other way to get the GUI during this time?

It should not be a problem, as I already did so from a different network and had no issues with network connectivity.

Comment 7 Simone Tiraboschi 2019-06-06 07:17:17 UTC
We are temporarily exposing the engine UI through the host via SSH port forwarding exactly for that reason.
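
For reference, a generic manual forward of that kind (a sketch only, not necessarily the exact mechanism the setup flow uses) would look something like this, assuming <engine-vm-ip> is the temporary address of the engine VM on the host's NAT network:

# Run on the client: forward a local port through the HA host to the engine
# VM's HTTPS port. Since the engine web UI expects to be reached via its FQDN,
# map the engine FQDN to 127.0.0.1 on the client and open https://<engine-fqdn>:8443
ssh -L 8443:<engine-vm-ip>:443 root@<ha-host>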

Comment 8 Nikolai Sednev 2019-06-12 06:13:10 UTC
Did we push the patches from https://bugzilla.redhat.com/show_bug.cgi?id=1712667 to 4.4.0?

Comment 9 Simone Tiraboschi 2019-06-12 08:39:38 UTC
Yes, we always start from the newest branch and backport to older branches if needed.

Comment 10 Daniel Gur 2019-08-28 13:11:44 UTC
sync2jira

Comment 11 Daniel Gur 2019-08-28 13:15:56 UTC
sync2jira

Comment 14 Nikolai Sednev 2020-04-16 19:51:50 UTC
Works for me, moving to verified.
Tested on:
rhvm-4.4.0-0.31.master.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Comment 17 Nikolai Sednev 2020-05-18 15:07:43 UTC
Nothing new from my side since comment #14.

Comment 23 errata-xmlrpc 2020-08-04 13:26:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246

