Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1686575

Summary: hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in the backup file.
Product: Red Hat Enterprise Virtualization Manager
Reporter: Ameya Charekar <achareka>
Component: ovirt-hosted-engine-setup
Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: high
Docs Contact:
Priority: medium
Version: 4.2.8-3
CC: emarcus, fkust, lleistne, lsurette, mgoldboi, mmartinv, nashok, nsednev, rdlugyhe, sirao, stirabos
Target Milestone: ovirt-4.4.0
Keywords: Triaged, ZStream
Target Release: 4.4.0
Flags: emarcus: needinfo-
       lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.3.9-1.el7ev
Doc Type: Bug Fix
Doc Text:
Previously, the self-hosted engine high-availability host's management network was configured during deployment. VDSM took over from NetworkManager and configured the selected network interface during the initial deployment, while NetworkManager remained disabled. During restore, there was no option to attach additional (non-default) networks, and the restore process failed because the high-availability host had no connectivity to networks previously configured by the user that were listed in the backup file. In this release, the user can pause the restore process, manually add the required networks, and then resume the restore process to completion.
Story Points: ---
Clone Of:
Clones: 1712667 (view as bug list)
Environment:
Last Closed: 2020-08-04 13:26:25 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1795672, 1811734
Bug Blocks: 1712667

Description Ameya Charekar 2019-03-07 18:04:44 UTC
Description of problem:
hosted-engine deploy (restore-from-file) fails if any non-management logical network is marked as required in the backup file, because the host is set to non-operational due to the missing required network.

Errors from the deployment logs:
~~~
2019-03-07 20:33:50,711+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, fix accordingly and re-deploy.\n"}

2019-03-07 20:35:00,862+0530 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:98 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The system may not be provisioned according to the playbook results: please check the logs for the issue, fix accordingly or re-deploy from scratch.\n"}
~~~

Errors from the engine logs:
~~~
2019-03-07 20:33:42,342+05 ERROR [org.ovirt.engine.core.bll.SetNonOperationalVdsCommand] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] Host '<hostname>' is set to Non-Operational, it is missing the following networks: 'test'
2019-03-07 20:33:42,397+05 WARN  [org.ovirt.engine.core.dal.dbbroker.auditloghandling.AuditLogDirector] (EE-ManagedThreadFactory-engine-Thread-16) [6fad6d2a] EVENT_ID: VDS_SET_NONOPERATIONAL_NETWORK(519), Host <hostname> does not comply with the cluster Default networks, the following networks are missing on host: 'test'
~~~


Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-2.2.34-1.el7ev.noarch
ovirt-hosted-engine-ha-2.2.19-1.el7ev.noarch
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
Always


Steps to Reproduce:
1. Have a backup file that contains required non-management logical networks (an example of creating one is sketched after these steps).
2. hosted-engine --deploy --restore-from-file=backup/file_name
3. 
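
For reference, the backup file in step 1 is typically created on the original engine with engine-backup before the old environment is lost; the file names below are only examples.
~~~
# On the original engine VM: take a full backup (file names are examples).
engine-backup --mode=backup --scope=all --file=engine-backup.tar.gz --log=engine-backup.log

# Copy the backup to the new host, then start the restore-based deployment (step 2):
hosted-engine --deploy --restore-from-file=engine-backup.tar.gz
~~~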

Actual results:
Deployment fails.

Expected results:
Deployment should succeed even when required non-management logical networks are present in the backup file.

Additional info:

Comment 2 nijin ashok 2019-05-02 15:35:45 UTC
Setting the severity of this bug to high, as it is a showstopper when a user is recovering the hosted engine after data loss or corruption and cannot go back to the old setup to mark the network as not required. This is a production-down scenario.

A workaround is to use the enginevm_after_engine_setup hook to mark the networks as not required before the host is added.


/usr/share/ovirt-hosted-engine-setup/ansible/hooks/enginevm_after_engine_setup/fix_network.yml

# Obtain an SSO token for the engine API (auth_sso.yml ships with ovirt-hosted-engine-setup).
- include_tasks: auth_sso.yml

# Give the engine some time to settle before changing network definitions.
- name: Wait for the engine to reach a stable condition
  wait_for:
    timeout: 300

# Mark the listed logical networks as not required in the Default cluster so the
# host is not set to Non-Operational when it is added during the restore.
- name: Fix network
  ovirt_network:
    auth: "{{ ovirt_auth }}"
    name: "{{ item }}"
    data_center: Default
    clusters:
      - name: Default
        required: false
  with_items:
    - require_network_1
    - require_network_2

Comment 4 nijin ashok 2019-06-06 02:49:02 UTC
The solution posted here is to give the user time to fix the "non-operational" host. However, the Manager is still on the NATed network at this point, so the user cannot access the RHV-M portal GUI from an outside client system. The user therefore has to rely on the API (curl/SDK/Ansible) to fix the non-operational host, which may not be easy for every user.
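
For illustration, one way to do this over the REST API from the host (where the engine is reachable) is a sketch along these lines; the engine FQDN, cluster and network IDs, and the admin password are placeholders, and the exact call should be checked against the API documentation for the installed version:
~~~
# Look up the cluster and network IDs first (placeholders throughout):
curl -k -u admin@internal:PASSWORD https://ENGINE_FQDN/ovirt-engine/api/clusters
curl -k -u admin@internal:PASSWORD https://ENGINE_FQDN/ovirt-engine/api/clusters/CLUSTER_ID/networks

# Then mark the offending logical network as not required in the cluster:
curl -k -u admin@internal:PASSWORD -X PUT \
     -H 'Content-Type: application/xml' \
     -d '<network><required>false</required></network>' \
     https://ENGINE_FQDN/ovirt-engine/api/clusters/CLUSTER_ID/networks/NETWORK_ID
~~~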

Is there any other way to get the GUI during this time?

Comment 6 Nikolai Sednev 2019-06-06 06:45:02 UTC
(In reply to nijin ashok from comment #4)
> The solution posted here is to give time for the user to fix the "non
> operational" host. However, the manager will be having NAT network during
> this time and hence the user won't be able to access the RHV-M portal GUI
> from an outside client system. So the user has to depend on API using
> curl/sdk/ansible to fix the non-operational host which may not be easy for
> every user. 
> 
> Is there any other way to get the GUI during this time?

It should not be a problem, as I have already done so from a different network and had no issues with network connectivity.

Comment 7 Simone Tiraboschi 2019-06-06 07:17:17 UTC
We are temporarily exposing the engine UI through the host via SSH port forwarding exactly for that reason.
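
For example, a client outside the NATed network could reach the portal with SSH port forwarding through the host; the names here are placeholders, and because the engine only answers on its configured FQDN, the client usually also needs an /etc/hosts entry pointing that FQDN at 127.0.0.1:
~~~
# Forward a local port through the hosted-engine host to the engine VM's HTTPS port
# (ENGINE_VM_FQDN is resolved on the host, so the client does not need to resolve it;
# binding local port 443 requires root privileges on the client).
ssh -L 443:ENGINE_VM_FQDN:443 root@HOSTED_ENGINE_HOST
# With "127.0.0.1 ENGINE_VM_FQDN" in the client's /etc/hosts, the Administration
# Portal is then reachable at https://ENGINE_VM_FQDN/ (expect a certificate warning).
~~~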

Comment 8 Nikolai Sednev 2019-06-12 06:13:10 UTC
Did we push the patches from https://bugzilla.redhat.com/show_bug.cgi?id=1712667 to 4.4.0?

Comment 9 Simone Tiraboschi 2019-06-12 08:39:38 UTC
Yes, we always start from the newest branch and backport to older branches if needed.

Comment 10 Daniel Gur 2019-08-28 13:11:44 UTC
sync2jira

Comment 11 Daniel Gur 2019-08-28 13:15:56 UTC
sync2jira

Comment 14 Nikolai Sednev 2020-04-16 19:51:50 UTC
Works for me, moving to verified.
Tested on:
rhvm-4.4.0-0.31.master.el8ev.noarch
ovirt-hosted-engine-ha-2.4.2-1.el8ev.noarch
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
rhvm-appliance.x86_64 2:4.4-20200403.0.el8ev
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.el8.x86_64 #1 SMP Fri Mar 27 14:35:58 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Comment 17 Nikolai Sednev 2020-05-18 15:07:43 UTC
Nothing new from my side since comment #14.

Comment 23 errata-xmlrpc 2020-08-04 13:26:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV RHEL Host (ovirt-host) 4.4), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:3246