Bug 1867198 - Self-hosted engine restore process fails when restore host networks are configured as 'Required'
Keywords:
Status: CLOSED DUPLICATE of bug 1695523
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Documentation
Version: 2.4.5
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.4.5
Target Release: ---
Assignee: rhev-docs@redhat.com
QA Contact: Guilherme Santos
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-08-07 16:46 UTC by Francisco Garcia
Modified: 2021-11-12 10:19 UTC (History)
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-10-14 07:36:13 UTC
oVirt Team: Integration



Description Francisco Garcia 2020-08-07 16:46:52 UTC
Description of problem:

When performing a Self-Hosted Engine restore, the installer doesn't wait long enough when checking whether the Hosted Engine restore host has become "Up". If the cluster where the restore is happening has 'Required' networks, the user is neither a) prompted to configure those networks manually via the temporary Manager at https://hypervisor:6900/ovirt-engine, nor b) warned by the documentation to configure the networks as non-Required prior to attempting a SHE upgrade.

Earlier (non-ansible) versions of the installer waited for the user to acknowledge and fix this scenario prior to continuing with it.

Version-Release number of selected component (if applicable):


ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch


How reproducible:

Always


Steps to Reproduce:
1. Have a datacenter/cluster with "required" networks.
2. Perform a SHE restore process with hosted-engine --deploy --restore-from-file=backup.tar.bz2
3. See log:



2020-08-07 16:19:51,242+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 The bootstrap engine is temporary accessible over https://rhevh2.example.org:6900/ovirt-engine/
2020-08-07 16:19:53,046+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Detect VLAN ID]
2020-08-07 16:19:54,749+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-08-07 16:19:59,858+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Set Engine public key as authorized key without validating the TLS/SSL certificates]
2020-08-07 16:20:01,962+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-08-07 16:20:03,666+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : include_tasks]
2020-08-07 16:20:05,269+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:07,172+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
2020-08-07 16:20:09,377+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:11,080+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Ensure that the target datacenter is present]
2020-08-07 16:20:13,585+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:15,288+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Ensure that the target cluster is present in the target datacenter]
2020-08-07 16:20:17,492+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:19,295+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Check actual cluster location]
2020-08-07 16:20:20,899+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:20:22,602+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Enable GlusterFS at cluster level]
2020-08-07 16:20:24,205+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:20:26,008+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Set VLAN ID at datacenter level]
2020-08-07 16:20:28,112+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:29,815+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Get active list of active firewalld zones]
2020-08-07 16:20:31,919+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-08-07 16:20:33,622+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Configure libvirt firewalld zone]
2020-08-07 16:20:37,630+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-08-07 16:20:39,333+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Add host]
2020-08-07 16:20:41,337+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 changed: [localhost]
2020-08-07 16:20:44,743+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:20:46,648+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : include_tasks]
2020-08-07 16:20:48,351+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:20:50,155+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : include_tasks]
2020-08-07 16:20:51,758+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:20:53,662+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Always revoke the SSO token]
2020-08-07 16:20:57,470+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : include_tasks]
2020-08-07 16:20:59,173+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:21:01,177+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Obtain SSO token using username/password credentials]
2020-08-07 16:21:03,181+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:21:04,985+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Wait for the host to be up]
2020-08-07 16:22:50,363+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:22:56,073+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Notify the user about a failure]
2020-08-07 16:22:57,877+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:22:59,880+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : set_fact]
2020-08-07 16:23:01,683+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:23:03,586+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Collect error events from the Engine]
2020-08-07 16:23:07,492+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 ok: [localhost]
2020-08-07 16:23:13,601+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Generate the error message from the engine events]
2020-08-07 16:23:25,919+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Fail with error description]
2020-08-07 16:23:31,827+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 skipping: [localhost]
2020-08-07 16:23:38,136+0000 INFO otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:109 TASK [ovirt.hosted_engine_setup : Fail with generic error]
2020-08-07 16:23:44,045+0000 ERROR otopi.ovirt_hosted_engine_setup.ansible_utils ansible_utils._process_output:107 fatal: [localhost]: FAILED! => {"changed": false, "msg": "The host has been set in non_operational status, please check engine logs, more info can be found in the engine logs, fix accordingly and re-deploy."}
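
To quantify how long the installer waited, the timestamps in the log above can be compared. The following is a minimal Python sketch and not part of the installer; the helper name `parse_log_time` is hypothetical, the log lines are abbreviated with `...`, and the two timestamps are copied from the "Wait for the host to be up" task line and the result line that follows it.

```python
from datetime import datetime

# Timestamp layout used by the setup log lines above,
# for example "2020-08-07 16:21:04,985+0000".
LOG_TIME_FORMAT = "%Y-%m-%d %H:%M:%S,%f%z"


def parse_log_time(line: str) -> datetime:
    """Parse the leading timestamp of a setup log line (hypothetical helper)."""
    date_part, time_part = line.split(" ", 2)[:2]
    return datetime.strptime(f"{date_part} {time_part}", LOG_TIME_FORMAT)


# Abbreviated copies of the task line and of the result line after it.
task_started = parse_log_time(
    "2020-08-07 16:21:04,985+0000 INFO ... "
    "TASK [ovirt.hosted_engine_setup : Wait for the host to be up]"
)
task_finished = parse_log_time(
    "2020-08-07 16:22:50,363+0000 INFO ... ok: [localhost]"
)

waited_seconds = (task_finished - task_started).total_seconds()
print(f"Waited {waited_seconds:.3f} seconds for the host")
```

In this run the task returned after roughly 105 seconds.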

Comment 1 Francisco Garcia 2020-08-08 13:41:36 UTC
This seems to be a duplicate of https://bugzilla.redhat.com/show_bug.cgi?id=1686575. I haven't been able to locate a document describing how to stop the engine deployment so that some time is allotted to manually fix the new host configuration.

Comment 2 Sandro Bonazzola 2020-08-27 07:04:59 UTC
Upstream documentation is here: https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/README.md#make-changes-in-the-engine-vm-during-the-deployment
We need to update our documentation downstream to include it.

Comment 3 Steve Goodman 2020-08-27 07:25:50 UTC
So should this be reassigned to a writer?

Comment 4 Sandro Bonazzola 2020-09-03 07:39:41 UTC
(In reply to Steve Goodman from comment #3)
> So should this be reassigned to a writer?

yes

Comment 5 Steve Goodman 2020-12-03 08:22:56 UTC
The upstream documentation says:

----
To make manual adjustments you can set the variable he_pause_host to true. This will pause the deployment after the engine has been setup and create a lock-file at /tmp that ends with _he_setup_lock on the machine the role was executed on. The deployment will continue after deleting the lock-file, or after 24 hours ( if the lock-file hasn't been removed ).

In order to proceed with the deployment, before deleting the lock-file, make sure that the host is on 'up' state at the engine's URL.

Both of the lock-file path and the engine's URL will be presented during the role execution.
----
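
The resume step described in the quoted text could be scripted roughly as follows. This is only a sketch under assumptions: the helper names `find_he_setup_locks` and `resume_deployment` are hypothetical, and the search pattern is derived from the "ends with _he_setup_lock" wording. The exact lock-file path shown during the role execution should be preferred over a search. The sketch does not check the host status, so confirm that the host is 'up' in the engine before calling it.

```python
from pathlib import Path
from typing import List


def find_he_setup_locks(directory: str = "/tmp") -> List[Path]:
    """Return files in `directory` whose names end with _he_setup_lock."""
    return sorted(Path(directory).glob("*_he_setup_lock"))


def resume_deployment(directory: str = "/tmp") -> List[Path]:
    """Delete the lock-file(s) so that the paused deployment can continue.

    Assumption: the caller has already verified that the host is 'up'.
    """
    removed = []
    for lock in find_he_setup_locks(directory):
        lock.unlink()
        removed.append(lock)
    return removed


if __name__ == "__main__":
    # Only list the candidates here; deleting is left as an explicit step.
    for lock in find_he_setup_locks():
        print(f"Found lock-file: {lock}")
```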

On which machine do you set the variable he_pause_host to true? Is this an environment variable? If not, in which file does this variable appear, and what is the exact line that should appear?

Comment 6 Sandro Bonazzola 2020-12-16 08:22:17 UTC
(In reply to Steve Goodman from comment #5)

> On which machine do you set the variable he_pause_host to true? Is this an
> environment variable? If not, in which file does this variable appear, and
> what is the exact line that should appear?

Redirecting question to Asaf

Comment 7 Asaf Rachmani 2020-12-16 10:19:25 UTC
(In reply to Steve Goodman from comment #5)
> On which machine do you set the variable he_pause_host to true? Is this an
> environment variable? If not, in which file does this variable appear, and
> what is the exact line that should appear?

There are a few ways to set this variable; you can find more details here [1].
There is no specific machine to set this variable on. You can run the playbook locally and set the variable on the local machine (for example, on the RHVH host), or run it remotely (from any machine) and set the variable on the same machine you run the playbook from.
The exact line that should appear (in a file, for example): "he_pause_host": true

[1] https://docs.ansible.com/ansible/latest/user_guide/playbooks_variables.html#defining-variables-at-runtime
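
One of the runtime options described in [1] is to pass a vars file with `--extra-vars "@file"`. The following is a minimal sketch, assuming a JSON vars file; the file name `he_vars.json`, the playbook name `he_deployment.yml`, and the helper names are hypothetical placeholders that are not taken from this bug.

```python
import json
import tempfile
from pathlib import Path
from typing import List


def write_pause_vars(path: str) -> str:
    """Write a JSON vars file that sets he_pause_host to true."""
    content = json.dumps({"he_pause_host": True}, indent=2)
    Path(path).write_text(content + "\n")
    return content


def build_playbook_command(playbook: str, vars_file: str) -> List[str]:
    """Build an ansible-playbook call that loads the vars file at runtime."""
    return ["ansible-playbook", playbook, "--extra-vars", f"@{vars_file}"]


if __name__ == "__main__":
    # Write into a temporary directory so the demo leaves nothing behind.
    with tempfile.TemporaryDirectory() as tmp:
        vars_file = str(Path(tmp) / "he_vars.json")  # hypothetical name
        print(write_pause_vars(vars_file))
        # "he_deployment.yml" is a placeholder for the playbook actually used.
        print(" ".join(build_playbook_command("he_deployment.yml", vars_file)))
```

The command is only built and printed here, not executed, so the sketch can be adapted to whichever machine the playbook is run from.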

Comment 8 Sandro Bonazzola 2021-06-10 06:36:18 UTC
Re-targeting to 4.4.7 since oVirt 4.4.6 was released.

Comment 9 Sandro Bonazzola 2021-09-24 07:54:40 UTC
This issue is past ovirt-4.4.8-1 development cycle. Moving to ovirt-4.4.9.

If you believe this issue should be closed or moved back to 4.4.8-1 please do so.

Otherwise, I would like to ask that tickets be updated ahead of the development cycle deadline (release date).

