Bug 1702016 - Block moving HE hosts into different Data Centers and make HE host moved to different cluster NonOperational after activation
Summary: Block moving HE hosts into different Data Centers and make HE host moved to different cluster NonOperational after activation
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.2.8
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: ovirt-4.4.3
Assignee: Artur Socha
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1844787
Depends On: 1859586
Blocks:
 
Reported: 2019-04-22 17:55 UTC by nijin ashok
Modified: 2023-10-06 18:15 UTC
CC List: 18 users

Fixed In Version: rhv-4.4.3-8
Doc Type: Bug Fix
Doc Text:
Previously, the Manager allowed adding or migrating hosts configured as self-hosted engine hosts to a data center or cluster other than the one in which the self-hosted engine VM is running, even though all self-hosted engine hosts should be in the same data center and cluster. The hosts' IDs were identical to what they were when initially deployed, causing a Sanlock error. Consequently, the agent failed to start. With this update, an error is raised when adding a new self-hosted engine host or migrating an existing one to a data center or cluster other than the one in which the self-hosted engine is running. To add or migrate a self-hosted engine host to a data center or cluster other than the one in which the self-hosted engine is running, you need to disable the host from being a self-hosted engine host by reinstalling it. Follow these steps in the Administration Portal: 1. Move the host to Maintenance mode. 2. Invoke Reinstall with the *Hosted Engine UNDEPLOY* option selected. If using the REST API, use the `undeploy_hosted_engine` parameter. 3. Edit the host and select the target data center and cluster. 4. Activate the host. For details, see the Administration Guide or REST API Guide.
Clone Of:
Environment:
Last Closed: 2020-11-24 13:09:18 UTC
oVirt Team: Infra
Target Upstream Version:
Embargoed:


Attachments
Screenshot.png (49.00 KB, image/png)
2020-06-18 15:42 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 4098481 0 Troubleshoot None The host is unstable after adding it as an additional hosted engine host 2019-05-01 14:14:03 UTC
Red Hat Product Errata RHSA-2020:5179 0 None None None 2020-11-24 13:10:34 UTC
oVirt gerrit 109017 0 master MERGED engine: cannot add additional HE in another DC 2021-02-21 17:44:43 UTC
oVirt gerrit 109599 0 master MERGED engine: HE host activated only in HE cluster 2021-02-21 17:44:43 UTC
oVirt gerrit 109622 0 master MERGED core: Partial revert of ee3a6dca4d33acbcca5fab7a169b0c6f1c4edefb 2021-02-21 17:44:43 UTC
oVirt gerrit 109623 0 master MERGED core: Add message to block HE host in other cluster 2021-02-21 17:44:44 UTC
oVirt gerrit 109624 0 master MERGED core: Revert of e5585ae50893ceacec269c99db63fadb4c4ca5c6 2021-02-21 17:44:44 UTC

Description nijin ashok 2019-04-22 17:55:33 UTC
Description of problem:

Currently, the RHV-M GUI allows adding additional Hosted Engine hosts in a different Data Center. Per the HE concept, all HE hosts should be in the same Data Center.

Since the UI allows this, a user can mistakenly start the installation in another Data Center. No errors are shown during the deployment, the deployment completes successfully, and the host status becomes "UP".

However, the agent then fails to start with a Sanlock error, because the host gets the same host ID as one of the initially deployed HE hosts.


===
engine=# select * from vds_spm_id_map ;
           storage_pool_id            | vds_spm_id |                vds_id                
--------------------------------------+------------+--------------------------------------
 c63df4be-4ec8-11e9-b4f4-525400919d5d |          1 | b2800c44-53e9-49c5-9e41-cb0414e0457a
 e4e56669-f60e-4fb5-8c04-9322048082a2 |          1 | fd1a4590-6785-497c-bde5-8c623d144fd8


The host in the other Data Center also gets host ID 1:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf |grep -i host_id
host_id=1

Broker log

Listener::ERROR::2019-04-22 23:07:04,996::storage_broker::262::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(start_domain_monitor) Failed to start monitoring domain (sd_uuid=6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b, host_id=1): timeout during domain acquisition

Agent log

2019-04-22 22:59:04 369626 [20430]: s31 lockspace 6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b:1:/dev/6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b/ids:0
2019-04-22 23:01:24 369766 [17576]: s31 delta_acquire host_id 1 busy1 1 1 2459324 0e593e42-d051-4f04-92bc-1085ddbb6f90.localhost.
2019-04-22 23:01:25 369767 [20430]: s31 add_lockspace fail result -262
====

Since the broker is not yet initialized, the "Host getStats" call times out, so the host continuously cycles through Connecting => Activating => UP with the message below in the event log.

===
VDSM 10.74.130.138 command Get Host Statistics failed: Message timeout which can be caused by communication issues
===

For an ordinary user, it is difficult to tell that the issue is caused by the hosted engine deployment.
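
As a diagnostic aid, the collision above can be confirmed by comparing the IDs recorded by the engine with the ID each HE host uses locally. A minimal sketch, assuming direct psql access to the engine database on the Manager machine (the connection method varies by RHV release) and default file locations on the hosts:

===
# On the Manager machine: list the SPM/sanlock host IDs the engine assigned per data center.
# Connection method is an assumption; adjust for your release (e.g. SCL-wrapped psql on older RHV).
sudo -u postgres psql -d engine -c "SELECT storage_pool_id, vds_spm_id, vds_id FROM vds_spm_id_map ORDER BY storage_pool_id;"

# On each hosted engine host: show the locally configured lockspace ID.
grep '^host_id=' /etc/ovirt-hosted-engine/hosted-engine.conf
===

Two HE hosts reporting the same host_id (here 1) while belonging to different data centers is exactly the collision that makes sanlock's add_lockspace fail.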
 

Version-Release number of selected component (if applicable):

rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:

100%

Steps to Reproduce:

1. In a hosted engine environment, add a new HE host in a different Data Center. 
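
The same flow can also be exercised through the REST API. A minimal sketch, with hypothetical names (engine URL, credentials, host address and cluster are placeholders) and assuming the deploy_hosted_engine parameter on the hosts add method behaves as documented in the API model:

===
# Placeholders throughout: engine.example.com, admin@internal:password, host/cluster names.
# Add a host into a cluster that belongs to a different data center than the HE VM,
# requesting hosted engine deployment (the flow that should be rejected).
curl -k -u admin@internal:password \
     -H 'Content-Type: application/xml' \
     -X POST 'https://engine.example.com/ovirt-engine/api/hosts?deploy_hosted_engine=true' \
     -d '<host>
           <name>other_dc_host</name>
           <address>other-dc-host.example.com</address>
           <root_password>secret</root_password>
           <cluster><name>ClusterInOtherDC</name></cluster>
         </host>'
===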


Actual results:

The RHV-M GUI allows adding additional Hosted Engine hosts in a different Data Center, which leads to the undesirable results described above.

Expected results:

The RHV-M GUI should not allow adding an additional HE host in a different Data Center.

Additional info:

Comment 12 Sandro Bonazzola 2019-07-11 07:02:13 UTC
Re-targeting to 4.3.6, as this is not identified as a blocker for 4.3.5.

Comment 14 Daniel Gur 2019-08-28 13:14:34 UTC
sync2jira

Comment 15 Daniel Gur 2019-08-28 13:19:37 UTC
sync2jira

Comment 27 Martin Perina 2020-06-18 14:05:17 UTC
*** Bug 1844787 has been marked as a duplicate of this bug. ***

Comment 28 Nikolai Sednev 2020-06-18 15:42:04 UTC
You can no longer add an ha-host to a different DC, as that option is not available when adding a host in the GUI (the Hosted Engine tab is unavailable).
Please see the attached Screenshot from 2020-06-18 18-33-09.png.
If, after adding the host, you try to follow "Installation->Reinstall->Hosted Engine->Deploy", you receive an error:

"
Operation Canceled
Error while executing action: 

alma03.qa.lab.tlv.redhat.com:
Cannot edit Host. You are using host from data center other than hosted engine VM runs on. In order to start the hosted engine import process, please select host from the same data center or move the host there first."

Tested on following components:
Software Version:4.4.1.2-0.10.el8ev
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.3-1.el8ev.noarch
Linux 4.18.0-193.9.1.el8_2.x86_64 #1 SMP Sun Jun 14 15:03:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)

Comment 29 Nikolai Sednev 2020-06-18 15:42:56 UTC
Created attachment 1697979 [details]
Screenshot.png

Comment 30 Martin Perina 2020-06-18 16:10:00 UTC
Moving back to POST; the current code was reverted, and the complete solution will be part of the 4.4.2 release.

Comment 31 Nikolai Sednev 2020-06-18 16:30:04 UTC
What is the fix then?

Comment 36 Nikolai Sednev 2020-10-13 14:48:23 UTC
Tried to add the ha-capable host alma07 to a non-HE cluster inside the HE's data center and received this error:
"Failed to connect Host alma07.qa.lab.tlv.redhat.com to Storage Pool Default
10/13/20 5:43:57 PM"
"Cannot activate host alma07.qa.lab.tlv.redhat.com in cluster test. Hosts with active hosted engine configuration can be activated only in the same cluster as Hosted Engine VM is running".

Tried to add the ha-capable host alma07 to a non-HE data center and got this error:

"
Operation Canceled
Error while executing action: 

alma07.qa.lab.tlv.redhat.com:
Cannot edit Host. You are using host from data center other than hosted engine VM runs on. In order to start the hosted engine import process, please select host from the same data center or move the host there first.
"

Comment 38 Nikolai Sednev 2020-10-13 20:44:26 UTC
1. Moving a host with hosted engine configuration into a different DC (other than the one the HE VM is running in) is completely forbidden. - Checked and verified.

2. Moving a host with hosted engine configuration into a different cluster (other than the one the HE VM is running in) is allowed, but the host cannot be activated successfully -> after activation it becomes Non Operational with the relevant message. - Checked and verified.

3. A regular host without hosted engine configuration can be successfully moved between DCs/clusters and successfully activated in them. - Checked and verified.

Tested on:

rhvm-4.4.3.6-0.13.el8ev.noarch
ovirt-hosted-engine-setup-2.4.7-2.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
Linux 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.3 (Ootpa)

Comment 39 Steve Goodman 2020-10-27 10:09:54 UTC
Current doc text is helpful, but too long for a release note, so I'm copying it into this comment for now to save it before editing it for the Release Notes:

Cause:
Currently, RHV-M allows adding additional hosts with hosted engine configuration into a different Data Center/Cluster than the one the hosted engine VM is running in. However, per the HE concept, all hosts with hosted engine configuration should be in the same Data Center/Cluster.

Since the above flows are allowed, the user can mistakenly install a new host in, or move an existing host to, another Data Center/Cluster. No errors are shown during the deployment, the deployment completes successfully, and the host status becomes "UP".

Consequence:
The agent fails to start with a Sanlock error, because the host ID is the same as that of the initially deployed HE hosts.


Fix: 

The following flows have been changed:

1. Adding a new host with hosted engine configuration into a different Data Center (other than the one the hosted engine VM is running in) is not allowed, and a relevant error is raised.

2. Moving an existing host with hosted engine configuration into a different Data Center (other than the one the hosted engine VM is running in) is not allowed, and a relevant error is raised.

3. Adding a new host with hosted engine configuration into a different Cluster (other than the one the hosted engine VM is running in) is allowed, but such a host cannot be activated; upon activation the host is moved to NonOperational status and a relevant error is raised.

4. Moving an existing host with hosted engine configuration into a different Cluster (other than the one the hosted engine VM is running in) is allowed, but such a host cannot be activated; upon activation the host is moved to NonOperational status and a relevant error is raised.


Additional notes:

The following steps need to be taken in order to successfully activate a host with hosted engine configuration in a different Data Center/Cluster:

1. Move the host to Maintenance.
2. Invoke Reinstall with the Hosted Engine UNDEPLOY option selected. For the REST API, use the 'undeploy_hosted_engine' parameter (see the sketch after this list); detailed documentation is at [1].
3. Edit the host and select a cluster from the desired DC.
4. Activate the host.
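
A hedged REST sketch of steps 1-4 (the engine URL, credentials and host UUID are placeholders; the reinstall call may need additional options, such as SSH authentication details, depending on the environment):

===
# Placeholders: adjust the engine URL, credentials and host UUID for your environment.
API='https://engine.example.com/ovirt-engine/api'
HOST_ID='PUT-HOST-UUID-HERE'

# 1. Move the host to Maintenance.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/deactivate" -d '<action/>'

# 2. Reinstall with the hosted engine configuration removed.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/install" \
     -d '<action><undeploy_hosted_engine>true</undeploy_hosted_engine></action>'

# 3. Move the host to a cluster in the target DC (hypothetical cluster name).
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X PUT "$API/hosts/$HOST_ID" \
     -d '<host><cluster><name>TargetCluster</name></cluster></host>'

# 4. Activate the host.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/activate" -d '<action/>'
===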

In order to move the above host back to its original Data Center/Cluster and make it available for HE again, use the following steps:
1) Put the HE host into Maintenance.
2) Edit the host and select a cluster from the desired DC. Save the change.
3) Choose Reinstall with the HE DEPLOY option selected. For the REST API, use the 'deploy_hosted_engine' parameter (see the sketch after the documentation link below); detailed documentation is at [1].
4) Activate the host.

[1] http://ovirt.github.io/ovirt-engine-api-model/4.4/#services/host/methods/install
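
Under the same assumptions (and reusing the $API and $HOST_ID placeholders from the previous sketch), the reverse flow:

===
# 1) Move the host to Maintenance.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/deactivate" -d '<action/>'

# 2) Move the host back to the cluster the hosted engine VM runs in (hypothetical cluster name).
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X PUT "$API/hosts/$HOST_ID" \
     -d '<host><cluster><name>HostedEngineCluster</name></cluster></host>'

# 3) Reinstall with hosted engine deployment.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/install" \
     -d '<action><deploy_hosted_engine>true</deploy_hosted_engine></action>'

# 4) Activate the host.
curl -k -u admin@internal:password -H 'Content-Type: application/xml' \
     -X POST "$API/hosts/$HOST_ID/activate" -d '<action/>'
===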

Comment 43 errata-xmlrpc 2020-11-24 13:09:18 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5179

