Description of problem:

Currently, the RHV-M GUI allows adding additional Hosted Engine hosts in another Data Center. Per the HE concept, all HE hosts should be in the same Data Center. Since the UI allows this, the user can incorrectly start the installation in another Data Center. No errors are shown during the deployment, the deployment succeeds, and the host status will be "UP". However, the agent will fail to start because of a Sanlock error, since the host ID will be the same as that of the initially deployed HE hosts.

===
engine=# select * from vds_spm_id_map ;
           storage_pool_id            | vds_spm_id |                vds_id
--------------------------------------+------------+--------------------------------------
 c63df4be-4ec8-11e9-b4f4-525400919d5d |          1 | b2800c44-53e9-49c5-9e41-cb0414e0457a
 e4e56669-f60e-4fb5-8c04-9322048082a2 |          1 | fd1a4590-6785-497c-bde5-8c623d144fd8

The host in the other Data Center gets host ID 1:

# cat /etc/ovirt-hosted-engine/hosted-engine.conf | grep -i host_id
host_id=1

Broker log:

Listener::ERROR::2019-04-22 23:07:04,996::storage_broker::262::ovirt_hosted_engine_ha.broker.storage_broker.StorageBroker::(start_domain_monitor) Failed to start monitoring domain (sd_uuid=6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b, host_id=1): timeout during domain acquisition

Agent log:

2019-04-22 22:59:04 369626 [20430]: s31 lockspace 6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b:1:/dev/6554cb7b-9438-4d6d-b7d8-db8b2f54dc2b/ids:0
2019-04-22 23:01:24 369766 [17576]: s31 delta_acquire host_id 1 busy1 1 1 2459324 0e593e42-d051-4f04-92bc-1085ddbb6f90.localhost.
2019-04-22 23:01:25 369767 [20430]: s31 add_lockspace fail result -262
====

Since the broker is not yet initialized, the "Host getStats" call will time out, and the host will continuously cycle through Connecting => Activating => UP with the following message in the event log:

===
VDSM 10.74.130.138 command Get Host Statistics failed: Message timeout which can be caused by communication issues
===

For a normal user, it is difficult to understand that the issue is caused by the hosted engine deployment.

Version-Release number of selected component (if applicable):
rhvm-4.2.8.5-0.1.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. In a hosted engine environment, add a new HE host in a different Data Center.

Actual results:
The RHV-M GUI allows adding additional Hosted Engine hosts in another Data Center, which leads to the undesirable results described above.

Expected results:
The RHV-M GUI should not allow adding additional HE hosts in another Data Center.

Additional info:
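For anyone hitting this, a quick way to confirm the collision is to compare the vds_spm_id values recorded by the engine with the host_id stored on each HE host. The following is only a diagnostic sketch under the assumptions stated in its comments (run on the RHV-M machine, the 'engine' database shown above reachable via psql as the postgres user); adjust it to your environment.

===
#!/bin/bash
# Hypothetical diagnostic sketch (not part of the product): list vds_spm_id
# values that are reused across different storage pools, i.e. the host_id
# collision that makes sanlock fail on the second HE host.
# Assumption: the 'engine' database is reachable locally as the postgres user.
sudo -u postgres psql engine -c "
    SELECT vds_spm_id,
           count(DISTINCT storage_pool_id) AS storage_pools,
           count(*)                        AS hosts
    FROM   vds_spm_id_map
    GROUP  BY vds_spm_id
    HAVING count(DISTINCT storage_pool_id) > 1;"

# On each HE host, the locally stored ID can then be compared manually:
#   grep -i host_id /etc/ovirt-hosted-engine/hosted-engine.conf
===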
Re-targeting to 4.3.6; not identified as a blocker for 4.3.5.
sync2jira
*** Bug 1844787 has been marked as a duplicate of this bug. ***
Now you can't add an HA host to a different DC, as there is no such option available when adding the host in the GUI (the Hosted Engine tab is unavailable). Please see the attached Screenshot from 2020-06-18 18-33-09.png.

If, after adding the host, you try to follow "Installation -> Reinstall -> Hosted Engine -> Deploy", you'll receive an error:

"Operation Canceled
Error while executing action: alma03.qa.lab.tlv.redhat.com: Cannot edit Host. You are using host from data center other than hosted engine VM runs on. In order to start the hosted engine import process, please select host from the same data center or move the host there first."

Tested on the following components:
Software Version: 4.4.1.2-0.10.el8ev
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.3-1.el8ev.noarch
Linux 4.18.0-193.9.1.el8_2.x86_64 #1 SMP Sun Jun 14 15:03:05 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.2 (Ootpa)
Created attachment 1697979 [details] Screenshot.png
Moving back to POST; the current code was reverted. The complete solution will be part of the 4.4.2 release.
What is the fix then?
Tried to add the HA-capable host alma07 to a non-HE cluster inside the HE's data center and received this error:

"Failed to connect Host alma07.qa.lab.tlv.redhat.com to Storage Pool Default 10/13/20 5:43:57 PM"
"Cannot activate host alma07.qa.lab.tlv.redhat.com in cluster test. Hosts with active hosted engine configuration can be activated only in the same cluster as Hosted Engine VM is running".

Tried to add the HA-capable host alma07 to a non-HE data center and got this error:

"Operation Canceled
Error while executing action: alma07.qa.lab.tlv.redhat.com: Cannot edit Host. You are using host from data center other than hosted engine VM runs on. In order to start the hosted engine import process, please select host from the same data center or move the host there first."
1. Moving a host with hosted engine configuration into a different DC (other than the one the HE VM is running in) is completely forbidden. - Checked and verified.
2. Moving a host with hosted engine configuration into a different cluster (other than the one the HE VM is running in) is allowed, but the host cannot be activated successfully -> after activation it becomes Non Operational with the relevant message. - Checked and verified.
3. A regular host without hosted engine configuration can be successfully moved between DCs/clusters and activated in them. - Checked and verified.

Tested on:
rhvm-4.4.3.6-0.13.el8ev.noarch
ovirt-hosted-engine-setup-2.4.7-2.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
Linux 4.18.0-240.el8.x86_64 #1 SMP Wed Sep 23 05:13:10 EDT 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.3 (Ootpa)
Current doc text is helpful, but too long for a release note, so I'm copying it into this comment for now to save it before editing it for the Release Notes:

Cause:
Currently, RHV-M allows adding additional hosts with hosted engine configuration into a different Data Center/Cluster than the one the hosted engine VM is running in. Per the HE concept, all hosts with hosted engine configuration should be in the same Data Center/Cluster. Since the above flows are allowed, the user can incorrectly start the installation of a new host or move an existing host to another Data Center/Cluster. No errors are shown during the deployment, the deployment succeeds, and the host status will be "UP".

Consequence:
The agent will fail to start because of a Sanlock error, since the host ID will be the same as that of the initially deployed HE hosts.

Fix:
The following flows have been changed:
1. Adding a new host with hosted engine configuration into a different Data Center (other than the one the hosted engine VM is running in) is not allowed and a relevant error is raised.
2. Moving an existing host with hosted engine configuration into a different Data Center (other than the one the hosted engine VM is running in) is not allowed and a relevant error is raised.
3. Adding a new host with hosted engine configuration into a different Cluster (other than the one the hosted engine VM is running in) is allowed, but such a host cannot be activated; upon activation the host is moved to NonOperational status and a relevant error is raised.
4. Moving an existing host with hosted engine configuration into a different Cluster (other than the one the hosted engine VM is running in) is allowed, but such a host cannot be activated; upon activation the host is moved to NonOperational status and a relevant error is raised.

Additional notes:
The following steps need to be taken in order to successfully activate a host with hosted engine configuration in a different Data Center/Cluster (see the REST API sketch below):
1. Move the host to Maintenance.
2. Invoke Reinstall with the Hosted Engine UNDEPLOY option selected. For the REST API, please use the 'undeploy_hosted_engine' parameter; detailed documentation is at [1].
3. Edit the host and select a cluster from the desired DC.
4. Activate.

In order to move the above host back to the original Data Center/Cluster and make it available for HE, please use the following steps:
1) Put the HE host into Maintenance.
2) Edit the host and select a cluster from the desired DC. Save the change.
3) Choose Reinstall with the HE DEPLOY option selected. For the REST API, please use the 'deploy_hosted_engine' parameter; detailed documentation is at [1].
4) Activate.

[1] http://ovirt.github.io/ovirt-engine-api-model/4.4/#services/host/methods/install
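For convenience, here is a minimal REST API sketch of the Reinstall step from the lists above. It assumes the standard /ovirt-engine/api endpoint, an admin@internal user, and a placeholder host id of '123'; the authoritative parameter list (including any SSH/authentication details the install action may also need) is in the documentation linked at [1].

===
# Sketch: reinstall host '123' with the Hosted Engine UNDEPLOY option,
# so it can afterwards be activated in a different Data Center/Cluster.
# Placeholders: engine FQDN, password and host id must be adjusted.
curl --insecure --request POST \
     --user 'admin@internal:PASSWORD' \
     --header 'Content-Type: application/xml' \
     --header 'Accept: application/xml' \
     --data '<action><undeploy_hosted_engine>true</undeploy_hosted_engine></action>' \
     'https://rhvm.example.com/ovirt-engine/api/hosts/123/install'

# To move the host back and re-enable HE on it, use the same call with
# <deploy_hosted_engine>true</deploy_hosted_engine> instead.
===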
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Low: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:5179