Bug 1417518 - [HE] high availability compromised due to duplicate spm id
Summary: [HE] high availability compromised due to duplicate spm id
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-engine
Version: 4.0.6
Hardware: x86_64
OS: Linux
Priority: high
Severity: urgent
Target Milestone: ovirt-4.1.1-1
Assignee: Denis Chaplygin
QA Contact: Artyom
URL:
Whiteboard: integration
Depends On:
Blocks: 1235200 1422486 1431635
Reported: 2017-01-30 03:38 UTC by Germano Veit Michel
Modified: 2020-03-11 15:44 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Self-hosted engine always uses an SPM ID of 1 during installation of the first self-hosted engine host, without checking database settings. This release adds options to change the database during the restore process. For disaster recovery, the --he-remove-hosts option has been added so that all hosts with SPM_ID=1 are updated and a different SPM ID assigned. For bare metal to self-hosted engine migration, a new engine-migrate-he.py script is provided. This script should be called before migration, and supplied with the Manager REST API login/password/endpoint and path to CA certificate. Hosts in the selected data center with SPM_ID=1 will be put into Maintenance mode, so they can accept the new ID safely. Migration can then continue as usual, using the --he-remove-hosts option.
Clone Of:
: 1422486 (view as bug list)
Environment:
Last Closed: 2017-04-25 00:53:20 UTC
oVirt Team: Integration
Target Upstream Version:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 2897821 0 None None None 2017-02-01 06:28:34 UTC
Red Hat Knowledge Base (Solution) 2981731 0 None None None 2017-03-31 18:52:44 UTC
Red Hat Product Errata RHEA-2017:0997 0 normal SHIPPED_LIVE Red Hat Virtualization Manager (ovirt-engine) 4.1 GA 2017-04-18 20:11:26 UTC
oVirt gerrit 71393 0 None MERGED he: Ensures that there will be no spm_id=1 host after restore. 2021-01-24 12:39:12 UTC
oVirt gerrit 72023 0 master MERGED search: Add spm_id as a searchable field for Host 2021-01-24 12:39:12 UTC
oVirt gerrit 72037 0 master MERGED he: Bare metal migration helper script added. 2021-01-24 12:39:12 UTC
oVirt gerrit 73033 0 ovirt-engine-4.0 ABANDONED search: Add spm_id as a searchable field for Host 2021-01-24 12:39:12 UTC
oVirt gerrit 73034 0 ovirt-engine-4.0 ABANDONED he: Bare metal migration helper script added. 2021-01-24 12:39:12 UTC
oVirt gerrit 73035 0 ovirt-engine-4.1 ABANDONED search: Add spm_id as a searchable field for Host 2021-01-24 12:39:13 UTC
oVirt gerrit 73036 0 ovirt-engine-4.1 ABANDONED he: Bare metal migration helper script added. 2021-01-24 12:39:13 UTC
oVirt gerrit 73236 0 ovirt-engine-4.1 MERGED search: Add spm_id as a searchable field for Host 2021-01-24 12:39:13 UTC
oVirt gerrit 73238 0 ovirt-engine-4.1 MERGED he: Bare metal migration helper script added. 2021-01-24 12:39:13 UTC
oVirt gerrit 73241 0 ovirt-engine-4.1.1.z MERGED search: Add spm_id as a searchable field for Host 2021-01-24 12:39:13 UTC
oVirt gerrit 73242 0 ovirt-engine-4.1.1.z MERGED he: Bare metal migration helper script added. 2021-01-24 12:39:14 UTC
oVirt gerrit 73993 0 master MERGED he: Notify user to redeploy HE hosts after recovery. 2021-01-24 12:39:14 UTC
oVirt gerrit 73996 0 ovirt-engine-4.1 MERGED he: Notify user to redeploy HE hosts after recovery. 2021-01-24 12:39:14 UTC
oVirt gerrit 74670 0 ovirt-engine-4.1.1.z MERGED he: Notify user to redeploy HE hosts after recovery. 2021-01-24 12:39:14 UTC

Description Germano Veit Michel 2017-01-30 03:38:21 UTC
Description of problem:

A Hosted-Engine host and a non-HE host can end up with the same SPM id.
Hosted-Engine won't start and the whole environment is down, as the HE host can't acquire its ids on the HE SD.

The way this happens is basically when deploying a new Host with HE (for disaster recovery or migration to HE).

The new host for re-deploying/migrating to HE automatically assumes spm id 1 since the HE domain is clean. However, a previous host is set with id 1 in the DB (it can be running or not). The HE deployment only cares about ids in the HE SD and has no idea about any other running SD/Master. It can't guess, so it selects id 1.

Once the engine comes up, the HE SD is added to the DB and the other host is activated, that older host tries to grab a lock in the HE SD and fails, yet all seems good and runs fine for ages. But it's definitely NOT good.

Now reboot the HE host (for any reason): the other host will grab the lock for id 1 on the HE SD and never release it. The HE then won't start and the environment is down. It looks like corrupted lockspaces, but it isn't; cleaning up the lockspaces won't help. One needs to figure out that two hosts are fighting for the same ID, which is not that simple.

Version-Release number of selected component (if applicable):
rhevm-4.0.6.3-0.1.el7ev.noarch
vdsm-4.18.21-1.el7ev.x86_64
sanlock-3.4.0-1.el7.x86_64
ovirt-hosted-engine-ha-2.0.6-1.el7ev.noarch
ovirt-hosted-engine-setup-2.0.4.1-2.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
I see basically two ways to hit this, there might be more.

A: Re-deploying HE (i.e. disaster recovery)
B: Moving from Standalone RHV-M to HE

B could be avoidable via documentation, but for A I don't think so.

So, to illustrate it, please see how to hit the problem with way B (Moving from Standalone RHV-M to HE).

1. Previously Running RHV Environment
   * Bare-Metal RHV-M
   * 1 Host (vds c0976879)
   * 1 Storage Domain (SD dc7e6fad)
   * Host uses spm id 1, see:

              storage_pool_id            | vds_spm_id |                vds_id                
   --------------------------------------+------------+--------------------------------------
    588e8d50-023f-0158-0292-0000000002f3 |          1 | c0976879-6165-4545-b67c-1bd2361112d5

   * We can see sanlock on vds c0976879 taking id 1:
   s dc7e6fad-c00c-4aeb-8c93-f2887d764019:1:/rhev/data-center/mnt/<IP>\:_storage_storage/dc7e6fad-c00c-4aeb-8c93-f2887d764019/dom_md/ids:0

2. To migrate to HE, follow https://access.redhat.com/documentation/en/red-hat-virtualization/4.0/paged/self-hosted-engine-guide/chapter-4-migrating-from-bare-metal-to-a-rhel-based-self-hosted-environment
   * Note it says one of the options is:
     - Prepare a new host with the ovirt-hosted-engine-setup package installed.      
   * Now we have Host 215f48bf, which runs HE
   * Host 215f48bf got spm id 1 on the HE SD (70345242), see:
   s 70345242-bcad-4e0b-ba2e-4b761e8132a3:1:/rhev/data-center/mnt/<IP>\:_storage_hosted/70345242-bcad-4e0b-ba2e-4b761e8132a3/dom_md/ids:0

3. Once the migration is finished and the new engine activates both hosts, all seems fine, but it's not:
   * The original host c0976879 just holds id 1 on the original SD and fails to get id 1 on the HE SD 70345242
   * The HE host 215f48bf holds id 1 on the HE SD 70345242 and id 2 on the original SD dc7e6fad. 
In more detail:
   * vds c0976879 still has id 1 on SD dc7e6fad:
   s dc7e6fad-c00c-4aeb-8c93-f2887d764019:1:/rhev/data-center/mnt/<IP>\:_storage_storage/dc7e6fad-c00c-4aeb-8c93-f2887d764019/dom_md/ids:0
[1]* vds c0976879 upon activation failed to get id 1 on SD 70345242 (HE):
   * vds 215f48bf (HE) got id 2 on SD dc7e6fad:
   s dc7e6fad-c00c-4aeb-8c93-f2887d764019:2:/rhev/data-center/mnt/<IP>\:_storage_storage/dc7e6fad-c00c-4aeb-8c93-f2887d764019/dom_md/ids:0
[2]* vds 215f48bf (HE) got id 1 on SD (HE):
   s 70345242-bcad-4e0b-ba2e-4b761e8132a3:1:/rhev/data-center/mnt/<IP>\:_storage_hosted/70345242-bcad-4e0b-ba2e-4b761e8132a3/dom_md/ids:0
   * And we see the HE Host should be using id 2:
              storage_pool_id            | vds_spm_id |                vds_id                
   --------------------------------------+------------+--------------------------------------
    588e8d50-023f-0158-0292-0000000002f3 |          1 | c0976879-6165-4545-b67c-1bd2361112d5
    588e8d50-023f-0158-0292-0000000002f3 |          2 | 215f48bf-5162-46c6-9e6b-bdb2465d1e88

4. It's all crossed, things are not good. Then once we reboot the HE Host:
   * vds c0976879 still has id 1 on SD dc7e6fad
   * vds c0976879 finally gets id 1 on SD (HE).

5. Once the HE Host comes up again, it can't get its ids lock, all fails. HE is down.
   * HE host 215f48bf can't get id 1 as it's being held by the other host c0976879
   * it still tried to get spm id 1, due to hosted-engine.conf(?)
     host_id=1

And it never corrects itself without manual intervention. Depending on the order the Hosts are rebooted/shutdown the problem may hit the HE Host (bad outcomes) or the non HE Host (not too bad). But the problem remains, that the HE Host is set to use id 2 in the DB and uses id 1 in the HE SD.
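To make the symptom in step 3 easier to spot, here is a minimal Python sketch that parses lockspace lines in the format quoted above (I assume they come from `sanlock client status`) and flags hosts that hold different ids in different lockspaces. The function names and the input dictionary are hypothetical and not part of any oVirt tool.

```python
import re

# Format assumed from the lines quoted in this report:
#   s <lockspace>:<host_id>:<path>:<offset>
# The path can contain colons escaped as "\:", so it is matched greedily
# and the offset is taken from the digits after the last colon.
LOCKSPACE_RE = re.compile(
    r"^s\s+(?P<lockspace>[^:\s]+):(?P<host_id>\d+):(?P<path>.+):(?P<offset>\d+)$"
)


def parse_lockspace(line):
    """Parse one lockspace line; return a dict, or None if it does not match."""
    match = LOCKSPACE_RE.match(line.strip())
    if match is None:
        return None
    return {
        "lockspace": match.group("lockspace"),
        "host_id": int(match.group("host_id")),
        "path": match.group("path").replace("\\:", ":"),
        "offset": int(match.group("offset")),
    }


def ids_per_host(lines_by_host):
    """Map each host name to {lockspace: host_id} for its lockspace lines."""
    result = {}
    for host, lines in lines_by_host.items():
        parsed = (parse_lockspace(line) for line in lines)
        result[host] = {p["lockspace"]: p["host_id"] for p in parsed if p}
    return result


def hosts_with_mixed_ids(lines_by_host):
    """Return hosts that hold different host ids in different lockspaces."""
    return sorted(
        host
        for host, ids in ids_per_host(lines_by_host).items()
        if len(set(ids.values())) > 1
    )
```

Fed with the lines from step 3, this would report the HE host 215f48bf (id 2 on dc7e6fad, id 1 on 70345242). It cannot see a host that silently failed to join a lockspace, as c0976879 did, because that host prints no line for it.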

Apparently we need some extra logic to handle this. Should the HE really use HE host_id as a key to the ids lockspace (SD host_id)?

Actual results:
[1] Shouldn't this produce an error to warn the user something is wrong? How can a Host activate and show up if it can't even acquire ids lock?
[2] Once activated it still has ID 1, maybe the trick is to switch it to id 2 at the final stages of he deploy (add host)?

Expected results:
No hosts fighting for the same id, Hosted-Engine VM able to start.

Comment 2 Simone Tiraboschi 2017-01-30 09:18:49 UTC
Here I see just two options:
1. always filter out (or change the id) the host with spm_id=1 at restore time; this should be pretty safe 
2. start hosted-engine-setup with spm_id=1; deploy as usual and add the host via engine API. Since we have another host with spm_id=1 the engine will choose a different spm_id so we need to get it somehow (REST API, VDSM, sanlock ?) and change the hosted-engine configuration before the next run.
Currently nothing removes the lock on exit, since we just assume that the engine is going to refresh it for id=1, so we would also have to remove the lock. But if we do that while the engine VM is running, the VM would probably be killed; and if we wait for the engine VM to be down, nothing prevents the engine from starting the auto-import process, since another storage domain could be in the restored DB as well.
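Option 1 can be illustrated with a small Python sketch. It works on an in-memory copy of vds_spm_id_map for one storage pool and moves any host holding the reserved id to the next id above the current maximum. The allocation rule and the function name are assumptions for illustration; this report does not show how the actual patch picks the new id.

```python
def reassign_reserved_id(spm_map, reserved_id=1):
    """Return a copy of spm_map with the reserved id freed.

    spm_map maps vds_id -> vds_spm_id for a single storage pool.
    Every host that currently holds reserved_id gets the next unused id
    above the current maximum (an assumed rule, see the text above).
    """
    result = dict(spm_map)
    next_id = max(result.values(), default=reserved_id) + 1
    for vds_id in sorted(spm_map):
        if spm_map[vds_id] == reserved_id:
            result[vds_id] = next_id
            next_id += 1
    return result
```

The input is left untouched, so the old and new mappings can be compared before anything is written back to the database.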

Comment 3 Germano Veit Michel 2017-01-31 00:17:23 UTC
Simone,

I'm yet to write the KCS on how to fix this. In your opinion, what would be the best course of action to avoid HostedEngine VM getting killed?

I assume we need to get host_id in hosted-engine.conf in sync with vds_spm_id_map. So what about this:

1) Get the vds_spm_id_map
2) HE maintenance mode
3) Shutdown/Migrate HE, make it release the ids lock (if single host can't move to maintenance)
4) Adjust hosted-engine.conf
5) Cleanup metadata slots of old IDs
6) Reboot
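The check behind these steps (is host_id in hosted-engine.conf in sync with vds_spm_id_map?) could be sketched in Python roughly as follows. The configuration is passed in as text and the database value as an integer, so nothing here touches a real host; both function names are hypothetical.

```python
def read_host_id(conf_text):
    """Return the integer host_id from key=value configuration text, or None."""
    for raw in conf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() == "host_id":
            return int(value.strip())
    return None


def check_host_id(conf_text, db_spm_id):
    """Return None when the ids agree, otherwise a short description of the problem."""
    conf_id = read_host_id(conf_text)
    if conf_id is None:
        return "host_id not found in configuration"
    if conf_id != db_spm_id:
        return "mismatch: configuration has host_id=%d, database has vds_spm_id=%d" % (
            conf_id,
            db_spm_id,
        )
    return None
```

For the HE host in the description (host_id=1 locally, vds_spm_id 2 in the DB) this would return the mismatch message.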

Any better ideas?

I was thinking, one can also hit this when deploying additional hosts, right? Just use a host_id which is already used in vds_spm_id_map. No need to do all that I did in the reproduction steps. Right?

Thank you

Comment 4 Simone Tiraboschi 2017-01-31 09:04:08 UTC
(In reply to Germano Veit Michel from comment #3)
> I assume we need to get host_id in hosted-engine.conf in sync with
> vds_spm_id_map. So what about this:
> 
> 1) Get the vds_spm_id_map
> 2) HE maintenance mode
> 3) Shutdown/Migrate HE, make it release the ids lock (if single host can't
> move to maintenance)
> 3) Adjust hosted-engine.conf
> 4) Cleanup metadata slots of old IDs
> 5) Reboot
> 
> Any better ideas?

This is one option; the other is to change the spm_id of the conflicting hosts in the DB and reboot the host not involved in hosted-engine.

> I was thinking, one can also hit this when deploying additional hosts right?
> Just use a host_id which is already used in vds_spm_id_map. No need to do
> all that I did on the reproduction steps. Right?

We deprecated (in 4.0) and then removed the ability to deploy additional hosted-engine hosts from the CLI. The user can now just add additional hosted-engine hosts from the engine (web UI or REST API), so we are sure that the spm_id will be consistent between the engine and the local configuration of the hosted-engine hosts.

Comment 5 Sandro Bonazzola 2017-02-07 17:23:09 UTC
Simone, can we close this for the current release, given that we deprecated the CLI in 4.0 and this doesn't happen using the web UI?

Comment 6 Germano Veit Michel 2017-02-07 23:13:45 UTC
(In reply to Sandro Bonazzola from comment #5)
> Simone, can we close this current release provided we deprecated CLI in 4.0
> and this doesn't happen using web ui?

Hi Sandro,

As stated in comment #0, this can easily be hit in two scenarios:

A: Re-deploying HE (i.e. disaster recovery)
B: Moving from Standalone RHV-M to HE

And none of them is related to deploying additional hosts via the Web UI.

Is there another BZ tracking the root cause of this (inappropriate spm id for the initial host)?

Comment 7 Simone Tiraboschi 2017-02-08 11:40:15 UTC
(In reply to Germano Veit Michel from comment #6)
> Is there another BZ tracking the root cause of this (inappropriate spm id
> for the initial host)?

No, we can work on this bug.

Comment 9 Simone Tiraboschi 2017-03-07 16:00:32 UTC
We found another problematic case here:
this is not just about the host with spm_id=1, but also about the SPM host when it is one of the hosted-engine hosts we are going to remove in an HE-to-HE migration through backup/restore, and it is not the host used to run hosted-engine-setup on the restore side.

Example:
1. a hosted-engine host has spm_id=2 and is the SPM host
2. the migration helper script sets the host with spm_id=1 to maintenance mode; the HE host with spm_id=2 is still up and still the SPM
3. the user takes a backup
4. the user starts hosted-engine --deploy and restores the backup, filtering out all the hosted-engine hosts, including the SPM one, although it is still active
5. the restored engine fails to elect another SPM host since the old SPM is still active; this persists as long as the old host is alive, or until it is added to the engine

The migration helper script should also include a check for that case.
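A rough Python sketch of such a pre-migration check follows. The Host record and its fields are hypothetical stand-ins for whatever the helper script reads from the engine; this is not the code of engine-migrate-he.py.

```python
from dataclasses import dataclass


@dataclass
class Host:
    # Hypothetical minimal view of a host as seen by the engine.
    name: str
    spm_id: int
    is_spm: bool
    is_hosted_engine: bool


def premigration_actions(hosts):
    """List (host name, required action) pairs to resolve before taking the backup."""
    actions = []
    for host in hosts:
        if host.spm_id == 1:
            # The freshly deployed HE host will claim id 1 on the HE SD.
            actions.append((host.name, "put into maintenance: holds spm_id=1"))
        if host.is_spm and host.is_hosted_engine:
            # An HE host filtered out at restore time must not stay SPM.
            actions.append((host.name, "move SPM role to a non-HE host"))
    return actions
```

With the example above (an HE host with spm_id=2 acting as SPM, plus another host with spm_id=1), the function returns one action per host.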

Comment 10 Artyom 2017-03-07 16:20:07 UTC
We will need the backport to 4.0 for the additional patch.
I verified this bug for 4.0, but I did not encounter this problem, my hosted engine host had spm_id=1.

Comment 11 Denis Chaplygin 2017-03-08 05:21:52 UTC
The migration script has 'migration' in its name for a reason :-) It is supposed to be used only for 'bare metal engine to hosted engine migration'. Therefore, if you are already on hosted engine, you should not use that migration script.

As for the problematic case mentioned above, what is the use case for removing hosts from the database while keeping them physically in the cluster?

If you just restore your database because of data corruption etc, but HE cluster is still alive, you should not remove hosts, just operate on the engine. 

In case of disaster recovery, when HE cluster is lost, why should you keep cluster members running? 

Finally, why can't you just add remaining cluster members to the engine after restoration?

Comment 12 Simone Tiraboschi 2017-03-08 08:46:11 UTC
(In reply to Denis Chaplygin from comment #11)
> Migration script have 'migration' in it's name for reason :-) It is supposed
> to be used only for 'bare metal engine to hosted engine migration'.
> Therefore, if you are already on hosted engine, you should not use that
> migration script.

The use case is migrating from an NFS-based hosted-engine deployment to an iSCSI one, or from an iSCSI LUN to one on a different target, and so on (for example because you have to decommission the storage you are using for hosted-engine), without the need to power down the whole datacenter. Several users are already asking for this; customers are also asking for a backport to 4.0.

> As for problematic case, mentioned above, what is the use case of removing
> hosts from the database and at the same time keeping them physically in
> cluster? 

It's here: https://bugzilla.redhat.com/show_bug.cgi?id=1235200
The point is that the old hosted-engine hosts have a local configuration file that points to the hosted-engine storage domain to bootstrap it on a cold boot from any host: this has to be updated pointing to the new hosted-engine storage domain, the hosted-engine metadata and lockspace volume on the hosted-engine storage domain have to be updated as well.
The best option is to ask the user to redeploy all the hosted-engine hosts on migration, but doing that manually from the engine is really error prone, since we have no indication of which hosts point to the new hosted-engine storage domain versus the ones that are still pointing to the previous one.

There is no need to do that on live hosted-engine hosts with running VMs, but asking to manually power off the hosts could be error prone as well. Probably the simplest option is to force maintenance mode on those hosts from the migration script before taking the backup, as we do for the host with spm_id=1.

> If you just restore your database because of data corruption etc, but HE
> cluster is still alive, you should not remove hosts, just operate on the
> engine. 
> 
> In case of disaster recovery, when HE cluster is lost, why should you keep
> cluster members running? 

The scope is a migration from an initial storage device for the hosted-engine storage domain to a different one without the need to poweroff the whole datacenter.

In order to let the engine re-trigger the auto-import procedure for the hosted-engine storage domain we have also to filter out at restore time the previous hosted-engine storage domain and the previous engine VM since the user is explicitly denied to do that from the engine.
This has been addressed as well, as per https://bugzilla.redhat.com/show_bug.cgi?id=1240466

Comment 13 Yaniv Kaul 2017-03-09 09:18:00 UTC
Is this going into 4.1.1?

Comment 15 Denis Chaplygin 2017-03-20 13:55:38 UTC
If I understand correctly, yes.

Comment 16 Tal Nisan 2017-03-27 12:33:48 UTC
(In reply to Yaniv Kaul from comment #13)
> Is this going into 4.1.1?

If this should indeed go to 4.1.1-1 it should be marked as a blocker

Comment 18 Nikolai Sednev 2017-04-05 16:05:58 UTC
The issue from https://bugzilla.redhat.com/show_bug.cgi?id=1417518#c9 is not fixed in this bug; see also it being reproduced here:
https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28

Returning back to assigned.

Comment 19 Denis Chaplygin 2017-04-06 07:52:36 UTC
Have you seen message '  - Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.'? 

In case of positive answer, you did your next steps wrong. In https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote: 

6)Deployment was successfully accomplished and I saw in engine that only puma19 was added to it and there also was alma03 none hosted-engine host with 6 regular VMs, then I've powered-off the HE-VM in order to finish the hosted-engine deployment.


But, as the warning message told you, you are supposed to redeploy alma03 right after the restore procedure.

Comment 20 Nikolai Sednev 2017-04-06 08:19:52 UTC
(In reply to Denis Chaplygin from comment #19)
> Have you seen message '  - Please redeploy already existing HE hosts
> IMMEDIATELY after restore, to avoid possible SPM deadlocks.'? 
> 
> In case of positive answer, you did your next steps wrong. In
> https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote: 
> 
> 6)Deployment was successfully accomplished and I saw in engine that only
> puma19 was added to it and there also was alma03 none hosted-engine host
> with 6 regular VMs, then I've powered-off the HE-VM in order to finish the
> hosted-engine deployment.
> 
> 
> But, as warning message told you, you are supposed to redeploy alma03 right
> after restore procedure.

The answer appears already here https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c36, anyway copying it here:
No, I did not see that message; where is it expected to appear?
If a customer, as in my scenario, has 3 hosts (one regular host, alma03, and two hosted-engine hosts, puma18 and puma19), all running enough VMs that only one host at a time can be taken out for redeployment without shutting VMs down, then you simply can't redeploy all the hosted-engine hosts without losing VMs.

Comment 21 Denis Chaplygin 2017-04-06 09:37:11 UTC
First of all, the HE-to-HE migration procedure should be documented, as per https://bugzilla.redhat.com/show_bug.cgi?id=1431635

Second, you should have seen that message, mentioned above, during restore procedure.

Regarding HE redeploy, I'm not sure whether we support that at all. We have the HE-to-HE migration procedure, which implies deploying a new HE environment on a _new_ storage, not the old one as you did.

That procedure also _requires_ moving SPM to any non-HE host. It also requires running migration helper script, that will put host with host_id equal to one into maintenance mode. Finally, you are supposed to redeploy your old HE hosts immediately after database restoration. 

In case you fail to meet the conditions listed above (as you did), we cannot guarantee success of that migration.

So, as you have only 3 hosts and all of them are HE hosts, it is not possible to redeploy the hosted engine environment on them. In case you really need it, I would recommend the following steps:

1) Provide new storage for the new HE environment
2) Undeploy HE from two of the hosts
3) Move SPM to any of those hosts
4) Migrate all your VMs to non-HE hosts (except the HE VM, obviously)
5) Execute the migration helper script. It may put one of your non-HE hosts into maintenance mode. If you are not able to fit all your VMs on a single host, you will have to stop them.
6) Deploy the new HE environment
7) Restore the database, add the non-HE hosts
8) Deploy HE on the non-HE hosts.

Comment 22 Sandro Bonazzola 2017-04-06 12:27:56 UTC
So, moving back to QE?

Comment 23 Denis Chaplygin 2017-04-06 13:00:18 UTC
We (Nikolai and I) have discussed those issues and found that we can't do much here, as we are hitting design limits.

We may need to improve communication between the HE tools and the engine, but that definitely deserves another BZ. I'm leaving that to PM to decide.

Comment 24 Artyom 2017-04-10 09:30:00 UTC
Verified on

1) Deploy HE environment

2) Check the hosts' SPM ids in the database
                vds_id                |   vds_name   |            host_name            
--------------------------------------+--------------+---------------------------------
 fca7300c-760a-45aa-aa81-a8968fb8abef | host_mixed_2 | alma06.qa.lab.tlv.redhat.com - non-HE host
 2411f6f7-b0dd-4413-a1b5-cc0d039fd82b | host_mixed_1 | alma05.qa.lab.tlv.redhat.com - HE host
 2c7be430-a43f-4a87-a393-601f999fd3e6 | host_mixed_3 | cyan-vdsg.qa.lab.tlv.redhat.com - non-HE host
(3 rows)

engine=# select * from vds_spm_id_map;
           storage_pool_id            | vds_spm_id |                vds_id                
--------------------------------------+------------+--------------------------------------
 00000001-0001-0001-0001-000000000311 |          1 | 2c7be430-a43f-4a87-a393-601f999fd3e6
 00000001-0001-0001-0001-000000000311 |          2 | 2411f6f7-b0dd-4413-a1b5-cc0d039fd82b
 00000001-0001-0001-0001-000000000311 |          3 | fca7300c-760a-45aa-aa81-a8968fb8abef

3) Run engine-migrate script
# python /usr/share/ovirt-engine/bin/engine-migrate-he.py
Engine REST API url[https://compute-ge-he-2.qa.lab.tlv.redhat.com/ovirt-engine/api]:
Engine REST API username[admin@internal]:
Engine REST API password:123456
Engine CA certificate file[/etc/pki/ovirt-engine/ca.pem]:
         0) Default
         1) golden_env_mixed
Select which Data Center will run Hosted Engine:1
Putting host host_mixed_3 to maintenance
Waiting for host to switch into Maintenance state
Waiting for host to switch into Maintenance state
Waiting for host to switch into Maintenance state
Host is in Maintenance state

4) Select host_mixed_2 (non-HE host) as SPM

5) Backup the engine: # engine-backup --mode=backup --file=engine.backup --log=engine-backup.log

6) Copy the backup file from the HE VM to the host

7) Clean the HE deployment from the HE host (reprovisioning)

8) Run the HE deployment again

9) Answer No to the question "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? "

10) Log in to the HE VM and copy the backup file from the host to the HE VM

11) Restore the engine: # engine-backup --mode=restore --scope=all --file=engine.backup --log=engine-restore.log  --he-remove-storage-vm --he-remove-hosts --restore-permissions --provision-dwh-db --provision-db

Objects that were added, removed or changed after this date, such as virtual
machines, disks, etc., are missing in the engine, and will probably require
recovery or recreation.
------------------------------------------------------------------------------
  - Removing the hosted-engine storage domain, all its entities and the hosted-engine VM.
  - Removing all the hosted-engine hosts.
  - Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.
- DWH database 'ovirt_engine_history'
You should now run engine-setup.
Done.

12) After the deployment has finished:
engine=# select vds_id, vds_name, host_name from vds_static;
                vds_id                |           vds_name           |            host_name            
--------------------------------------+------------------------------+---------------------------------
 fca7300c-760a-45aa-aa81-a8968fb8abef | host_mixed_2                 | alma06.qa.lab.tlv.redhat.com
 2c7be430-a43f-4a87-a393-601f999fd3e6 | host_mixed_3                 | cyan-vdsg.qa.lab.tlv.redhat.com
 19d88d3d-7d9f-4d19-8372-b8f1086f309e | alma05.qa.lab.tlv.redhat.com | alma05.qa.lab.tlv.redhat.com
(3 rows)

engine=# select * from vds_spm_id_map;
           storage_pool_id            | vds_spm_id |                vds_id                
--------------------------------------+------------+--------------------------------------
 00000001-0001-0001-0001-000000000311 |          3 | fca7300c-760a-45aa-aa81-a8968fb8abef
 00000001-0001-0001-0001-000000000311 |          4 | 2c7be430-a43f-4a87-a393-601f999fd3e6
 00000001-0001-0001-0001-000000000311 |          1 | 19d88d3d-7d9f-4d19-8372-b8f1086f309e

13) The HE VM and HE SD are both in an active state
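The comparison done by eye in step 12 can also be scripted. The following Python sketch parses psql output like the vds_spm_id_map listings above and reports duplicate ids within a storage pool, as well as any host other than the HE host holding id 1. The function names are hypothetical, and the rule that only the HE host may hold id 1 is my reading of this fix, not something the engine enforces by this code.

```python
def parse_psql_table(text):
    """Parse aligned psql output (header, dashed separator, rows) into dicts."""
    lines = [line for line in text.splitlines() if line.strip()]
    header = [cell.strip() for cell in lines[0].split("|")]
    rows = []
    for line in lines[2:]:  # lines[1] is the "-----+-----" separator
        if line.strip().startswith("("):  # "(3 rows)" footer
            continue
        cells = [cell.strip() for cell in line.split("|")]
        rows.append(dict(zip(header, cells)))
    return rows


def check_spm_map(rows, he_host_vds_id):
    """Return a list of problems found in vds_spm_id_map rows (empty if none)."""
    problems = []
    seen = {}
    for row in rows:
        key = (row["storage_pool_id"], row["vds_spm_id"])
        if key in seen:
            problems.append(
                "spm id %s is shared by %s and %s"
                % (row["vds_spm_id"], seen[key], row["vds_id"])
            )
        seen[key] = row["vds_id"]
        if row["vds_spm_id"] == "1" and row["vds_id"] != he_host_vds_id:
            problems.append("spm id 1 is held by %s, not the HE host" % row["vds_id"])
    return problems
```

Run against the table in step 12 with the new HE host 19d88d3d-7d9f-4d19-8372-b8f1086f309e, the check returns no problems.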

