Description of problem:
Auto-import of the hosted-engine storage domain fails after the engine DB is restored onto it from a bare-metal engine deployment. I had a regular bare-metal engine installation with one host, 10 guest VMs, an ISO domain, an export domain, and one NFS data SD. The 3.6 engine was cleanly installed on an el7.2 host named alma03. I followed http://brq-setup.rhev.lab.eng.brq.redhat.com/ovirt-engine/docs/manual/en_US/html/Self-Hosted_Engine_Guide/chap-Migrating_from_Bare_Metal_to_a_RHEL-Based_Self-Hosted_Environment.html and http://brq-setup.rhev.lab.eng.brq.redhat.com/ovirt-engine/docs/manual/en_US/html/Self-Hosted_Engine_Guide/Restoring_the_Self-Hosted_Engine_Manager.html to back up the engine DB and then restored it on the HE VM during its deployment on a host named seal10, over an NFS SD named "nsednev_3_6_he_backedup_from_alma_03". Deployment finished successfully and I also added an additional host named alma04 as a second HE host. I see all previously available guest VMs, the ISO domain, and the data SD, but auto-import fails to import the HE SD into the engine. I tried to destroy "hosted_storage" in the "Storage" tab, but it did not help; "hosted_storage" returned to the same state all 3 times I tried. My DC and host cluster are in 3.6 compatibility mode.
Version-Release number of selected component (if applicable):

Hosts:
vdsm-4.17.23-0.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.3.4-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.4.3-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
Linux version 3.10.0-327.13.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-4) (GCC) ) #1 SMP Mon Feb 29 13:22:02 EST 2016

Engine:
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-branding-rhev-3.6.0-8.el6ev.noarch
rhevm-sdk-python-3.6.3.0-1.el6ev.noarch
rhevm-reports-3.6.3-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.3.4-0.1.el6.noarch
rhevm-dbscripts-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.3.4-0.1.el6.noarch
rhevm-backend-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
rhevm-setup-base-3.6.3.4-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.3.4-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-restapi-3.6.3.4-0.1.el6.noarch
rhevm-doc-3.6.0-4.el6eng.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-lib-3.6.3.4-0.1.el6.noarch
rhevm-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.3.4-0.1.el6.noarch
rhevm-userportal-3.6.3.4-0.1.el6.noarch
rhevm-3.6.3.4-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.3.4-0.1.el6.noarch
rhevm-tools-3.6.3.4-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-reports-setup-3.6.3-1.el6ev.noarch
rhevm-setup-3.6.3.4-0.1.el6.noarch
rhevm-webadmin-portal-3.6.3.4-0.1.el6.noarch

How reproducible:

Steps to Reproduce:
1. Deploy a 3.6 engine on a bare-metal el7.2 host, with the engine FQDN equal to the host FQDN, including DWH, reports, and serial console.
2. Add an ISO domain, an export domain, and an NFS data SD.
3. Add a host on which you will be able to run some guest VMs.
4. Create 5 el6.6 and 5 el7.2 guest VMs and start them.
5. Back up the engine following http://brq-setup.rhev.lab.eng.brq.redhat.com/ovirt-engine/docs/manual/en_US/html/Self-Hosted_Engine_Guide/chap-Migrating_from_Bare_Metal_to_a_RHEL-Based_Self-Hosted_Environment.html.
6. On an additional el7.2 host, start the HE deployment, using the appliance with cloud-init.
7. During HE deployment, answer "No" to the question "Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]?".
8. Follow the instructions and restore the engine DB from the files backed up from your bare-metal engine.
9. Finish the HE deployment.
10. Add an additional HE host to your environment to meet the minimum HA requirements for your HE VM.

Actual results:
The HE storage domain is not auto-imported into the engine's web UI.

Expected results:
The hosted_storage domain should be auto-imported into the engine's web UI.

Additional info:
sosreports from both hosts and the HE are attached.
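For reference, steps 5 and 8 boil down to two engine-backup invocations. A minimal sketch, assuming an illustrative archive path (not from this report) and the standard engine-backup flags from RHEV 3.6; the commands are wrapped in functions because they belong on two different machines:

```shell
# Illustrative location for the backup archive (an assumption); it must be
# copied from the bare-metal engine host to the HE VM between the two steps.
BACKUP=/root/engine-backup.tar.bz2

# Step 5: run on the bare-metal engine host (alma03 in this report).
backup_on_bare_metal() {
    engine-backup --mode=backup --file="$BACKUP" --log=/root/engine-backup.log
}

# Step 8: run on the HE VM before engine-setup, after copying the archive over.
restore_on_he_vm() {
    engine-backup --mode=restore --file="$BACKUP" --log=/root/engine-restore.log
}
```

The exact restore options (for example, whether the database must be provisioned first) depend on the procedure in the linked guide.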
Created attachment 1137330 [details] engine's sosreport
Attaching sosreports from hosts as external sources:
sosreport from additional hosted-engine host (alma04): https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88ejhfZ0YyWWVkMG8/view?usp=sharing
sosreport from first hosted-engine host (seal10): https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88X1UyeWpfSF8tc3M/view?usp=sharing
Lowering the severity, as the HE is actually running and functioning properly, except for not being auto-imported into the web UI.
Looking into https://bugzilla.redhat.com/show_bug.cgi?id=1269768 and checking the current OVF_STORE location in my environment, I see that it is located in nsednev_3_6_p2v_he_1, which is my data SD for regular guest VMs. Shouldn't the OVF_STORE be located in the HE SD, which is nsednev_3_6_he_backedup_from_alma_03?
Created attachment 1137411 [details] Screenshot from 2016-03-17 16:53:40.png
(In reply to Nikolai Sednev from comment #4)
> Looking in to the https://bugzilla.redhat.com/show_bug.cgi?id=1269768 and
> checking the OVF_STORE current location in my environment, I see that it's
> located in nsednev_3_6_p2v_he_1, which is my Data SD for regular guest-vms.
> Shouldn't OVF_STORE be located in HE-SD which is
> nsednev_3_6_he_backedup_from_alma_03?

An OVF_STORE (2, actually) will be created by the engine for each domain with VM disks *if* the engine knows about it. Since the engine has not imported the HE VM yet, those special disks are not created yet.
Now spent some time looking at various attached logs, per Roy's request.

The flow seems to have been:

1. alma04 was a host managed by the engine prior to the migration
2. At some point it was removed from the engine.
3. Then, as described above, engine was migrated from (old, physical host) alma03 to a VM on hosted-engine host seal10
4. Also as described above, alma04 was added as an additional hosted-engine host.
5. It seems to me that the engine decided to use alma04 to import the hosted storage, tried to get a sanlock lock, and failed.

agent.log on alma04 has:

MainThread::INFO::2016-03-16 17:57:27,877::hosted_engine::757::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_sanlock) Acquired lock on host id 2

but later:

MainThread::ERROR::2016-03-16 18:01:36,566::hosted_engine::845::ovirt_hosted_engine_ha.agent.hosted_engine.HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain (sd_uuid=5e3a8253-7dd5-48a4-9070-9edb741b4383, host_id=2): timeout during domain acquisition

and similar.

sanlock.log on alma04 has:

2016-03-16 16:57:09+0200 184681 [121112]: s3 delta_acquire host_id 1 busy1 1 2 14210 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
2016-03-16 16:57:10+0200 184682 [58002]: s3 add_lockspace fail result -262

2016-03-16 17:37:33+0200 187106 [57992]: s4 host 1 2 187084 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
2016-03-16 17:37:33+0200 187106 [57992]: s4 host 2 2 1794 c6604252-9fe5-47cd-99e0-ec8016d20abd.seal10.qa.
2016-03-16 17:37:33+0200 187106 [57992]: s4 host 250 1 0 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.

sanlock.log on seal10 has no 'fail', but does have:

2016-03-16 14:56:52+0200 7001 [7665]: s3:r3 resource dc4a1da7-e8ad-4ebf-bcb9-5c4342c62f52:SDM:/rhev/data-center/mnt/_var_lib_ovirt-hosted-engine-setup_tmpktBTvH/dc4a1da7-e8ad-4ebf-bcb9-5c4342c62f52/dom_md/leases:1048576 for 3,11,7081
2016-03-16 14:56:52+0200 7001 [6166]: s4 host 1 1 6980 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
2016-03-16 14:56:52+0200 7001 [6166]: s4 host 250 1 0 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
2016-03-16 14:56:52+0200 7001 [6166]: s3 host 1 1 6980 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
2016-03-16 14:56:52+0200 7001 [6166]: s3 host 250 1 0 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.

and later:

2016-03-16 16:55:09+0200 14098 [7661]: add_lockspace 5e3a8253-7dd5-48a4-9070-9edb741b4383:2:/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__he__backedup__from__alma__03/5e3a8253-7dd5-48a4-9070-9edb741b4383/dom_md/ids:0 conflicts with name of list1 s5 5e3a8253-7dd5-48a4-9070-9edb741b4383:1:/rhev/data-center/mnt/10.35.64.11:_vol_RHEV_Virt_nsednev__3__6__he__backedup__from__alma__03/5e3a8253-7dd5-48a4-9070-9edb741b4383/dom_md/ids:0
2016-03-16 16:55:10+0200 14099 [6166]: s6 host 1 1 184548 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
2016-03-16 16:55:10+0200 14099 [6166]: s6 host 2 1 14078 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
2016-03-16 16:55:10+0200 14099 [6166]: s6 host 250 1 0 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.

and many more 'conflicts'. I can't properly read sanlock.log files, but it does not seem ok to me.

I suggest asking some storage people about this.
Didi thanks. alma04 is RHEL 7.2 Beta. This shouldn't be supported. Please upgrade it and retry.
(In reply to Roy Golan from comment #8)
> Didi thanks.
> 
> alma04 is RHEL 7.2 Beta. This shouldn't be supported. Please upgrade it and
> retry.

That did not help.

[root@alma04 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@alma04 ~]# uname -a
Linux alma04.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@alma04 ~]# rpm -qa libvirt-client sanlock qemu-kvm-rhev vdsm mom ovirt*
ovirt-vmconsole-1.0.0-1.el7ev.noarch
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
vdsm-4.17.23.1-0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch

[root@seal10 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.2 (Maipo)
[root@seal10 ~]# uname -a
Linux seal10.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
[root@seal10 ~]# rpm -qa libvirt-client sanlock qemu-kvm-rhev vdsm mom ovirt*
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
libvirt-client-1.2.17-13.el7_2.4.x86_64
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
mom-0.5.2-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch

[root@alma04 ~]# hosted-engine --vm-status

--== Host 1 status ==--

Status up-to-date  : True
Hostname           : seal10.qa.lab.tlv.redhat.com
Host ID            : 1
Engine status      : {"health": "good", "vm": "up", "detail": "up"}
Score              : 3400
stopped            : False
Local maintenance  : False
crc32              : 924f2fc8
Host timestamp     : 51040

--== Host 2 status ==--

Status up-to-date  : True
Hostname           : alma04.qa.lab.tlv.redhat.com
Host ID            : 2
Engine status      : {"reason": "vm not running on this host", "health": "bad", "vm": "down", "detail": "unknown"}
Score              : 3400
stopped            : False
Local maintenance  : False
crc32              : 15b51483
Host timestamp     : 51166

Taken from the web UI Events:

Mar 21, 2016 8:00:40 AM VDSM command failed: Cannot acquire host id: (u'5e3a8253-7dd5-48a4-9070-9edb741b4383', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Mar 21, 2016 8:00:39 AM Storage Pool Manager runs on Host hosted_engine_1 (Address: seal10.qa.lab.tlv.redhat.com).
Mar 21, 2016 8:00:32 AM Data Center is being initialized, please wait for initialization to complete.
Mar 21, 2016 8:00:30 AM VDSM command failed: Cannot acquire host id: (u'5e3a8253-7dd5-48a4-9070-9edb741b4383', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
Mar 21, 2016 8:00:28 AM Storage Domain hosted_storage was added by SYSTEM
Mar 21, 2016 8:00:18 AM Storage Domain hosted_storage was forcibly removed by admin@internal

See comment 30 from https://bugzilla.redhat.com/show_bug.cgi?id=1269768:
"Dec 16, 2015 12:16:57 PM VDSM hosted_engine_1 command failed: Cannot acquire host id: (u'97f5a165-4df5-4bce-99cc-8a634753bc54', SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))"
This looks pretty much the same. It might also be related to https://bugzilla.redhat.com/show_bug.cgi?id=1305768.
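When debugging this kind of "Cannot acquire host id" failure, the sanlock state can be inspected directly on the host. A hedged sketch: `sanlock client status` and `sanlock direct dump` are standard sanlock subcommands; the SD UUID is the one from the events above, and the mount-path pattern is an assumption that must be adjusted to the actual NFS mount:

```shell
# Hosted-engine storage domain UUID taken from the events in this report.
SD_UUID=5e3a8253-7dd5-48a4-9070-9edb741b4383

inspect_lockspaces() {
    # Lockspaces and resources currently held by this host's sanlock daemon.
    sanlock client status 2>/dev/null || true
    # Host ids registered on the domain's ids file (path glob is an
    # assumption; adjust to where the domain is actually mounted).
    for ids in /rhev/data-center/mnt/*/"$SD_UUID"/dom_md/ids; do
        [ -e "$ids" ] && sanlock direct dump "$ids" 2>/dev/null || true
    done
}

# Only meaningful on a host that actually runs sanlock; no-op elsewhere.
if command -v sanlock >/dev/null 2>&1; then
    inspect_lockspaces
fi
```

Two hosts trying to hold the same host id in one lockspace, or a stale registration left from a previous deployment, would show up in the `direct dump` output.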
Attaching more logs from current state: https://drive.google.com/a/redhat.com/file/d/0B85BEaDBcF88aWJxWDEwcDU2c1U/view?usp=sharing
(In reply to Yedidyah Bar David from comment #7)
> Now spent some time looking at various attached logs, per Roy's request.
> 
> The flow seems to have been:
> 
> 1. alma04 was a host managed by the engine prior to the migration
> 2. At some point it was removed from the engine.
> 3. Then, as described above, engine was migrated from (old, physical host)
> alma03 to a VM on hosted-engine host seal10
> 4. Also as described above, alma04 was added as an additional hosted-engine
> host.
> 5. It seems to me that the engine decided to use alma04 to import the hosted
> storage, tried to get a sanlock lock, and failed.
> 
> agent.log on alma04 has:
> 
> MainThread::INFO::2016-03-16
> 17:57:27,877::hosted_engine::757::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_sanlock) Acquired lock on host id 2
> 
> but later:
> 
> MainThread::ERROR::2016-03-16
> 18:01:36,566::hosted_engine::845::ovirt_hosted_engine_ha.agent.hosted_engine.
> HostedEngine::(_initialize_domain_monitor) Failed to start monitoring domain
> (sd_uuid=5e3a8253-7dd5-48a4-9070-9edb741b4383, host_id=2): timeout during
> domain acquisition
> 
> and similar.
> 
> sanlock.log on alma04 has:
> 
> 2016-03-16 16:57:09+0200 184681 [121112]: s3 delta_acquire host_id 1 busy1 1
> 2 14210 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 2016-03-16 16:57:10+0200 184682 [58002]: s3 add_lockspace fail result -262
> 
> 2016-03-16 17:37:33+0200 187106 [57992]: s4 host 1 2 187084
> 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
> 2016-03-16 17:37:33+0200 187106 [57992]: s4 host 2 2 1794
> c6604252-9fe5-47cd-99e0-ec8016d20abd.seal10.qa.
> 2016-03-16 17:37:33+0200 187106 [57992]: s4 host 250 1 0
> 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
> 
> sanlock.log on seal10 has no 'fail', but does have:
> 
> 2016-03-16 14:56:52+0200 7001 [7665]: s3:r3 resource
> dc4a1da7-e8ad-4ebf-bcb9-5c4342c62f52:SDM:/rhev/data-center/mnt/
> _var_lib_ovirt-hosted-engine-setup_tmpktBTvH/dc4a1da7-e8ad-4ebf-bcb9-
> 5c4342c62f52/dom_md/leases:1048576 for 3,11,7081
> 2016-03-16 14:56:52+0200 7001 [6166]: s4 host 1 1 6980
> 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 2016-03-16 14:56:52+0200 7001 [6166]: s4 host 250 1 0
> 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 2016-03-16 14:56:52+0200 7001 [6166]: s3 host 1 1 6980
> 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 2016-03-16 14:56:52+0200 7001 [6166]: s3 host 250 1 0
> 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 
> and later:
> 
> 2016-03-16 16:55:09+0200 14098 [7661]: add_lockspace
> 5e3a8253-7dd5-48a4-9070-9edb741b4383:2:/rhev/data-center/mnt/10.35.64.11:
> _vol_RHEV_Virt_nsednev__3__6__he__backedup__from__alma__03/5e3a8253-7dd5-
> 48a4-9070-9edb741b4383/dom_md/ids:0 conflicts with name of list1 s5
> 5e3a8253-7dd5-48a4-9070-9edb741b4383:1:/rhev/data-center/mnt/10.35.64.11:
> _vol_RHEV_Virt_nsednev__3__6__he__backedup__from__alma__03/5e3a8253-7dd5-
> 48a4-9070-9edb741b4383/dom_md/ids:0
> 2016-03-16 16:55:10+0200 14099 [6166]: s6 host 1 1 184548
> 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
> 2016-03-16 16:55:10+0200 14099 [6166]: s6 host 2 1 14078
> 53300334-5c06-4f58-a562-d7d02afb67e2.seal10.qa.
> 2016-03-16 16:55:10+0200 14099 [6166]: s6 host 250 1 0
> 0525e867-a2b3-4a55-83d2-07838a5a06af.alma04.qa.
> 
> and many more 'conflicts'. I can't properly read sanlock.log files, but it
> does not seem ok to me.
> 
> I suggest to ask some storage people about this.

Not exactly:

1. alma04 was a host managed by the engine prior to the migration
<- Yes, it was, and with 10 guest VMs running on top of it.

2. At some point it was removed from the engine.
<- I did not remove alma04 from the engine; it remained connected to the engine within its DB during the bare-metal->HE migration of the engine, and was then reconnected to the HE after the DB restore.

3. Then, as described above, engine was migrated from (old, physical host) alma03 to a VM on hosted-engine host seal10
<- Exactly, and the migration succeeded with a full DB restore.

4. Also as described above, alma04 was added as an additional hosted-engine host.
<- All guest VMs were migrated from alma04 to seal10, then ovirt-hosted-engine-setup was installed on alma04, and alma04 was added as an additional HE host to seal10.

5. It seems to me that the engine decided to use alma04 to import the hosted storage, tried to get a sanlock lock, and failed.
<- Exactly.
(In reply to Nikolai Sednev from comment #11)
> (In reply to Yedidyah Bar David from comment #7)
> 
> Not exactly:
> 
> 1. alma04 was a host managed by the engine prior to the migration
> 
> Yes, it was and with 10 guest VMs running on top of it.
> 
> 2. At some point it was removed from the engine.
> 
> I did not removed alma04 from the engine, it remained connected to the engine within it's DB, during
> bare-metal->HE migration of the engine and then was reconnected to the HE
> from the DB restore.
> 
> 3. Then, as described above, engine was migrated from (old, physical host)
> alma03 to a VM on hosted-engine host seal10
> 
> Exactly, and migration succeeded with full DB restore.
> 
> 4. Also as described above, alma04 was added as an additional hosted-engine
> host.
> 
> All guest VMs were migrated from alma04 to seal10, then
> ovirt-hosted-engine-setup was installed on alma04 and then alma04 was added
> as additional HE-host to seal10.
> 
> 5. It seems to me that the engine decided to use alma04 to import the hosted
> storage, tried to get a sanlock lock, and failed.
> 
> Exactly.

(Please reply inline, as quoted above; using copy/paste with '<-' markers is much harder to read and no less work for you.)

Bottom line: your step 10 from comment 0 was done on an existing host, not a new one. I am not sure we really need to support this flow.

Please try again, but before step 10, reinstall the host from scratch (after moving it to maintenance and removing it from the engine). For now, hosted-engine --deploy should only be run on new hosts. In some cases it will work on existing ones, but success is not guaranteed. It's quite likely that we'll not officially support this without solving bug 1001181 and using the tool created there.
Nikolai, I have one question. Did you check that no hosted engine agent and no vdsm is running when you re-added the host to the new setup? Because what might have happened is that the "new" VDSM setup tried to acquire a new ID using the same old lockspace. That indeed results in the SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument')) error.
(In reply to Martin Sivák from comment #16)
> Nikolai, I have one question. Did you check that no hosted engine agent and
> no vdsm is running when you re-added the host to the new setup?
> 
> Because what might have happened is that the "new" VDSM setup tried to
> acquire a new ID using the same old lockspace. That indeed results in the
> SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
> error.

Yes, there was no HE agent or broker running on alma04.
(In reply to Martin Sivák from comment #16)
> Nikolai, I have one question. Did you check that no hosted engine agent and
> no vdsm is running when you re-added the host to the new setup?
> 
> Because what might have happened is that the "new" VDSM setup tried to
> acquire a new ID using the same old lockspace. That indeed results in the
> SanlockException(22, 'Sanlock lockspace add failure', 'Invalid argument'))
> error.

But yes, there was VDSM running, as alma04 was the hypervisor for the 10 guest VMs.
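The pre-check Martin asks about can be sketched as a few service queries. This is an illustrative check only, not an official procedure; the service names are the ones used on a RHEV 3.6 el7 host:

```shell
# On the host about to be re-added: report any hosted-engine agent, broker,
# or VDSM still running and possibly holding the old lockspace.
# A clean host prints nothing.
leftover=0
for svc in ovirt-ha-agent ovirt-ha-broker vdsmd; do
    if systemctl is-active --quiet "$svc" 2>/dev/null; then
        echo "WARNING: $svc is still running; stop it before redeploying"
        leftover=1
    fi
done
```

If any service is reported, it should be stopped, and the host cleaned up or reinstalled, before running hosted-engine --deploy on it.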
1) I cleanly reprovisioned both hosts, one at a time, and redeployed HE on each, while the other host was running the engine and the 10 guest VMs.
2) I destroyed the hosted_storage, so auto-import could be started again.
3) The redeployments succeeded and hosted_storage was auto-imported successfully.

Works for me with these components:

Hosts:
libvirt-client-1.2.17-13.el7_2.4.x86_64
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
vdsm-4.17.23.1-0.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
Linux seal10.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.2 (Maipo)

Engine:
rhevm-setup-plugin-ovirt-engine-common-3.6.4-0.1.el6.noarch
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-webadmin-portal-3.6.4-0.1.el6.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-sdk-python-3.6.3.0-1.el6ev.noarch
rhevm-reports-3.6.3-1.el6ev.noarch
rhevm-vmconsole-proxy-helper-3.6.4-0.1.el6.noarch
rhevm-dbscripts-3.6.4-0.1.el6.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
rhevm-lib-3.6.4-0.1.el6.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.4-0.1.el6.noarch
rhevm-setup-3.6.4-0.1.el6.noarch
rhevm-restapi-3.6.4-0.1.el6.noarch
rhevm-tools-3.6.4-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
rhevm-setup-base-3.6.4-0.1.el6.noarch
rhevm-setup-plugin-websocket-proxy-3.6.4-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.4-0.1.el6.noarch
rhevm-userportal-3.6.4-0.1.el6.noarch
rhevm-3.6.4-0.1.el6.noarch
rhevm-doc-3.6.0-4.el6eng.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-websocket-proxy-3.6.4-0.1.el6.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.4-0.1.el6.noarch
rhevm-backend-3.6.4-0.1.el6.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-reports-setup-3.6.3-1.el6ev.noarch

The presence of VDSM/libvirt/sanlock/qemu-kvm-rhev/etc. on a host that was previously a non-hosted-engine host running guest VMs may have interfered with a normal hosted-engine deployment on top of it. I took Martin's suggestion from comment #16 and it solved this issue. Please consider closing this bug as WORKSFORME; I'm adding the doc flag so this can be documented properly.
(In reply to Nikolai Sednev from comment #19) > Please consider closing this bug as works for me and I'm adding + to doc, so > this could be documented properly. Not sure what you mean here exactly. Bottom line: Users that want to add an existing host to their hosted-engine cluster by running on it hosted-engine --deploy, have to reinstall the OS on it to make sure it's clean. Luci - how should we continue? Perhaps add this to the main docs somewhere? Write a KB?
(In reply to Yedidyah Bar David from comment #20) > (In reply to Nikolai Sednev from comment #19) > > Please consider closing this bug as works for me and I'm adding + to doc, so > > this could be documented properly. > > Not sure what you mean here exactly. > > Bottom line: Users that want to add an existing host to their hosted-engine > cluster by running on it hosted-engine --deploy, have to reinstall the OS on > it to make sure it's clean. > > Luci - how should we continue? Perhaps add this to the main docs somewhere? > Write a KB? I've meant that: -----------------------------Bare-metal-setup-------------------------- 1-engine installed on host1 as bare metal deployment. 2-host2 being used as hypervisor for guest-VMs. 3-backup engine's DB. ----------------------------------------------------------------------- | V ----------------------------Bare-metal-to-HE-setup--------------------- 1-engine becomes HE with restored DB, running on top of host1 or other host-x. 2-all guest-VMs migrated from regular non-he-host2 to HE-host1. 3-reprovision host2 to get clean host and install on it ovirt-hosted-engine-setup. 4-deploy he on clean host2 and add it as additional host.
Putting this on_qa, @Nikolai please make sure the doc is added. Thanks!
Hi, it might not be necessary to fully reinstall the host. Only the VDSM configuration has to be wiped out to make sure VDSM does not use any lockspace and won't try connecting to one after reboot. Nir? Is there a procedure to accomplish that?
(In reply to Roy Golan from comment #22)
> Putting this on_qa, @Nikolai please make sure the doc is added. Thanks!

I've successfully redeployed both hosts, one at a time, and had no trouble adding them to the HE environment. Both hosts were cleanly reprovisioned, one at a time, so no previously existing information about the HE environment could remain on them.

Hosts:
ovirt-hosted-engine-setup-1.3.4.0-1.el7ev.noarch
sanlock-3.2.4-2.el7_2.x86_64
ovirt-vmconsole-host-1.0.0-1.el7ev.noarch
ovirt-host-deploy-1.4.1-1.el7ev.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.10.x86_64
mom-0.5.2-1.el7ev.noarch
ovirt-setup-lib-1.0.1-1.el7ev.noarch
ovirt-hosted-engine-ha-1.3.5.1-1.el7ev.noarch
vdsm-4.17.23.2-1.el7ev.noarch
ovirt-vmconsole-1.0.0-1.el7ev.noarch
Red Hat Enterprise Linux Server release 7.2 (Maipo)
Linux seal10.qa.lab.tlv.redhat.com 3.10.0-327.13.1.el7.x86_64 #1 SMP Mon Feb 29 13:22:02 EST 2016 x86_64 x86_64 x86_64 GNU/Linux

Engine:
rhevm-branding-rhev-3.6.0-9.el6ev.noarch
rhevm-dependencies-3.6.0-1.el6ev.noarch
rhevm-setup-plugin-vmconsole-proxy-helper-3.6.4.1-0.1.el6.noarch
rhevm-sdk-python-3.6.3.0-1.el6ev.noarch
rhevm-tools-3.6.4.1-0.1.el6.noarch
rhevm-reports-3.6.3-1.el6ev.noarch
rhevm-spice-client-x86-cab-3.6-6.el6.noarch
rhevm-setup-base-3.6.4.1-0.1.el6.noarch
rhevm-extensions-api-impl-3.6.4.1-0.1.el6.noarch
rhevm-spice-client-x86-msi-3.6-6.el6.noarch
rhevm-setup-plugin-ovirt-engine-common-3.6.4.1-0.1.el6.noarch
rhevm-websocket-proxy-3.6.4.1-0.1.el6.noarch
rhevm-backend-3.6.4.1-0.1.el6.noarch
rhevm-guest-agent-common-1.0.11-2.el6ev.noarch
rhevm-userportal-3.6.4.1-0.1.el6.noarch
rhevm-doc-3.6.0-4.el6eng.noarch
rhevm-spice-client-x64-cab-3.6-6.el6.noarch
rhevm-setup-plugins-3.6.3-1.el6ev.noarch
rhevm-setup-plugin-ovirt-engine-3.6.4.1-0.1.el6.noarch
rhevm-vmconsole-proxy-helper-3.6.4.1-0.1.el6.noarch
rhevm-dbscripts-3.6.4.1-0.1.el6.noarch
rhevm-3.6.4.1-0.1.el6.noarch
rhevm-iso-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-3.6.2-1.el6ev.noarch
rhevm-cli-3.6.2.0-1.el6ev.noarch
rhevm-spice-client-x64-msi-3.6-6.el6.noarch
rhevm-lib-3.6.4.1-0.1.el6.noarch
rhevm-setup-3.6.4.1-0.1.el6.noarch
rhevm-webadmin-portal-3.6.4.1-0.1.el6.noarch
rhevm-image-uploader-3.6.0-1.el6ev.noarch
rhevm-dwh-setup-3.6.2-1.el6ev.noarch
rhevm-log-collector-3.6.1-1.el6ev.noarch
rhevm-setup-plugin-websocket-proxy-3.6.4.1-0.1.el6.noarch
rhevm-restapi-3.6.4.1-0.1.el6.noarch
rhevm-reports-setup-3.6.3-1.el6ev.noarch
Red Hat Enterprise Linux Server release 6.7 (Santiago)
Linux alma03.qa.lab.tlv.redhat.com 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64 x86_64 x86_64 GNU/Linux

Luci - Still need your reply to comment #20.
(In reply to Yedidyah Bar David from comment #20) > (In reply to Nikolai Sednev from comment #19) > > Please consider closing this bug as works for me and I'm adding + to doc, so > > this could be documented properly. > > Not sure what you mean here exactly. > > Bottom line: Users that want to add an existing host to their hosted-engine > cluster by running on it hosted-engine --deploy, have to reinstall the OS on > it to make sure it's clean. > > Luci - how should we continue? Perhaps add this to the main docs somewhere? > Write a KB? Apologies for the delay, and thanks, Nikolai, for the reminder. If I understand correctly, the documentation requirement here is to make clear that hosts that existed in the original environment must be reinstalled before they are used in the new self-hosted engine setup? If this is something that must be done for every migration, we should add it as a step in the documented procedure (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html-single/Self-Hosted_Engine_Guide/index.html#chap-Migrating_from_Bare_Metal_to_a_RHEL-Based_Self-Hosted_Environment). Can you (Didi or Nikolai) advise where this step should be added?
Reinstating needinfo on nsoffer, which was cleared by mistake.
(In reply to Lucy Bopf from comment #25)
> (In reply to Yedidyah Bar David from comment #20)
> > (In reply to Nikolai Sednev from comment #19)
> > > Please consider closing this bug as works for me and I'm adding + to doc, so
> > > this could be documented properly.
> > 
> > Not sure what you mean here exactly.
> > 
> > Bottom line: Users that want to add an existing host to their hosted-engine
> > cluster by running on it hosted-engine --deploy, have to reinstall the OS on
> > it to make sure it's clean.
> > 
> > Luci - how should we continue? Perhaps add this to the main docs somewhere?
> > Write a KB?
> 
> Apologies for the delay, and thanks, Nikolai, for the reminder.
> 
> If I understand correctly, the documentation requirement here is to make
> clear that hosts that existed in the original environment must be
> reinstalled before they are used in the new self-hosted engine setup? If
> this is something that must be done for every migration, we should add it as
> a step in the documented procedure
> (https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.6/html-single/Self-Hosted_Engine_Guide/index.html#chap-Migrating_from_Bare_Metal_to_a_RHEL-Based_Self-Hosted_Environment).
> Can you (Didi or Nikolai) advise where this step should be added?

Any additional host being added should be a clean host. Hosts that were previously used as non-hosted-engine hosts (for example, only hosting guest VMs) and therefore already have VDSM and other components installed should be reprovisioned before being added as additional hosted-engine hosts to the HE environment. Clean reprovisioning and redeployment is not required for previously existing hosted-engine hosts; that is, if the HE VM was already running on a hosted-engine host, that host does not need to be reprovisioned and redeployed.
Removing need info, I don't see anything needed now.