The procedure [1] for backup and restore of hosted-engine also says to remove the hosts used for Hosted Engine from the engine. This might not always be easy, depending on the state of said hosts at the time of the backup. In severe cases it requires direct manipulation of the database, as there is currently no option to force a removal from the web interface. One way to make this simpler is to add an option to engine-backup --mode=restore that does it inside the database.

[1] http://www.ovirt.org/OVirt_Hosted_Engine_Backup_and_Restore
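A minimal sketch of how such a restore-time option could be invoked; the flag shown is the --he-remove-hosts option exercised in the later comments, while the file and log paths are placeholders:

# restore the engine backup and, in the same run, drop the hosted-engine
# hosts from the restored database instead of removing them via the UI
engine-backup --mode=restore \
    --file=/root/engine-backup.tar.gz \
    --log=/root/engine-restore.log \
    --provision-db \
    --restore-permissions \
    --he-remove-hosts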
Another option is to change the definition of bug 1065350 and add another option there: "Remove this host from the engine".
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
*** Bug 1241811 has been marked as a duplicate of this bug. ***
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.
Please provide reproduction steps for the bug.
Scenario setup:
- Deploy hosted-engine on a couple of hosts
- Also add a host not involved in hosted-engine
- Add a regular storage domain
- Add a couple of VMs
- Take a backup of the engine with engine-backup (see the backup sketch below)

Try the recovery over two different hosts and also over the same hosts:
- Start hosted-engine-setup on the first host
- Point to the same storage
- Respond no to: 'Automatically execute engine-setup on the engine appliance on first boot'
- Copy the backup of the engine DB to the engine VM
- Connect to the engine VM and execute engine-backup to restore the backup, appending the --he-remove-hosts option
- Execute engine-setup
- Come back to hosted-engine-setup and terminate the deployment

At the end, only the host where you ran hosted-engine --deploy should be there as a hosted-engine host; the other hosts (not involved in HE) should be there as well.
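A minimal sketch of the backup step and of getting the backup onto the newly deployed engine VM; only the engine-backup options come from this thread, while the file names and the destination hostname are placeholders:

# on the old engine VM: take a full backup of the engine
engine-backup --mode=backup \
    --file=/root/engine-backup.tar.gz \
    --log=/root/engine-backup.log

# keep the backup outside the engine VM, then copy it into the new
# engine VM once hosted-engine-setup has created it
scp /root/engine-backup.tar.gz root@new-engine-vm.example.com:/root/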
[root@nsednev-he-1 ~]# engine-backup --mode=restore --log=/root/Log_nsednev_from_alma04_rhevm_4_1_1 --file=/root/nsednev_from_alma04_rhevm_4_1_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm
Preparing to restore:
- Unpacking file '/root/nsednev_from_alma04_rhevm_4_1_1'
Restoring:
- Files
Provisioning PostgreSQL users/databases:
- user 'engine', database 'engine'
- user 'ovirt_engine_history', database 'ovirt_engine_history'
Restoring:
- Engine database 'engine'
  - Cleaning up temporary tables in engine database 'engine'
  - Updating DbJustRestored VdcOption in engine database
  - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
------------------------------------------------------------------------------
Please note:
The engine database was backed up at 2017-03-06 18:30:38.000000000 +0200 .
Objects that were added, removed or changed after this date, such as virtual
machines, disks, etc., are missing in the engine, and will probably require
recovery or recreation.
------------------------------------------------------------------------------
- Removing the hosted-engine storage domain, all its entities and the hosted-engine VM.
FATAL: Failed cleaning hosted-engine
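The restore log passed with --log records why the hosted-engine cleanup failed; a minimal way to pull the relevant lines out of it (the log path is the one used in the command above):

# show the SQL errors recorded by the failed cleanup step
grep -i -B 2 -A 2 'error\|fatal' /root/Log_nsednev_from_alma04_rhevm_4_1_1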
Probably an old appliance, although it is the latest from our repos.

Components on host:
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.7-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.6.2-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.1.2-0.1.el7.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

From the restore log:
CREATE FUNCTION
********* QUERY **********
SELECT DeleteHostedEngineStorageVM();
**************************
SELECT DeleteHostedEngineStorageVM();
ERROR:  The hosted-engine storage domain contains more than one vm.
FATAL: Cannot execute sql command: --command=SELECT DeleteHostedEngineStorageVM();
2017-03-06 18:59:34 9581: FATAL: Failed cleaning hosted-engine
Backup succeeded with rhevm-setup-plugins.noarch 4.1.1-1.el7ev on the engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-4.1.1.3-0.1.el7.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I had to update the appliance from within to the latest components so that the backup could finish. After running engine-setup, I continued with the hosted-engine deployment on the host and got:

[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add alma04.qa.lab.tlv.redhat.com to the manager
[ INFO ] Waiting for VDSM to reply
[ INFO ] Waiting for VDSM to reply
[ ERROR ] Failed to execute stage 'Closing up': Couldn't connect to VDSM within 240 seconds
[ INFO ] Stage: Clean up

In the GUI of the engine I saw that the alma04 host was reported as non responsive; the second hosted-engine host (alma03) was removed as designed; the two regular hosts remained as they were before the backup and restore; the regular VM that was running on a regular host was not shown in the VMs tab but was shown as running in the hosts tab. The regular data storage domains existed but were inactive, the hosted-storage domain was wiped out as expected, the data center was non responsive, and the HE-VM was the only VM in the VMs tab and was active.

VDSM was dead:
[root@alma04 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2017-03-06 20:05:10 IST; 10min ago
 Main PID: 10379 (code=exited, status=0/SUCCESS)
Mar 06 19:51:14 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Mar 06 19:51:15 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:19 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:24 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:29 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:30 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats 
self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:34 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager... Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com vdsmd_init_common.sh[15345]: vdsm: Running run_final_hooks Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager. I've manually started it: alma04 ~]# systemctl start vdsmd [root@alma04 ~]# systemctl status vdsmd -l ● vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2017-03-06 20:15:34 IST; 32s ago Process: 16038 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS) Main PID: 16105 (vdsm) Then host became active within the WEBUI, but without HA. Sosreports from the engine and alma04 being attached.
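For the "active but without HA" state, a quick way to check whether VDSM and the hosted-engine HA services are actually running on the host, and to bring them up by hand, is sketched below. The service names are the standard ones shipped by vdsm and ovirt-hosted-engine-ha; whether starting them is enough depends on the broker being able to reach the HE storage domain, which is exactly what fails in the tracebacks above:

# check the relevant services on the hosted-engine host
systemctl status vdsmd ovirt-ha-broker ovirt-ha-agent -l

# try to bring them up manually
systemctl start vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent

# follow their logs to see whether the broker can attach the HE storage domain
journalctl -u ovirt-ha-broker -u ovirt-ha-agent --since '-10min' -f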
Created attachment 1260534 [details] sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20170306201110.tar.xz
Created attachment 1260535 [details] sosreport-alma04.qa.lab.tlv.redhat.com-20170306201911.tar.xz
VDSM died at host-deploy time; the question is why.

2017-03-06 20:09:21 INFO otopi.plugins.gr_he_setup.system.vdsmenv util.connect_vdsm_json_rpc:194 Waiting for VDSM to reply
2017-03-06 20:09:21 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/system/vdsmenv.py", line 175, in _closeup
    timeout=ohostedcons.Const.VDSCLI_SSL_TIMEOUT,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 198, in connect_vdsm_json_rpc
    timeout=MAX_RETRY * DELAY
RuntimeError: Couldn't connect to VDSM within 240 seconds

The other errors are consequences of this: hosted-engine-setup didn't conclude, the ha-agent didn't get configured, and so on.
host-deploy failed because cockpit was missing; all the other errors follow from that:

2017-03-06 20:05:20 DEBUG otopi.context context._executeMethod:128 Stage closeup METHOD otopi.plugins.ovirt_host_common.cockpit.packages.Plugin._closeup
2017-03-06 20:05:20 INFO otopi.plugins.ovirt_host_common.cockpit.packages packages._closeup:69 Starting cockpit
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service cockpit
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'cockpit.service'), executable='None', cwd='None', env=None
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'cockpit.service'), rc=5
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'cockpit.service') stdout:
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'cockpit.service') stderr: Failed to start cockpit.service: Unit not found.
2017-03-06 20:05:20 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-fiObiWABAk/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-fiObiWABAk/otopi-plugins/ovirt-host-common/cockpit/packages.py", line 70, in _closeup
    self.services.state('cockpit', True)
  File "/tmp/ovirt-fiObiWABAk/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'cockpit'
2017-03-06 20:05:20 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up': Failed to start service 'cockpit'
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/error=bool:'True'
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>, RuntimeError("Failed to start service 'cockpit'",), <traceback object at 0x1d0a758>)]'
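Before re-running host-deploy it may help to confirm that the cockpit packages and units are actually present on the host; a minimal sketch (the package names are the ones mentioned in this thread, the yum/systemctl invocations are generic):

# check whether cockpit is installed and its units exist
rpm -q cockpit cockpit-ovirt-dashboard
systemctl list-unit-files | grep -i cockpit

# install and enable it if it is missing
yum install -y cockpit cockpit-ovirt-dashboard
systemctl enable --now cockpit.socket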
What was reported in https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c10 consists only of symptoms of BZ#1429855, so making this bug dependent on that one and moving it back to ON_QA.
It's interesting, but although I had cockpit-ovirt-dashboard-0.10.7-0.0.11.el7ev.noarch on the host, the backup still failed on the engine during HE deployment.
I've tried on the same host again, but this time I've manually removed ovirt-hosted-engine-setup, vdsm, libvirt, sanlock, qemu, cockpit-ovirt-dashboard, then restarted host, then installed ovirt-hosted-engine-setup, saw that cockpit-ovirt-dashboard was not installed on host and then installed it manually too, deployed hosted-engine on clean storage, then updated engine to latest bits during deployment and then copied backup files to engine and then successfully performed restore on engine successfully, then ran engine-setup and then finished hosted engine deployment: On engine: nsednev-he-1 ~]# engine-backup --mode=restore --log=/root/Log_nsednev_from_alma04_rhevm_4_1_1 --file=/root/nsednev_from_alma04_rhevm_4_1_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm Preparing to restore: - Unpacking file '/root/nsednev_from_alma04_rhevm_4_1_1' Restoring: - Files Provisioning PostgreSQL users/databases: - user 'engine', database 'engine' - user 'ovirt_engine_history', database 'ovirt_engine_history' Restoring: - Engine database 'engine' - Cleaning up temporary tables in engine database 'engine' - Updating DbJustRestored VdcOption in engine database - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database ------------------------------------------------------------------------------ Please note: The engine database was backed up at 2017-03-06 18:30:38.000000000 +0200 . Objects that were added, removed or changed after this date, such as virtual machines, disks, etc., are missing in the engine, and will probably require recovery or recreation. ------------------------------------------------------------------------------ - Removing the hosted-engine storage domain, all its entities and the hosted-engine VM. - Removing all the hosted-engine hosts. - DWH database 'ovirt_engine_history' You should now run engine-setup. Done. [root@nsednev-he-1 ~]# engine-setup [ INFO ] Stage: Initializing [ INFO ] Stage: Environment setup Configuration files: ['/etc/ovirt-engine-setup.conf.d/10-packaging-wsp.conf', '/etc/ovirt-engine-setup.conf.d/10-packaging.conf', '/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf'] Log file: /var/log/ovirt-engine/setup/ovirt-engine-setup-20170307162459-exl44t.log Version: otopi-1.6.0 (otopi-1.6.0-1.el7ev) [ INFO ] The engine DB has been restored from a backup [ INFO ] Stage: Environment packages setup [ INFO ] Stage: Programs detection [ INFO ] Stage: Environment setup [ INFO ] Stage: Environment customization Welcome to the RHEV 4.1 setup/upgrade. Please read the RHEV 4.1 install guide https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/4.1/html/Installation_Guide/index.html. Please refer to the RHEV Upgrade Helper application https://access.redhat.com/labs/rhevupgradehelper/ which will guide you in the upgrading process. Would you like to proceed? (Yes, No) [Yes]: --== PRODUCT OPTIONS ==-- --== PACKAGES ==-- [ INFO ] Checking for product updates... [ INFO ] No product updates found --== NETWORK CONFIGURATION ==-- Setup can automatically configure the firewall on this system. Note: automatic configuration of the firewall may overwrite current settings. Do you want Setup to configure the firewall? (Yes, No) [Yes]: [ INFO ] firewalld will be configured as firewall manager. --== DATABASE CONFIGURATION ==-- The detected DWH database size is 22 MB. Setup can backup the existing database. 
The time and space required for the database backup depend on its size. This process takes time, and in some cases (for instance, when the size is few GBs) may take several hours to complete. If you choose to not back up the database, and Setup later fails for some reason, it will not be able to restore the database and all DWH data will be lost. Would you like to backup the existing database before upgrading it? (Yes, No) [Yes]: Found the following problems in PostgreSQL configuration for the Engine database: autovacuum_vacuum_scale_factor required to be at most 0.01 autovacuum_analyze_scale_factor required to be at most 0.075 autovacuum_max_workers required to be at least 6 maintenance_work_mem required to be at least 65536 Please set: autovacuum_vacuum_scale_factor = 0.01 autovacuum_analyze_scale_factor = 0.075 autovacuum_max_workers = 6 maintenance_work_mem = 65536 in postgresql.conf on 'localhost'. Its location is usually /var/lib/pgsql/data , or somewhere under /etc/postgresql* . The database requires these configurations values to be changed. Setup can fix them for you or abort. Fix automatically? (Yes, No) [Yes]: --== OVIRT ENGINE CONFIGURATION ==-- Perform full vacuum on the engine database engine@localhost? This operation may take a while depending on this setup health and the configuration of the db vacuum process. See https://www.postgresql.org/docs/9.0/static/sql-vacuum.html (Yes, No) [No]: --== STORAGE CONFIGURATION ==-- --== PKI CONFIGURATION ==-- --== APACHE CONFIGURATION ==-- --== SYSTEM CONFIGURATION ==-- --== MISC CONFIGURATION ==-- --== END OF CONFIGURATION ==-- [ INFO ] Stage: Setup validation [ INFO ] Cleaning stale zombie tasks and commands --== CONFIGURATION PREVIEW ==-- Default SAN wipe after delete : False Firewall manager : firewalld Update Firewall : True Host FQDN : nsednev-he-1.qa.lab.tlv.redhat.com Engine database secured connection : False Engine database user name : engine Engine database name : engine Engine database host : localhost Engine database port : 5432 Engine database host name validation : False Engine installation : True PKI organization : qa.lab.tlv.redhat.com DWH installation : True DWH database secured connection : False DWH database host : localhost DWH database user name : ovirt_engine_history DWH database name : ovirt_engine_history Backup DWH database : True DWH database port : 5432 DWH database host name validation : False Configure Image I/O Proxy : True Configure VMConsole Proxy : True Configure WebSocket Proxy : True Please confirm installation settings (OK, Cancel) [OK]: [ INFO ] Cleaning async tasks and compensations [ INFO ] Unlocking existing entities [ INFO ] Checking the Engine database consistency [ INFO ] Stage: Transaction setup [ INFO ] Stopping engine service [ INFO ] Stopping ovirt-fence-kdump-listener service [ INFO ] Stopping dwh service [ INFO ] Stopping Image I/O Proxy service [ INFO ] Stopping vmconsole-proxy service [ INFO ] Stopping websocket-proxy service [ INFO ] Stage: Misc configuration [ INFO ] Updating PostgreSQL configuration [ INFO ] Stage: Package installation [ INFO ] Stage: Misc configuration [ INFO ] Upgrading CA [ INFO ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20170307162553.XmWk5v.dump'. [ INFO ] Creating/refreshing Engine database schema [ INFO ] Backing up database localhost:ovirt_engine_history to '/var/lib/ovirt-engine-dwh/backups/dwh-20170307162610.LW_hQY.dump'. 
[ INFO ] Creating/refreshing DWH database schema [ INFO ] Configuring Image I/O Proxy [ INFO ] Configuring WebSocket Proxy [ INFO ] Creating/refreshing Engine 'internal' domain database schema [ INFO ] Generating post install configuration file '/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf' [ INFO ] Stage: Transaction commit [ INFO ] Stage: Closing up [ INFO ] Starting engine service [ INFO ] Starting dwh service [ INFO ] Restarting ovirt-vmconsole proxy service --== SUMMARY ==-- [ INFO ] Restarting httpd Web access is enabled at: http://nsednev-he-1.qa.lab.tlv.redhat.com:80/ovirt-engine https://nsednev-he-1.qa.lab.tlv.redhat.com:443/ovirt-engine Internal CA DD:B5:1A:3D:D4:D8:60:79:4F:70:D4:4E:47:65:A0:D1:AA:85:00:08 SSH fingerprint: 3b:66:3c:e1:54:c1:6e:af:f0:8d:c4:f3:29:44:66:a9 --== END OF SUMMARY ==-- [ INFO ] Stage: Clean up Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20170307162459-exl44t.log [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20170307162637-setup.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ INFO ] Execution of setup completed successfully On host: [root@alma04 ~]# hosted-engine --deploy [ INFO ] Stage: Initializing [ INFO ] Generating a temporary VNC password. [ INFO ] Stage: Environment setup During customization use CTRL-D to abort. Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards. Are you sure you want to continue? (Yes, No)[Yes]: It has been detected that this program is executed through an SSH connection without using screen. Continuing with the installation may lead to broken installation if the network connection fails. It is highly recommended to abort the installation and run it inside a screen session using command "screen". Do you want to continue anyway? (Yes, No)[No]: yes [ INFO ] Hardware supports virtualization Configuration files: [] Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170307154034-82ops6.log Version: otopi-1.6.0 (otopi-1.6.0-1.el7ev) [ INFO ] Detecting available oVirt engine appliances [ ERROR ] No engine appliance image is available on your system. The oVirt engine appliance is now required to deploy hosted-engine. You could get oVirt engine appliance installing ovirt-engine-appliance rpm. Do you want to install ovirt-engine-appliance rpm? (Yes, No) [Yes]: [ INFO ] Stage: Environment packages setup [ INFO ] Installing the oVirt engine appliance [ INFO ] Yum Status: Downloading Packages [ INFO ] Yum Downloading: rhvm-appliance-4.1.20170221.0-1.el7ev.noarch.rpm 971 M(61%) [ INFO ] Yum Download/Verify: 1:rhvm-appliance-4.1.20170221.0-1.el7ev.noarch [ INFO ] Yum Status: Check Package Signatures [ INFO ] Yum Status: Running Test Transaction [ INFO ] Yum Status: Running Transaction [ INFO ] Yum install: 1/1: 1:rhvm-appliance-4.1.20170221.0-1.el7ev.noarch [ INFO ] Yum Verify: 1/1: rhvm-appliance.noarch 1:4.1.20170221.0-1.el7ev - u [ INFO ] Stage: Programs detection [ INFO ] Stage: Environment setup [ INFO ] Stage: Environment customization --== STORAGE CONFIGURATION ==-- Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: Please specify the full shared storage connection path to use (example: host:/path): yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1 --== HOST NETWORK CONFIGURATION ==-- [ INFO ] Bridge ovirtmgmt already created iptables was detected on your computer, do you wish setup to configure it? 
(Yes, No)[Yes]: Please indicate a pingable gateway IP address [10.35.72.254]: --== VM CONFIGURATION ==-- The following appliance have been found on your system: [1] - The RHEV-M Appliance image (OVA) - 4.1.20170221.0-1.el7ev [2] - Directly select an OVA file Please select an appliance (1, 2) [1]: [ INFO ] Verifying its sha1sum [ INFO ] Checking OVF archive content (could take a few minutes depending on archive size) [ INFO ] Checking OVF XML content (could take a few minutes depending on archive size) Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]: [ INFO ] Detecting host timezone. Would you like to use cloud-init to customize the appliance on the first boot (Yes, No)[Yes]? Would you like to generate on-fly a cloud-init ISO image (of no-cloud type) or do you have an existing one (Generate, Existing)[Generate]? Please provide the FQDN you would like to use for the engine appliance. Note: This will be the FQDN of the engine VM you are now going to launch, it should not point to the base host or to any other existing machine. Engine VM FQDN: (leave it empty to skip): []: nsednev-he-1.qa.lab.tlv.redhat.com Please provide the domain name you would like to use for the engine appliance. Engine VM domain: [qa.lab.tlv.redhat.com] Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no Enter root password that will be used for the engine appliance (leave it empty to skip): Confirm appliance root password: Enter ssh public key for the root user that will be used for the engine appliance (leave it empty to skip): [WARNING] Skipping appliance root ssh public key Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: Please specify the size of the VM disk in GB: [50]: Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [4096]: 16384 The following CPU types are supported by this host: - model_SandyBridge: Intel SandyBridge Family - model_Westmere: Intel Westmere Family - model_Nehalem: Intel Nehalem Family - model_Penryn: Intel Penryn Family - model_Conroe: Intel Conroe Family Please specify the CPU type to be used by the VM [model_SandyBridge]: Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [2]: 4 You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:2b:b4:d4]: 00:16:3e:7b:b8:53 How should the engine VM network be configured (DHCP, Static)[DHCP]? Add lines for the appliance itself and for this host to /etc/hosts on the engine VM? 
Note: ensuring that this host could resolve the engine VM hostname is still up to you (Yes, No)[No] yes --== HOSTED ENGINE CONFIGURATION ==-- Enter engine admin password: Confirm engine admin password: Please provide the name of the SMTP server through which we will send notifications [localhost]: Please provide the TCP port number of the SMTP server [25]: Please provide the email address from which notifications will be sent [root@localhost]: Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: [ INFO ] Stage: Setup validation --== CONFIGURATION PREVIEW ==-- Bridge interface : enp3s0f0 Engine FQDN : nsednev-he-1.qa.lab.tlv.redhat.com Bridge name : ovirtmgmt Host address : alma04 SSH daemon port : 22 Firewall manager : iptables Gateway address : 10.35.72.254 Storage Domain type : nfs3 Image size GB : 50 Host ID : 1 Storage connection : yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1 Console type : vnc Memory size MB : 16384 MAC address : 00:16:3e:7b:b8:53 Number of CPUs : 4 OVF archive (for disk boot) : /usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20170221.0-1.el7ev.ova Appliance version : 4.1.20170221.0-1.el7ev Engine VM timezone : Asia/Jerusalem CPU Type : model_SandyBridge Please confirm installation settings (Yes, No)[Yes]: [ INFO ] Stage: Transaction setup [ INFO ] Stage: Misc configuration [ INFO ] Stage: Package installation [ INFO ] Stage: Misc configuration [ INFO ] Configuring libvirt [ INFO ] Configuring VDSM [ INFO ] Starting vdsmd [ INFO ] Creating Storage Domain [ INFO ] Creating Storage Pool [ INFO ] Connecting Storage Pool [ INFO ] Verifying sanlock lockspace initialization [ INFO ] Creating Image for 'hosted-engine.lockspace' ... [ INFO ] Image for 'hosted-engine.lockspace' created successfully [ INFO ] Creating Image for 'hosted-engine.metadata' ... [ INFO ] Image for 'hosted-engine.metadata' created successfully [ INFO ] Creating VM Image [ INFO ] Extracting disk image from OVF archive (could take a few minutes depending on archive size) [ INFO ] Validating pre-allocated volume size [ INFO ] Uploading volume to data domain (could take a few minutes depending on archive size) [ INFO ] Image successfully imported from OVF [ INFO ] Destroying Storage Pool [ INFO ] Start monitoring domain [ INFO ] Configuring VM [ INFO ] Updating hosted-engine configuration [ INFO ] Stage: Transaction commit [ INFO ] Stage: Closing up [ INFO ] Creating VM You can now connect to the VM with the following command: hosted-engine --console You can also graphically connect to the VM from your system with the following command: remote-viewer vnc://alma04.qa.lab.tlv.redhat.com:5900 Use temporary password "8716nUeL" to connect to vnc console. Please ensure that your Guest OS is properly configured to support serial console according to your distro documentation. Follow http://www.ovirt.org/Serial_Console_Setup#I_need_to_access_the_console_the_old_way for more info. If you need to reboot the VM you will need to start it manually using the command: hosted-engine --vm-start You can then set a temporary password using the command: hosted-engine --add-console-password Please install and setup the engine in the VM. You may also be interested in installing ovirt-guest-agent-common package in the VM. The VM has been rebooted. To continue please install oVirt-Engine in the VM (Follow http://www.ovirt.org/Quick_Start_Guide for more info). 
Make a selection from the options below: (1) Continue setup - oVirt-Engine installation is ready and ovirt-engine service is up (2) Abort setup (3) Power off and restart the VM (4) Destroy VM and abort setup (1, 2, 3, 4)[1]: Checking for oVirt-Engine status at nsednev-he-1.qa.lab.tlv.redhat.com... [ INFO ] Engine replied: DB Up!Welcome to Health Status! [ INFO ] Acquiring internal CA cert from the engine [ INFO ] The following CA certificate is going to be used, please immediately interrupt if not correct: [ INFO ] Issuer: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.20881, Subject: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.20881, Fingerprint (SHA-1): DDB51A3DD4D860794F70D44E4765A0D1AA850008 [ INFO ] Connecting to the Engine Enter the name of the cluster to which you want to add the host (Default, regular_hosts_cluster) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational... [ INFO ] The VDSM Host is now operational [ INFO ] Saving hosted-engine configuration on the shared storage domain Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ INFO ] Enabling and starting HA services [ INFO ] Stage: Clean up [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20170307162906.conf' [ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ INFO ] Hosted Engine successfully deployed [root@alma04 ~]# After engine db was restored, I've logged in to WEBUI of the engine and saw that data center was changing its status from contending to non responsive in rounds: VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:44:47 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:44:44 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:44:39 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:44:29 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:44:27 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id Mar 7, 2017 4:42:37 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:31 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:29 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:26 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:42:15 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:13 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:12 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:42:09 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:07 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id Mar 7, 2017 4:40:05 PM Invalid status on Data Center Default. Setting status to Non Responsive. 
Mar 7, 2017 4:40:03 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:39:58 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Following https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c19, I powered off alma03, which probably remained SPM and had not released the sanlock lock; after a few minutes alma04 became SPM and the auto-import of the hosted-storage domain finished successfully. Then I added alma03 back to the engine as a hosted-engine host. I had to sacrifice the VMs on alma03 when I powered it off to finish this restore flow, and once alma03 was added back those VMs were not restored, in contrast to the VMs on the regular hosts, which remained as they were before the restore.
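A minimal sketch of how to check, before resorting to a power-off, whether the old hosted-engine host is still holding sanlock leases and is therefore blocking the new host from becoming SPM; sanlock and hosted-engine are the tools already in use in this thread, and the exact lease names depend on the storage domain UUIDs of the environment:

# on the suspected stale SPM host: list the lockspaces/resources sanlock still holds
sanlock client status

# on the freshly deployed host: overall hosted-engine and host state
hosted-engine --vm-status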
Blocked by 1417518.
Created attachment 1268952 [details] hosts screenshot from UI
Created attachment 1268954 [details] VMs screenshot from UI
Created attachment 1268956 [details] Storage screenshot from UI
1) Performed these steps first:

Scenario setup:
- Deploy hosted-engine on a couple of hosts
- Also add a host not involved in hosted-engine
- Add a regular storage domain
- Add a couple of VMs
- Take a backup of the engine with engine-backup

2) Copied the backup from the engine to puma18, which was running 14 guest VMs plus the HE-VM and was also the SPM.
3) Cleaned the HE-VM's storage by erasing everything from it, then reprovisioned puma19 to fresh RHEL 7.3 and installed the ovirt-hosted-engine package on it, plus the latest appliance; its version appears below.
4) Copied the backup files from puma18 (still the SPM host) to puma19.
5) Started deployment of hosted-engine on puma19 and followed these steps:
- start hosted-engine-setup on puma19
- point to the same storage (which is clean now)
- respond no to: 'Automatically execute engine-setup on the engine appliance on first boot'
- copy the backup of the engine DB to the engine VM
- connect to the engine VM and execute engine-backup to restore the backup, appending the --he-remove-hosts option
- execute engine-setup
- come back to hosted-engine-setup and terminate the deployment
At the end, only the host where you ran hosted-engine --deploy should be there as a hosted-engine host; the other hosts (not involved in HE) should be there as well.
6) Deployment was successfully accomplished, and I saw in the engine that only puma19 had been added to it; alma03, the non-hosted-engine host with 6 regular VMs, was also there. I then powered off the HE-VM in order to finish the hosted-engine deployment.
7) The HA agent started the HE-VM automatically on puma19 and I logged in to the WEBUI.
8) I saw both data storage domains in an inactive state, so I tried to activate at least one of them in order to get hosted-engine's storage domain auto-imported (it was cleared during the restore).
9) The puma19 host was shown in contending status, as puma18 was still running as SPM and prevented puma19 from becoming SPM; hence puma19 could not take SPM and activate at least one data storage domain, which also prevented the hosted-storage domain from being auto-imported, see bug 1417518.

Logs from UI:
Apr 5, 2017 6:49:35 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:49:23 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:49:23 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:47:18 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:47:14 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:47:14 PM Failed to activate Storage Domain nsednev_he_4_data_sd_1 (Data Center Default) by admin@internal-authz
Apr 5, 2017 6:47:14 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:47:12 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:47:09 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:47:04 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:47:01 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:44:38 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:44:36 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:44:33 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:44:31 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:44:31 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:42:25 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:42:11 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:41:33 PM The Hosted Engine Storage Domain doesn't exist. It will be imported automatically upon data center activation, which requires adding an initial storage domain to the data center.
Apr 5, 2017 6:41:33 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:41:32 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:41:27 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:41:18 PM Affinity Rules Enforcement Manager started.
Apr 5, 2017 6:40:55 PM ETL Service Started
Apr 5, 2017 6:38:07 PM ETL Service Stopped
Apr 5, 2017 6:37:30 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Created attachment 1269019 [details] Screenshot from 2017-04-05 18-38-30.png
Created attachment 1269020 [details] can't obtain lock and become an SPM because of puma18 is holding it
I can't verify this RFE until the issue from https://bugzilla.redhat.com/show_bug.cgi?id=1417518#c9 is fixed; it is reproduced exactly here in https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28.
Created attachment 1269023 [details] sosreport-nsednev-he-4.scl.lab.tlv.redhat.com-20170405185534.tar.xz
Created attachment 1269024 [details] sosreport-puma19.scl.lab.tlv.redhat.com-20170405185518.tar.xz
Components on engine:
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-doc-4.1.0-3.el7ev.noarch
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhevm-4.1.1.7-0.1.el7.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Components on host (puma19):
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Have you seen the message ' - Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.'?

If so, you did your next steps wrong. In https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote:

6)Deployment was successfully accomplished and I saw in engine that only puma19 was added to it and there also was alma03 none hosted-engine host with 6 regular VMs, then I've powered-off the HE-VM in order to finish the hosted-engine deployment.

But, as the warning message told you, you are supposed to redeploy alma03 right after the restore procedure.
(In reply to Denis Chaplygin from comment #35)
> Have you seen message ' - Please redeploy already existing HE hosts
> IMMEDIATELY after restore, to avoid possible SPM deadlocks.'?
>
> In case of positive answer, you did your next steps wrong. In
> https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote:
>
> 6)Deployment was successfully accomplished and I saw in engine that only
> puma19 was added to it and there also was alma03 none hosted-engine host
> with 6 regular VMs, then I've powered-off the HE-VM in order to finish the
> hosted-engine deployment.
>
> But, as warning message told you, you are supposed to redeploy alma03 right
> after restore procedure.

No, I did not see that message; where is it expected to appear? If a customer, as in my scenario, has 3 hosts (one regular host, alma03, plus the two hosted-engine hosts puma18 and puma19) and they are all running so many VMs that only one host can be put into redeployment at a time to avoid shutting VMs down, then you simply can't redeploy all the hosted-engine hosts without losing VMs.
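For redeploying one HE host at a time without losing VMs, the usual sequence is to put the host into maintenance (which live-migrates its VMs away) before reinstalling it. A minimal sketch using the engine REST API; the engine FQDN, credentials and host id are placeholders, and the maintenance/reinstall steps can equally be done from the web UI as described later in this thread:

# put the host into maintenance so its VMs are migrated to the other hosts
curl -k -u 'admin@internal:PASSWORD' \
     -H 'Content-Type: application/xml' -X POST -d '<action/>' \
     'https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/deactivate'

# once it is in maintenance, reinstall/redeploy it as a hosted-engine host
# (in the UI: Hosts -> Reinstall, with the hosted-engine DEPLOY action selected)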
Answered in BZ1417518
Moving to verified following the latest reproduction, using these steps (a condensed, command-level sketch of the flow is below):

1) Deployed a clean HE environment over 2 hosted-engine hosts (puma18 and puma19), using an NFS storage domain for HE.
2) Added 2 data NFS storage domains.
3) Got HE's storage domain auto-imported.
4) Added one regular host, alma04.
5) Created 20 guest VMs.
6) Migrated 19 guest VMs to alma04 and left one guest VM on puma19.
7) Made alma04 (the regular host) the SPM.
8) HE-VM running on puma18.
9) Backed up the HE-VM's db and copied it to puma19.
10) Wiped out HE's storage domain, e.g. "rm -rf /mnt/nsednev_he_4/*".
11) Reprovisioned puma18 to a clean and fresh RHEL 7.3.
12) Added the latest 4.1 repos to puma18.
13) Installed rhvm-appliance-4.1.20170403.0-1.el7.noarch on puma18.
14) Started deployment of hosted-engine on puma18.
15) During deployment, restored the HE db copied from puma19 to the engine, updated the engine to the latest bits, then ran engine-setup.

nsednev-he-4 ~]# engine-backup --mode=restore --log=/root/Log_nsednev --file=/root/nsednev --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm
Preparing to restore:
- Unpacking file '/root/nsednev'
Restoring:
- Files
Provisioning PostgreSQL users/databases:
- user 'engine', database 'engine'
- user 'ovirt_engine_history', database 'ovirt_engine_history'
Restoring:
- Engine database 'engine'
  - Cleaning up temporary tables in engine database 'engine'
  - Updating DbJustRestored VdcOption in engine database
  - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
------------------------------------------------------------------------------
Please note:
The engine database was backed up at 2017-04-09 11:57:55.000000000 -0400 .
Objects that were added, removed or changed after this date, such as virtual
machines, disks, etc., are missing in the engine, and will probably require
recovery or recreation.
------------------------------------------------------------------------------
- Removing the hosted-engine storage domain, all its entities and the hosted-engine VM.
- Removing all the hosted-engine hosts.
- Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.
- DWH database 'ovirt_engine_history'
You should now run engine-setup.
Done.

16) Finished the deployment on puma18 and got an HE environment without puma19, as expected.
17) Added puma19 to the restored environment; the single guest VM was still running on puma19.
18) The regular host alma04 was already in the UI, restored as expected, with its 19 guest VMs running on it.
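For reference, a condensed sketch of the verified flow at the command level. Only the engine-backup/engine-setup/hosted-engine invocations and the --he-remove-hosts/--he-remove-storage-vm options are taken from the steps above; file paths, the storage mount point and hostnames are placeholders, and the UI-driven steps (re-adding the remaining HE host and checking the regular hosts) are not shown:

# 1. on the running engine VM: back it up and copy the backup off the HE hosts
engine-backup --mode=backup --file=/root/engine.backup --log=/root/engine-backup.log
scp /root/engine.backup root@surviving-host:/root/

# 2. wipe the old hosted-engine storage domain (destructive!)
rm -rf /mnt/hosted_engine_domain/*

# 3. on the freshly provisioned host: redeploy, answering 'No' to
#    'Automatically execute engine-setup on the engine appliance on first boot'
hosted-engine --deploy

# 4. inside the new engine VM, once it is up: restore and run setup
engine-backup --mode=restore --file=/root/engine.backup --log=/root/engine-restore.log \
    --provision-db --provision-dwh-db --restore-permissions \
    --he-remove-hosts --he-remove-storage-vm
engine-setup

# 5. back on the host: let hosted-engine-setup finish the deployment, then
#    redeploy the remaining HE hosts immediately to avoid SPM deadlocks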
Works for me with these components on the hosts:
rhvm-appliance-4.1.20170403.0-1.el7.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine:
rhevm-doc-4.1.0-3.el7ev.noarch
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-4.1.1.8-0.1.el7.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

In case of hitting https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28, it is entirely possible to add the removed hosted-engine host that is still acting as SPM back as a regular host, then set it to maintenance in the UI, and then reinstall it as a hosted-engine host, all without losing any of the guest VMs running on that host. I have just verified this on my now-clean environment and can confirm that it worked fine: puma19 was added as a regular host and then reinstalled as a hosted-engine host, alma04 took over SPM as soon as puma19 was added, and puma19 received Host ID=3 instead of its previously assigned Host ID=2.