The procedure [1] for backup and restore of hosted-engine also says to remove the hosts used for Hosted Engine from the engine. This might not always be easy, depending on the state of said hosts at the time of the backup. In severe cases it requires direct manipulation of the database, as there is currently no option to force a removal from the web interface. One way to make this simpler is to add an option to engine-backup --mode=restore that does it inside the database.

[1] http://www.ovirt.org/OVirt_Hosted_Engine_Backup_and_Restore
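A minimal sketch of how such a restore-time option could be invoked; the flag shown is the --he-remove-hosts option exercised in the later comments, while the file and log paths are placeholders:

# restore the engine backup and, in the same run, drop the hosted-engine
# hosts from the restored database instead of removing them via the UI
engine-backup --mode=restore \
    --file=/root/engine-backup.tar.gz \
    --log=/root/engine-restore.log \
    --provision-db \
    --restore-permissions \
    --he-remove-hosts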
Another option is to change the definition of bug 1065350 and add another option there: "Remove this host from the engine".
Target release should be set once a package build is known to fix an issue. Since this bug is not in the MODIFIED state, the target version has been reset. Please use the target milestone to plan a fix for an oVirt release.
*** Bug 1241811 has been marked as a duplicate of this bug. ***
The fix for this issue should be included in oVirt 4.1.0 beta 1, released on December 1st. If it is not included, please move the bug back to MODIFIED.
Please provide reproduction steps for the bug.
Scenario setup:
- Deploy hosted-engine on a couple of hosts
- Also add a host not involved in hosted-engine
- Add a regular storage domain
- Add a couple of VMs
- Take a backup of the engine with engine-backup (see the backup sketch below)

Try the recovery over two different hosts and also over the same hosts:
- Start hosted-engine-setup on the first host
- Point to the same storage
- Respond no to: 'Automatically execute engine-setup on the engine appliance on first boot'
- Copy the backup of the engine DB to the engine VM
- Connect to the engine VM and execute engine-backup to restore the backup, appending the --he-remove-hosts option
- Execute engine-setup
- Come back to hosted-engine-setup and terminate the deployment

At the end, only the host where you ran hosted-engine --deploy should be there as a hosted-engine host; the other hosts (not involved in HE) should be there as well.
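A minimal sketch of the backup step and of getting the backup onto the newly deployed engine VM; only the engine-backup options come from this thread, while the file names and the destination hostname are placeholders:

# on the old engine VM: take a full backup of the engine
engine-backup --mode=backup \
    --file=/root/engine-backup.tar.gz \
    --log=/root/engine-backup.log

# keep the backup outside the engine VM, then copy it into the new
# engine VM once hosted-engine-setup has created it
scp /root/engine-backup.tar.gz root@new-engine-vm.example.com:/root/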
[root@nsednev-he-1 ~]# engine-backup --mode=restore --log=/root/Log_nsednev_from_alma04_rhevm_4_1_1 --file=/root/nsednev_from_alma04_rhevm_4_1_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm
Preparing to restore:
- Unpacking file '/root/nsednev_from_alma04_rhevm_4_1_1'
Restoring:
- Files
Provisioning PostgreSQL users/databases:
- user 'engine', database 'engine'
- user 'ovirt_engine_history', database 'ovirt_engine_history'
Restoring:
- Engine database 'engine'
  - Cleaning up temporary tables in engine database 'engine'
  - Updating DbJustRestored VdcOption in engine database
  - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
------------------------------------------------------------------------------
Please note:
The engine database was backed up at 2017-03-06 18:30:38.000000000 +0200 .
Objects that were added, removed or changed after this date, such as virtual
machines, disks, etc., are missing in the engine, and will probably require
recovery or recreation.
------------------------------------------------------------------------------
- Removing the hosted-engine storage domain, all its entities and the hosted-engine VM.
FATAL: Failed cleaning hosted-engine
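The restore log passed with --log records why the hosted-engine cleanup failed; a minimal way to pull the relevant lines out of it (the log path is the one used in the command above):

# show the SQL errors recorded by the failed cleanup step
grep -i -B 2 -A 2 'error\|fatal' /root/Log_nsednev_from_alma04_rhevm_4_1_1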
Probably an old appliance, although it is the latest from our repos.

Components on host:
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.6.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.7-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.4-1.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-vmconsole-1.0.4-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.4-1.el7ev.noarch
ovirt-host-deploy-1.6.2-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-dependencies-4.1.0-1.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.0-1.el7ev.noarch
rhevm-4.1.1.2-0.1.el7.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

From the restore log:
CREATE FUNCTION
********* QUERY **********
SELECT DeleteHostedEngineStorageVM();
**************************
SELECT DeleteHostedEngineStorageVM();
ERROR:  The hosted-engine storage domain contains more than one vm.
FATAL: Cannot execute sql command: --command=SELECT DeleteHostedEngineStorageVM();
2017-03-06 18:59:34 9581: FATAL: Failed cleaning hosted-engine
Backup succeeded with rhevm-setup-plugins.noarch 4.1.1-1.el7ev on the engine:
rhevm-doc-4.1.0-2.el7ev.noarch
rhev-guest-tools-iso-4.1-4.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-4.1.1.3-0.1.el7.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.6.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Sat Dec 10 11:15:38 EST 2016
Linux 3.10.0-514.6.1.el7.x86_64 #1 SMP Sat Dec 10 11:15:38 EST 2016 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

I had to update the appliance from within to the latest components so that the backup could finish. After running engine-setup, I continued with the hosted-engine deployment on the host and got:

[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ ERROR ] The VDSM host was found in a failed state. Please check engine and bootstrap installation logs.
[ ERROR ] Unable to add alma04.qa.lab.tlv.redhat.com to the manager
[ INFO ] Waiting for VDSM to reply
[ INFO ] Waiting for VDSM to reply
[ ERROR ] Failed to execute stage 'Closing up': Couldn't connect to VDSM within 240 seconds
[ INFO ] Stage: Clean up

In the GUI of the engine I saw that the alma04 host was reported as non responsive; the second hosted-engine host (alma03) was removed as designed; the two regular hosts remained as they were before the backup and restore; the regular VM that was running on a regular host was not shown in the VMs tab but was shown as running in the hosts tab. The regular data storage domains existed but were inactive, the hosted-storage domain was wiped out as expected, the data center was non responsive, and the HE-VM was the only VM in the VMs tab and was active.

VDSM was dead:
[root@alma04 ~]# systemctl status vdsmd -l
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead) since Mon 2017-03-06 20:05:10 IST; 10min ago
 Main PID: 10379 (code=exited, status=0/SUCCESS)
Mar 06 19:51:14 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'>
Mar 06 19:51:15 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type)
File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:19 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:24 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:29 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:30 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats 
self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 19:51:34 alma04.qa.lab.tlv.redhat.com vdsm[10379]: vdsm root ERROR failed to retrieve Hosted Engine HA info Traceback (most recent call last): File "/usr/lib/python2.7/site-packages/vdsm/host/api.py", line 231, in _getHaInfo stats = instance.get_all_stats() File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 103, in get_all_stats self._configure_broker_conn(broker) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/client/client.py", line 180, in _configure_broker_conn dom_type=dom_type) File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/brokerlink.py", line 177, in set_storage_domain .format(sd_type, options, e)) RequestError: Failed to set storage domain FilesystemBackend, options {'dom_type': 'nfs3', 'sd_uuid': '28d365ee-2af9-4d9b-9a7e-4c28e5cf998d'}: Request failed: <class 'ovirt_hosted_engine_ha.lib.storage_backends.BackendFailureException'> Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopping Virtual Desktop Server Manager... Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com vdsmd_init_common.sh[15345]: vdsm: Running run_final_hooks Mar 06 20:05:10 alma04.qa.lab.tlv.redhat.com systemd[1]: Stopped Virtual Desktop Server Manager. I've manually started it: alma04 ~]# systemctl start vdsmd [root@alma04 ~]# systemctl status vdsmd -l ● vdsmd.service - Virtual Desktop Server Manager Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled) Active: active (running) since Mon 2017-03-06 20:15:34 IST; 32s ago Process: 16038 ExecStartPre=/usr/libexec/vdsm/vdsmd_init_common.sh --pre-start (code=exited, status=0/SUCCESS) Main PID: 16105 (vdsm) Then host became active within the WEBUI, but without HA. Sosreports from the engine and alma04 being attached.
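For the "active but without HA" state, a quick way to check whether VDSM and the hosted-engine HA services are actually running on the host, and to bring them up by hand, is sketched below. The service names are the standard ones shipped by vdsm and ovirt-hosted-engine-ha; whether starting them is enough depends on the broker being able to reach the HE storage domain, which is exactly what fails in the tracebacks above:

# check the relevant services on the hosted-engine host
systemctl status vdsmd ovirt-ha-broker ovirt-ha-agent -l

# try to bring them up manually
systemctl start vdsmd
systemctl start ovirt-ha-broker ovirt-ha-agent

# follow their logs to see whether the broker can attach the HE storage domain
journalctl -u ovirt-ha-broker -u ovirt-ha-agent --since '-10min' -f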
Created attachment 1260534 [details] sosreport-nsednev-he-1.qa.lab.tlv.redhat.com-20170306201110.tar.xz
Created attachment 1260535 [details] sosreport-alma04.qa.lab.tlv.redhat.com-20170306201911.tar.xz
VDSM died at host-deploy time; the question is why.

2017-03-06 20:09:21 INFO otopi.plugins.gr_he_setup.system.vdsmenv util.connect_vdsm_json_rpc:194 Waiting for VDSM to reply
2017-03-06 20:09:21 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/usr/share/ovirt-hosted-engine-setup/scripts/../plugins/gr-he-setup/system/vdsmenv.py", line 175, in _closeup
    timeout=ohostedcons.Const.VDSCLI_SSL_TIMEOUT,
  File "/usr/lib/python2.7/site-packages/ovirt_hosted_engine_ha/lib/util.py", line 198, in connect_vdsm_json_rpc
    timeout=MAX_RETRY * DELAY
RuntimeError: Couldn't connect to VDSM within 240 seconds

The other errors are consequences of this: hosted-engine-setup didn't conclude, the ha-agent didn't get configured, and so on.
host-deploy failed because cockpit was missing; all the other errors follow from that:

2017-03-06 20:05:20 DEBUG otopi.context context._executeMethod:128 Stage closeup METHOD otopi.plugins.ovirt_host_common.cockpit.packages.Plugin._closeup
2017-03-06 20:05:20 INFO otopi.plugins.ovirt_host_common.cockpit.packages packages._closeup:69 Starting cockpit
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd systemd.state:130 starting service cockpit
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:813 execute: ('/bin/systemctl', 'start', 'cockpit.service'), executable='None', cwd='None', env=None
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.executeRaw:863 execute-result: ('/bin/systemctl', 'start', 'cockpit.service'), rc=5
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:921 execute-output: ('/bin/systemctl', 'start', 'cockpit.service') stdout:
2017-03-06 20:05:20 DEBUG otopi.plugins.otopi.services.systemd plugin.execute:926 execute-output: ('/bin/systemctl', 'start', 'cockpit.service') stderr: Failed to start cockpit.service: Unit not found.
2017-03-06 20:05:20 DEBUG otopi.context context._executeMethod:142 method exception
Traceback (most recent call last):
  File "/tmp/ovirt-fiObiWABAk/pythonlib/otopi/context.py", line 132, in _executeMethod
    method['method']()
  File "/tmp/ovirt-fiObiWABAk/otopi-plugins/ovirt-host-common/cockpit/packages.py", line 70, in _closeup
    self.services.state('cockpit', True)
  File "/tmp/ovirt-fiObiWABAk/otopi-plugins/otopi/services/systemd.py", line 141, in state
    service=name,
RuntimeError: Failed to start service 'cockpit'
2017-03-06 20:05:20 ERROR otopi.context context._executeMethod:151 Failed to execute stage 'Closing up': Failed to start service 'cockpit'
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:760 ENVIRONMENT DUMP - BEGIN
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/error=bool:'True'
2017-03-06 20:05:20 DEBUG otopi.context context.dumpEnvironment:770 ENV BASE/exceptionInfo=list:'[(<type 'exceptions.RuntimeError'>, RuntimeError("Failed to start service 'cockpit'",), <traceback object at 0x1d0a758>)]'
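Before re-running host-deploy it may help to confirm that the cockpit packages and units are actually present on the host; a minimal sketch (the package names are the ones mentioned in this thread, the yum/systemctl invocations are generic):

# check whether cockpit is installed and its units exist
rpm -q cockpit cockpit-ovirt-dashboard
systemctl list-unit-files | grep -i cockpit

# install and enable it if it is missing
yum install -y cockpit cockpit-ovirt-dashboard
systemctl enable --now cockpit.socket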
What was reported in https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c10 consists only of symptoms of BZ#1429855, so making this bug dependent on that one and moving it back to ON_QA.
It's interesting, but although I had cockpit-ovirt-dashboard-0.10.7-0.0.11.el7ev.noarch on the host, the backup still failed on the engine during HE deployment.
I've tried on the same host again, but this time I've manually removed ovirt-hosted-engine-setup, vdsm, libvirt, sanlock, qemu, cockpit-ovirt-dashboard, then restarted host, then installed ovirt-hosted-engine-setup, saw that cockpit-ovirt-dashboard was not installed on host and then installed it manually too, deployed hosted-engine on clean storage, then updated engine to latest bits during deployment and then copied backup files to engine and then successfully performed restore on engine successfully, then ran engine-setup and then finished hosted engine deployment: On engine: nsednev-he-1 ~]# engine-backup --mode=restore --log=/root/Log_nsednev_from_alma04_rhevm_4_1_1 --file=/root/nsednev_from_alma04_rhevm_4_1_1 --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm Preparing to restore: - Unpacking file '/root/nsednev_from_alma04_rhevm_4_1_1' Restoring: - Files Provisioning PostgreSQL users/databases: - user 'engine', database 'engine' - user 'ovirt_engine_history', database 'ovirt_engine_history' Restoring: - Engine database 'engine' - Cleaning up temporary tables in engine database 'engine' - Updating DbJustRestored VdcOption in engine database - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database ------------------------------------------------------------------------------ Please note: The engine database was backed up at 2017-03-06 18:30:38.000000000 +0200 . Objects that were added, removed or changed after this date, such as virtual machines, disks, etc., are missing in the engine, and will probably require recovery or recreation. ------------------------------------------------------------------------------ - Removing the hosted-engine storage domain, all its entities and the hosted-engine VM. - Removing all the hosted-engine hosts. - DWH database 'ovirt_engine_history' You should now run engine-setup. Done. [root@nsednev-he-1 ~]# engine-setup [ INFO ] Stage: Initializing [ INFO ] Stage: Environment setup Configuration files: ['/etc/ovirt-engine-setup.conf.d/10-packaging-wsp.conf', '/etc/ovirt-engine-setup.conf.d/10-packaging.conf', '/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf'] Log file: /var/log/ovirt-engine/setup/ovirt-engine-setup-20170307162459-exl44t.log Version: otopi-1.6.0 (otopi-1.6.0-1.el7ev) [ INFO ] The engine DB has been restored from a backup [ INFO ] Stage: Environment packages setup [ INFO ] Stage: Programs detection [ INFO ] Stage: Environment setup [ INFO ] Stage: Environment customization Welcome to the RHEV 4.1 setup/upgrade. Please read the RHEV 4.1 install guide https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Virtualization/4.1/html/Installation_Guide/index.html. Please refer to the RHEV Upgrade Helper application https://access.redhat.com/labs/rhevupgradehelper/ which will guide you in the upgrading process. Would you like to proceed? (Yes, No) [Yes]: --== PRODUCT OPTIONS ==-- --== PACKAGES ==-- [ INFO ] Checking for product updates... [ INFO ] No product updates found --== NETWORK CONFIGURATION ==-- Setup can automatically configure the firewall on this system. Note: automatic configuration of the firewall may overwrite current settings. Do you want Setup to configure the firewall? (Yes, No) [Yes]: [ INFO ] firewalld will be configured as firewall manager. --== DATABASE CONFIGURATION ==-- The detected DWH database size is 22 MB. Setup can backup the existing database. 
The time and space required for the database backup depend on its size. This process takes time, and in some cases (for instance, when the size is few GBs) may take several hours to complete. If you choose to not back up the database, and Setup later fails for some reason, it will not be able to restore the database and all DWH data will be lost. Would you like to backup the existing database before upgrading it? (Yes, No) [Yes]: Found the following problems in PostgreSQL configuration for the Engine database: autovacuum_vacuum_scale_factor required to be at most 0.01 autovacuum_analyze_scale_factor required to be at most 0.075 autovacuum_max_workers required to be at least 6 maintenance_work_mem required to be at least 65536 Please set: autovacuum_vacuum_scale_factor = 0.01 autovacuum_analyze_scale_factor = 0.075 autovacuum_max_workers = 6 maintenance_work_mem = 65536 in postgresql.conf on 'localhost'. Its location is usually /var/lib/pgsql/data , or somewhere under /etc/postgresql* . The database requires these configurations values to be changed. Setup can fix them for you or abort. Fix automatically? (Yes, No) [Yes]: --== OVIRT ENGINE CONFIGURATION ==-- Perform full vacuum on the engine database engine@localhost? This operation may take a while depending on this setup health and the configuration of the db vacuum process. See https://www.postgresql.org/docs/9.0/static/sql-vacuum.html (Yes, No) [No]: --== STORAGE CONFIGURATION ==-- --== PKI CONFIGURATION ==-- --== APACHE CONFIGURATION ==-- --== SYSTEM CONFIGURATION ==-- --== MISC CONFIGURATION ==-- --== END OF CONFIGURATION ==-- [ INFO ] Stage: Setup validation [ INFO ] Cleaning stale zombie tasks and commands --== CONFIGURATION PREVIEW ==-- Default SAN wipe after delete : False Firewall manager : firewalld Update Firewall : True Host FQDN : nsednev-he-1.qa.lab.tlv.redhat.com Engine database secured connection : False Engine database user name : engine Engine database name : engine Engine database host : localhost Engine database port : 5432 Engine database host name validation : False Engine installation : True PKI organization : qa.lab.tlv.redhat.com DWH installation : True DWH database secured connection : False DWH database host : localhost DWH database user name : ovirt_engine_history DWH database name : ovirt_engine_history Backup DWH database : True DWH database port : 5432 DWH database host name validation : False Configure Image I/O Proxy : True Configure VMConsole Proxy : True Configure WebSocket Proxy : True Please confirm installation settings (OK, Cancel) [OK]: [ INFO ] Cleaning async tasks and compensations [ INFO ] Unlocking existing entities [ INFO ] Checking the Engine database consistency [ INFO ] Stage: Transaction setup [ INFO ] Stopping engine service [ INFO ] Stopping ovirt-fence-kdump-listener service [ INFO ] Stopping dwh service [ INFO ] Stopping Image I/O Proxy service [ INFO ] Stopping vmconsole-proxy service [ INFO ] Stopping websocket-proxy service [ INFO ] Stage: Misc configuration [ INFO ] Updating PostgreSQL configuration [ INFO ] Stage: Package installation [ INFO ] Stage: Misc configuration [ INFO ] Upgrading CA [ INFO ] Backing up database localhost:engine to '/var/lib/ovirt-engine/backups/engine-20170307162553.XmWk5v.dump'. [ INFO ] Creating/refreshing Engine database schema [ INFO ] Backing up database localhost:ovirt_engine_history to '/var/lib/ovirt-engine-dwh/backups/dwh-20170307162610.LW_hQY.dump'. 
[ INFO ] Creating/refreshing DWH database schema [ INFO ] Configuring Image I/O Proxy [ INFO ] Configuring WebSocket Proxy [ INFO ] Creating/refreshing Engine 'internal' domain database schema [ INFO ] Generating post install configuration file '/etc/ovirt-engine-setup.conf.d/20-setup-ovirt-post.conf' [ INFO ] Stage: Transaction commit [ INFO ] Stage: Closing up [ INFO ] Starting engine service [ INFO ] Starting dwh service [ INFO ] Restarting ovirt-vmconsole proxy service --== SUMMARY ==-- [ INFO ] Restarting httpd Web access is enabled at: http://nsednev-he-1.qa.lab.tlv.redhat.com:80/ovirt-engine https://nsednev-he-1.qa.lab.tlv.redhat.com:443/ovirt-engine Internal CA DD:B5:1A:3D:D4:D8:60:79:4F:70:D4:4E:47:65:A0:D1:AA:85:00:08 SSH fingerprint: 3b:66:3c:e1:54:c1:6e:af:f0:8d:c4:f3:29:44:66:a9 --== END OF SUMMARY ==-- [ INFO ] Stage: Clean up Log file is located at /var/log/ovirt-engine/setup/ovirt-engine-setup-20170307162459-exl44t.log [ INFO ] Generating answer file '/var/lib/ovirt-engine/setup/answers/20170307162637-setup.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ INFO ] Execution of setup completed successfully On host: [root@alma04 ~]# hosted-engine --deploy [ INFO ] Stage: Initializing [ INFO ] Generating a temporary VNC password. [ INFO ] Stage: Environment setup During customization use CTRL-D to abort. Continuing will configure this host for serving as hypervisor and create a VM where you have to install the engine afterwards. Are you sure you want to continue? (Yes, No)[Yes]: It has been detected that this program is executed through an SSH connection without using screen. Continuing with the installation may lead to broken installation if the network connection fails. It is highly recommended to abort the installation and run it inside a screen session using command "screen". Do you want to continue anyway? (Yes, No)[No]: yes [ INFO ] Hardware supports virtualization Configuration files: [] Log file: /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20170307154034-82ops6.log Version: otopi-1.6.0 (otopi-1.6.0-1.el7ev) [ INFO ] Detecting available oVirt engine appliances [ ERROR ] No engine appliance image is available on your system. The oVirt engine appliance is now required to deploy hosted-engine. You could get oVirt engine appliance installing ovirt-engine-appliance rpm. Do you want to install ovirt-engine-appliance rpm? (Yes, No) [Yes]: [ INFO ] Stage: Environment packages setup [ INFO ] Installing the oVirt engine appliance [ INFO ] Yum Status: Downloading Packages [ INFO ] Yum Downloading: rhvm-appliance-4.1.20170221.0-1.el7ev.noarch.rpm 971 M(61%) [ INFO ] Yum Download/Verify: 1:rhvm-appliance-4.1.20170221.0-1.el7ev.noarch [ INFO ] Yum Status: Check Package Signatures [ INFO ] Yum Status: Running Test Transaction [ INFO ] Yum Status: Running Transaction [ INFO ] Yum install: 1/1: 1:rhvm-appliance-4.1.20170221.0-1.el7ev.noarch [ INFO ] Yum Verify: 1/1: rhvm-appliance.noarch 1:4.1.20170221.0-1.el7ev - u [ INFO ] Stage: Programs detection [ INFO ] Stage: Environment setup [ INFO ] Stage: Environment customization --== STORAGE CONFIGURATION ==-- Please specify the storage you would like to use (glusterfs, iscsi, fc, nfs3, nfs4)[nfs3]: Please specify the full shared storage connection path to use (example: host:/path): yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1 --== HOST NETWORK CONFIGURATION ==-- [ INFO ] Bridge ovirtmgmt already created iptables was detected on your computer, do you wish setup to configure it? 
(Yes, No)[Yes]: Please indicate a pingable gateway IP address [10.35.72.254]: --== VM CONFIGURATION ==-- The following appliance have been found on your system: [1] - The RHEV-M Appliance image (OVA) - 4.1.20170221.0-1.el7ev [2] - Directly select an OVA file Please select an appliance (1, 2) [1]: [ INFO ] Verifying its sha1sum [ INFO ] Checking OVF archive content (could take a few minutes depending on archive size) [ INFO ] Checking OVF XML content (could take a few minutes depending on archive size) Please specify the console type you would like to use to connect to the VM (vnc, spice) [vnc]: [ INFO ] Detecting host timezone. Would you like to use cloud-init to customize the appliance on the first boot (Yes, No)[Yes]? Would you like to generate on-fly a cloud-init ISO image (of no-cloud type) or do you have an existing one (Generate, Existing)[Generate]? Please provide the FQDN you would like to use for the engine appliance. Note: This will be the FQDN of the engine VM you are now going to launch, it should not point to the base host or to any other existing machine. Engine VM FQDN: (leave it empty to skip): []: nsednev-he-1.qa.lab.tlv.redhat.com Please provide the domain name you would like to use for the engine appliance. Engine VM domain: [qa.lab.tlv.redhat.com] Automatically execute engine-setup on the engine appliance on first boot (Yes, No)[Yes]? no Enter root password that will be used for the engine appliance (leave it empty to skip): Confirm appliance root password: Enter ssh public key for the root user that will be used for the engine appliance (leave it empty to skip): [WARNING] Skipping appliance root ssh public key Do you want to enable ssh access for the root user (yes, no, without-password) [yes]: Please specify the size of the VM disk in GB: [50]: Please specify the memory size of the VM in MB (Defaults to appliance OVF value): [4096]: 16384 The following CPU types are supported by this host: - model_SandyBridge: Intel SandyBridge Family - model_Westmere: Intel Westmere Family - model_Nehalem: Intel Nehalem Family - model_Penryn: Intel Penryn Family - model_Conroe: Intel Conroe Family Please specify the CPU type to be used by the VM [model_SandyBridge]: Please specify the number of virtual CPUs for the VM (Defaults to appliance OVF value): [2]: 4 You may specify a unicast MAC address for the VM or accept a randomly generated default [00:16:3e:2b:b4:d4]: 00:16:3e:7b:b8:53 How should the engine VM network be configured (DHCP, Static)[DHCP]? Add lines for the appliance itself and for this host to /etc/hosts on the engine VM? 
Note: ensuring that this host could resolve the engine VM hostname is still up to you (Yes, No)[No] yes --== HOSTED ENGINE CONFIGURATION ==-- Enter engine admin password: Confirm engine admin password: Please provide the name of the SMTP server through which we will send notifications [localhost]: Please provide the TCP port number of the SMTP server [25]: Please provide the email address from which notifications will be sent [root@localhost]: Please provide a comma-separated list of email addresses which will get notifications [root@localhost]: [ INFO ] Stage: Setup validation --== CONFIGURATION PREVIEW ==-- Bridge interface : enp3s0f0 Engine FQDN : nsednev-he-1.qa.lab.tlv.redhat.com Bridge name : ovirtmgmt Host address : alma04 SSH daemon port : 22 Firewall manager : iptables Gateway address : 10.35.72.254 Storage Domain type : nfs3 Image size GB : 50 Host ID : 1 Storage connection : yellow-vdsb.qa.lab.tlv.redhat.com:/Compute_NFS/nsednev_he_1 Console type : vnc Memory size MB : 16384 MAC address : 00:16:3e:7b:b8:53 Number of CPUs : 4 OVF archive (for disk boot) : /usr/share/ovirt-engine-appliance/rhvm-appliance-4.1.20170221.0-1.el7ev.ova Appliance version : 4.1.20170221.0-1.el7ev Engine VM timezone : Asia/Jerusalem CPU Type : model_SandyBridge Please confirm installation settings (Yes, No)[Yes]: [ INFO ] Stage: Transaction setup [ INFO ] Stage: Misc configuration [ INFO ] Stage: Package installation [ INFO ] Stage: Misc configuration [ INFO ] Configuring libvirt [ INFO ] Configuring VDSM [ INFO ] Starting vdsmd [ INFO ] Creating Storage Domain [ INFO ] Creating Storage Pool [ INFO ] Connecting Storage Pool [ INFO ] Verifying sanlock lockspace initialization [ INFO ] Creating Image for 'hosted-engine.lockspace' ... [ INFO ] Image for 'hosted-engine.lockspace' created successfully [ INFO ] Creating Image for 'hosted-engine.metadata' ... [ INFO ] Image for 'hosted-engine.metadata' created successfully [ INFO ] Creating VM Image [ INFO ] Extracting disk image from OVF archive (could take a few minutes depending on archive size) [ INFO ] Validating pre-allocated volume size [ INFO ] Uploading volume to data domain (could take a few minutes depending on archive size) [ INFO ] Image successfully imported from OVF [ INFO ] Destroying Storage Pool [ INFO ] Start monitoring domain [ INFO ] Configuring VM [ INFO ] Updating hosted-engine configuration [ INFO ] Stage: Transaction commit [ INFO ] Stage: Closing up [ INFO ] Creating VM You can now connect to the VM with the following command: hosted-engine --console You can also graphically connect to the VM from your system with the following command: remote-viewer vnc://alma04.qa.lab.tlv.redhat.com:5900 Use temporary password "8716nUeL" to connect to vnc console. Please ensure that your Guest OS is properly configured to support serial console according to your distro documentation. Follow http://www.ovirt.org/Serial_Console_Setup#I_need_to_access_the_console_the_old_way for more info. If you need to reboot the VM you will need to start it manually using the command: hosted-engine --vm-start You can then set a temporary password using the command: hosted-engine --add-console-password Please install and setup the engine in the VM. You may also be interested in installing ovirt-guest-agent-common package in the VM. The VM has been rebooted. To continue please install oVirt-Engine in the VM (Follow http://www.ovirt.org/Quick_Start_Guide for more info). 
Make a selection from the options below: (1) Continue setup - oVirt-Engine installation is ready and ovirt-engine service is up (2) Abort setup (3) Power off and restart the VM (4) Destroy VM and abort setup (1, 2, 3, 4)[1]: Checking for oVirt-Engine status at nsednev-he-1.qa.lab.tlv.redhat.com... [ INFO ] Engine replied: DB Up!Welcome to Health Status! [ INFO ] Acquiring internal CA cert from the engine [ INFO ] The following CA certificate is going to be used, please immediately interrupt if not correct: [ INFO ] Issuer: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.20881, Subject: C=US, O=qa.lab.tlv.redhat.com, CN=nsednev-he-1.qa.lab.tlv.redhat.com.20881, Fingerprint (SHA-1): DDB51A3DD4D860794F70D44E4765A0D1AA850008 [ INFO ] Connecting to the Engine Enter the name of the cluster to which you want to add the host (Default, regular_hosts_cluster) [Default]: [ INFO ] Waiting for the host to become operational in the engine. This may take several minutes... [ INFO ] Still waiting for VDSM host to become operational... [ INFO ] The VDSM Host is now operational [ INFO ] Saving hosted-engine configuration on the shared storage domain Please shutdown the VM allowing the system to launch it as a monitored service. The system will wait until the VM is down. [ INFO ] Enabling and starting HA services [ INFO ] Stage: Clean up [ INFO ] Generating answer file '/var/lib/ovirt-hosted-engine-setup/answers/answers-20170307162906.conf' [ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf' [ INFO ] Stage: Pre-termination [ INFO ] Stage: Termination [ INFO ] Hosted Engine successfully deployed [root@alma04 ~]# After engine db was restored, I've logged in to WEBUI of the engine and saw that data center was changing its status from contending to non responsive in rounds: VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:44:47 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:44:44 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:44:39 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:44:29 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:44:27 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id Mar 7, 2017 4:42:37 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:31 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:29 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:26 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:42:15 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:13 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:42:12 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM Mar 7, 2017 4:42:09 PM Invalid status on Data Center Default. Setting status to Non Responsive. Mar 7, 2017 4:42:07 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id Mar 7, 2017 4:40:05 PM Invalid status on Data Center Default. Setting status to Non Responsive. 
Mar 7, 2017 4:40:03 PM VDSM puma18.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock Mar 7, 2017 4:39:58 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Following https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c19, I powered off alma03, which probably remained SPM and had not released the sanlock lock; after a few minutes alma04 became SPM and the auto-import of the hosted-storage domain finished successfully. Then I added alma03 back to the engine as a hosted-engine host. I had to sacrifice the VMs on alma03 when I powered it off to finish this restore flow, and once alma03 was added back those VMs were not restored, in contrast to the VMs on the regular hosts, which remained as they were before the restore.
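A minimal sketch of how to check, before resorting to a power-off, whether the old hosted-engine host is still holding sanlock leases and is therefore blocking the new host from becoming SPM; sanlock and hosted-engine are the tools already in use in this thread, and the exact lease names depend on the storage domain UUIDs of the environment:

# on the suspected stale SPM host: list the lockspaces/resources sanlock still holds
sanlock client status

# on the freshly deployed host: overall hosted-engine and host state
hosted-engine --vm-status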
Blocked by 1417518.
Created attachment 1268952 [details] hosts screenshot from UI
Created attachment 1268954 [details] VMs screenshot from UI
Created attachment 1268956 [details] Storage screenshot from UI
1) Performed these steps first:

Scenario setup:
- Deploy hosted-engine on a couple of hosts
- Also add a host not involved in hosted-engine
- Add a regular storage domain
- Add a couple of VMs
- Take a backup of the engine with engine-backup

2) Copied the backup from the engine to puma18, which was running 14 guest VMs plus the HE-VM and was also the SPM.
3) Cleaned the HE-VM's storage by erasing everything from it, then reprovisioned puma19 to fresh RHEL 7.3 and installed the ovirt-hosted-engine package on it, plus the latest appliance; its version appears below.
4) Copied the backup files from puma18 (still the SPM host) to puma19.
5) Started deployment of hosted-engine on puma19 and followed these steps:
- start hosted-engine-setup on puma19
- point to the same storage (which is clean now)
- respond no to: 'Automatically execute engine-setup on the engine appliance on first boot'
- copy the backup of the engine DB to the engine VM
- connect to the engine VM and execute engine-backup to restore the backup, appending the --he-remove-hosts option
- execute engine-setup
- come back to hosted-engine-setup and terminate the deployment
At the end, only the host where you ran hosted-engine --deploy should be there as a hosted-engine host; the other hosts (not involved in HE) should be there as well.
6) Deployment was successfully accomplished, and I saw in the engine that only puma19 had been added to it; alma03, the non-hosted-engine host with 6 regular VMs, was also there. I then powered off the HE-VM in order to finish the hosted-engine deployment.
7) The HA agent started the HE-VM automatically on puma19 and I logged in to the WEBUI.
8) I saw both data storage domains in an inactive state, so I tried to activate at least one of them in order to get hosted-engine's storage domain auto-imported (it was cleared during the restore).
9) The puma19 host was shown in contending status, as puma18 was still running as SPM and prevented puma19 from becoming SPM; hence puma19 could not take SPM and activate at least one data storage domain, which also prevented the hosted-storage domain from being auto-imported, see bug 1417518.

Logs from UI:
Apr 5, 2017 6:49:35 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:49:23 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:49:23 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:47:18 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:47:14 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:47:14 PM Failed to activate Storage Domain nsednev_he_4_data_sd_1 (Data Center Default) by admin@internal-authz
Apr 5, 2017 6:47:14 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:47:12 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:47:09 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:47:04 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:47:01 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:44:38 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:44:36 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:44:33 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:44:31 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:44:31 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:42:25 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetAllTasksStatusesVDS failed: Not SPM
Apr 5, 2017 6:42:11 PM VDSM puma19.scl.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot acquire host id
Apr 5, 2017 6:41:33 PM The Hosted Engine Storage Domain doesn't exist. It will be imported automatically upon data center activation, which requires adding an initial storage domain to the data center.
Apr 5, 2017 6:41:33 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:41:32 PM VDSM alma04.qa.lab.tlv.redhat.com command HSMGetTaskStatusVDS failed: Cannot obtain lock
Apr 5, 2017 6:41:27 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Apr 5, 2017 6:41:18 PM Affinity Rules Enforcement Manager started.
Apr 5, 2017 6:40:55 PM ETL Service Started
Apr 5, 2017 6:38:07 PM ETL Service Stopped
Apr 5, 2017 6:37:30 PM Invalid status on Data Center Default. Setting status to Non Responsive.
Created attachment 1269019 [details] Screenshot from 2017-04-05 18-38-30.png
Created attachment 1269020 [details] can't obtain lock and become an SPM because of puma18 is holding it
I can't verify this RFE until the issue from https://bugzilla.redhat.com/show_bug.cgi?id=1417518#c9 is fixed; it is reproduced exactly here in https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28.
Created attachment 1269023 [details] sosreport-nsednev-he-4.scl.lab.tlv.redhat.com-20170405185534.tar.xz
Created attachment 1269024 [details] sosreport-puma19.scl.lab.tlv.redhat.com-20170405185518.tar.xz
Components on engine:
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-doc-4.1.0-3.el7ev.noarch
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
rhevm-4.1.1.7-0.1.el7.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Components on host (puma19):
rhvm-appliance-4.1.20170221.0-1.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)
Have you seen the message ' - Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.'?

If so, you did your next steps wrong. In https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote:

6)Deployment was successfully accomplished and I saw in engine that only puma19 was added to it and there also was alma03 none hosted-engine host with 6 regular VMs, then I've powered-off the HE-VM in order to finish the hosted-engine deployment.

But, as the warning message told you, you are supposed to redeploy alma03 right after the restore procedure.
(In reply to Denis Chaplygin from comment #35)
> Have you seen message ' - Please redeploy already existing HE hosts
> IMMEDIATELY after restore, to avoid possible SPM deadlocks.'?
>
> In case of positive answer, you did your next steps wrong. In
> https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28 you wrote:
>
> 6)Deployment was successfully accomplished and I saw in engine that only
> puma19 was added to it and there also was alma03 none hosted-engine host
> with 6 regular VMs, then I've powered-off the HE-VM in order to finish the
> hosted-engine deployment.
>
> But, as warning message told you, you are supposed to redeploy alma03 right
> after restore procedure.

No, I did not see that message; where is it expected to appear? If a customer, as in my scenario, has 3 hosts (one regular host, alma03, plus the two hosted-engine hosts puma18 and puma19) and they are all running so many VMs that only one host can be put into redeployment at a time to avoid shutting VMs down, then you simply can't redeploy all the hosted-engine hosts without losing VMs.
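For redeploying one HE host at a time without losing VMs, the usual sequence is to put the host into maintenance (which live-migrates its VMs away) before reinstalling it. A minimal sketch using the engine REST API; the engine FQDN, credentials and host id are placeholders, and the maintenance/reinstall steps can equally be done from the web UI as described later in this thread:

# put the host into maintenance so its VMs are migrated to the other hosts
curl -k -u 'admin@internal:PASSWORD' \
     -H 'Content-Type: application/xml' -X POST -d '<action/>' \
     'https://engine.example.com/ovirt-engine/api/hosts/HOST_ID/deactivate'

# once it is in maintenance, reinstall/redeploy it as a hosted-engine host
# (in the UI: Hosts -> Reinstall, with the hosted-engine DEPLOY action selected)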
Answered in BZ1417518
Moving to verified following the latest reproduction, using these steps (a condensed, command-level sketch of the flow is below):

1) Deployed a clean HE environment over 2 hosted-engine hosts (puma18 and puma19), using an NFS storage domain for HE.
2) Added 2 data NFS storage domains.
3) Got HE's storage domain auto-imported.
4) Added one regular host, alma04.
5) Created 20 guest VMs.
6) Migrated 19 guest VMs to alma04 and left one guest VM on puma19.
7) Made alma04 (the regular host) the SPM.
8) HE-VM running on puma18.
9) Backed up the HE-VM's db and copied it to puma19.
10) Wiped out HE's storage domain, e.g. "rm -rf /mnt/nsednev_he_4/*".
11) Reprovisioned puma18 to a clean and fresh RHEL 7.3.
12) Added the latest 4.1 repos to puma18.
13) Installed rhvm-appliance-4.1.20170403.0-1.el7.noarch on puma18.
14) Started deployment of hosted-engine on puma18.
15) During deployment, restored the HE db copied from puma19 to the engine, updated the engine to the latest bits, then ran engine-setup.

nsednev-he-4 ~]# engine-backup --mode=restore --log=/root/Log_nsednev --file=/root/nsednev --provision-db --provision-dwh-db --provision-reports-db --restore-permissions --he-remove-hosts --he-remove-storage-vm
Preparing to restore:
- Unpacking file '/root/nsednev'
Restoring:
- Files
Provisioning PostgreSQL users/databases:
- user 'engine', database 'engine'
- user 'ovirt_engine_history', database 'ovirt_engine_history'
Restoring:
- Engine database 'engine'
  - Cleaning up temporary tables in engine database 'engine'
  - Updating DbJustRestored VdcOption in engine database
  - Resetting DwhCurrentlyRunning in dwh_history_timekeeping in engine database
------------------------------------------------------------------------------
Please note:
The engine database was backed up at 2017-04-09 11:57:55.000000000 -0400 .
Objects that were added, removed or changed after this date, such as virtual
machines, disks, etc., are missing in the engine, and will probably require
recovery or recreation.
------------------------------------------------------------------------------
- Removing the hosted-engine storage domain, all its entities and the hosted-engine VM.
- Removing all the hosted-engine hosts.
- Please redeploy already existing HE hosts IMMEDIATELY after restore, to avoid possible SPM deadlocks.
- DWH database 'ovirt_engine_history'
You should now run engine-setup.
Done.

16) Finished the deployment on puma18 and got an HE environment without puma19, as expected.
17) Added puma19 to the restored environment; the single guest VM was still running on puma19.
18) The regular host alma04 was already in the UI, restored as expected, with its 19 guest VMs running on it.
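For reference, a condensed sketch of the verified flow at the command level. Only the engine-backup/engine-setup/hosted-engine invocations and the --he-remove-hosts/--he-remove-storage-vm options are taken from the steps above; file paths, the storage mount point and hostnames are placeholders, and the UI-driven steps (re-adding the remaining HE host and checking the regular hosts) are not shown:

# 1. on the running engine VM: back it up and copy the backup off the HE hosts
engine-backup --mode=backup --file=/root/engine.backup --log=/root/engine-backup.log
scp /root/engine.backup root@surviving-host:/root/

# 2. wipe the old hosted-engine storage domain (destructive!)
rm -rf /mnt/hosted_engine_domain/*

# 3. on the freshly provisioned host: redeploy, answering 'No' to
#    'Automatically execute engine-setup on the engine appliance on first boot'
hosted-engine --deploy

# 4. inside the new engine VM, once it is up: restore and run setup
engine-backup --mode=restore --file=/root/engine.backup --log=/root/engine-restore.log \
    --provision-db --provision-dwh-db --restore-permissions \
    --he-remove-hosts --he-remove-storage-vm
engine-setup

# 5. back on the host: let hosted-engine-setup finish the deployment, then
#    redeploy the remaining HE hosts immediately to avoid SPM deadlocks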
Works for me with these components on the hosts:
rhvm-appliance-4.1.20170403.0-1.el7.noarch
libvirt-client-2.0.0-10.el7_3.5.x86_64
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
ovirt-host-deploy-1.6.3-1.el7ev.noarch
ovirt-imageio-common-1.0.0-0.el7ev.noarch
ovirt-vmconsole-host-1.0.4-1.el7ev.noarch
mom-0.5.9-1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-1.el7ev.noarch
ovirt-setup-lib-1.1.0-1.el7ev.noarch
ovirt-engine-sdk-python-3.6.9.1-1.el7ev.noarch
ovirt-vmconsole-1.0.4-1.el7ev.noarch
qemu-kvm-rhev-2.6.0-28.el7_3.9.x86_64
ovirt-imageio-daemon-1.0.0-0.el7ev.noarch
sanlock-3.4.0-1.el7.x86_64
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

Engine:
rhevm-doc-4.1.0-3.el7ev.noarch
rhev-guest-tools-iso-4.1-5.el7ev.noarch
rhevm-4.1.1.8-0.1.el7.noarch
rhevm-dependencies-4.1.1-1.el7ev.noarch
rhevm-branding-rhev-4.1.0-1.el7ev.noarch
rhevm-setup-plugins-4.1.1-1.el7ev.noarch
Linux version 3.10.0-514.16.1.el7.x86_64 (mockbuild.eng.bos.redhat.com) (gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 10 13:12:32 EST 2017
Linux 3.10.0-514.16.1.el7.x86_64 #1 SMP Fri Mar 10 13:12:32 EST 2017 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux Server release 7.3 (Maipo)

In case of hitting https://bugzilla.redhat.com/show_bug.cgi?id=1235200#c28, it is entirely possible to add the removed hosted-engine host that is still acting as SPM back as a regular host, then set it to maintenance in the UI, and then reinstall it as a hosted-engine host, all without losing any of the guest VMs running on that host. I have just verified this on my now-clean environment and can confirm that it worked fine: puma19 was added as a regular host and then reinstalled as a hosted-engine host, alma04 took over SPM as soon as puma19 was added, and puma19 received Host ID=3 instead of its previously assigned Host ID=2.