Bug 1241811 - Adding hosted-engine hosts to restored engine is messy
Summary: Adding hosted-engine hosts to restored engine is messy
Keywords:
Status: CLOSED DUPLICATE of bug 1235200
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-hosted-engine-setup
Version: 3.4.5
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium
Target Milestone: ovirt-4.1.0-alpha
Target Release: ---
Assignee: Simone Tiraboschi
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2015-07-10 07:04 UTC by Andrew Burden
Modified: 2016-09-29 12:45 UTC
CC List: 9 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-09-29 12:45:09 UTC
oVirt Team: Integration
Target Upstream Version:
Embargoed:
nsednev: testing_plan_complete+


Attachments
hosted_engine_1 deployment log (416.65 KB, text/plain)
2015-07-10 07:06 UTC, Andrew Burden
no flags
Additional host log (354.29 KB, text/plain)
2015-07-10 07:09 UTC, Andrew Burden
no flags


Links
Red Hat Bugzilla 1240466 (Priority: high; Status: CLOSED; Last updated: 2021-02-22 00:41:40 UTC): Restoring self-hosted engine from backup has conflict between new and old HostedEngine VM

Internal Links: 1240466

Description Andrew Burden 2015-07-10 07:04:34 UTC
Description of problem:
There are some host issues that make restoring a hosted-engine environment less than ideal from a usability point of view. The following was observed during testing for BZ#1232136 (comments 76, 78, and 87 may be particularly relevant). The test involved placing hosted_engine_2 (one of two hosted-engine hosts) in maintenance mode during engine-backup, and then using that host to deploy the restored engine, so that the host's own entry could easily be removed from the restored environment and the host added anew.

1) It takes ~10 minutes for the host to become operational in the engine (end of hosted-engine --deploy):
[ INFO  ] Engine replied: DB Up!Welcome to Health Status!
[ INFO  ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...
[ INFO  ] Still waiting for VDSM host to become operational...

2) VDSM eventually times out with an error instead:
[ INFO  ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_2 to the manager

((Though deployment is ultimately successful:
          Please shutdown the VM allowing the system to launch it as a monitored service.
          The system will wait until the VM is down.
[ INFO  ] Enabling and starting HA services
          Hosted Engine successfully set up
[ INFO  ] Stage: Clean up
[ INFO  ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO  ] Stage: Pre-termination
[ INFO  ] Stage: Termination
))
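The wait-then-timeout behaviour in items 1 and 2 follows a poll-until-deadline pattern. The sketch below is illustrative only: `wait_for_host_up`, the `get_status` callable, the status string "up", and the timeout/interval defaults are hypothetical and are not taken from ovirt-hosted-engine-setup. It shows why a host that never reaches 'Up' produces a run of "Still waiting" lines followed by a timeout error, while the setup tool itself can carry on with the remaining stages.

```python
import time
from typing import Callable


def wait_for_host_up(
    get_status: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll a host status until it reports 'up' or the deadline passes.

    Hypothetical sketch: get_status stands in for whatever query the
    setup tool makes against the engine; it is not a real oVirt API.
    Returns True if the host came up, False on timeout.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() == "up":
            return True
        if clock() >= deadline:
            # Mirrors the error seen in the log; the caller decides
            # whether deployment continues after this.
            print("[ ERROR ] Timed out while waiting for host to start.")
            return False
        print("[ INFO  ] Still waiting for VDSM host to become operational...")
        sleep(interval_s)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a caller that treats a `False` result as non-fatal reproduces the pattern above, where the host is reported as not added but deployment still completes.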

3) At the end of deployment, the host is in the 'Unassigned' state (possibly an SPM contention issue?)

4) The host needs to be placed into maintenance mode before it can be activated. Placing the host in maintenance mode took ~10 minutes, and the Admin Portal did not alert me when this had finished (the host was shown in the 'Preparing for Maintenance' state, but I later saw an event in the Events tab saying it was in Maintenance mode).

5) Removing the hosted-engine host holding the SPM role (hosted_engine_1) cannot be done from within the Admin Portal; it requires a force-remove host POST request.

6) Adding hosted_engine_1 runs into the same ~10 minute delay and error:
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager
After that, it can be activated manually from the Admin Portal.



Version-Release number of selected component (if applicable):
3.4 and 3.5 (possibly 3.3 as well, though I have not tested this)

How reproducible:
100% for me

Steps to Reproduce:
1. The procedures used are documented here: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.4/html-single/Installation_Guide/index.html#sect-Backing_up_and_Restoring_a_Self-Hosted_Environment 

Expected results:
1, 2, and 6) VDSM should not time out with an error when adding a host to the environment.
3, 4, and 6) The added host should be 'Up'.
4) The host should not take ~10 minutes to go into maintenance mode, and the portal should inform the user when it is in maintenance.
5) It should be possible to remove the host from the Administration Portal.

Comment 2 Andrew Burden 2015-07-10 07:06:48 UTC
Created attachment 1050527 [details]
hosted_engine_1 deployment log

The host is being added to the engine at around line 2612.

Comment 3 Andrew Burden 2015-07-10 07:09:00 UTC
Created attachment 1050528 [details]
Additional host log

The host is added to the engine at about line 2202.

Comment 4 Yaniv Lavi 2015-11-02 14:51:49 UTC
Please check how complex the fix is prior to pushing this to 3.6.

Comment 5 Sandro Bonazzola 2016-04-11 09:41:43 UTC
Meital, has the existing procedure been tested with 3.6?

Comment 6 jidckii 2016-04-13 09:41:40 UTC
Hi.
I have reproduced this on 3.6.

node2 hosted-engine --deploy:

http://scr.keikogi.ru/jidckii/1460464893728.png
http://scr.keikogi.ru/jidckii/1460464944999.png

log:

cat /var/log/ovirt-engine/engine.log
http://paste2.org/kDZLEvhX

tail -n 1000 /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20160412163333-node2.otv.loc-7302438e.log
http://paste2.org/IOzbe2Ew

Comment 7 Nikolai Sednev 2016-04-13 13:07:35 UTC
Hello,
1. Were your hosts RHEL 7.2 or RHEV-H?
2. Were you deploying HE on clean, freshly installed hosts?
3. What is your workflow: are you trying to back up the old HE and then restore it on a new host, or to migrate your engine from bare metal to an HE-based environment?

Comment 8 Nikolai Sednev 2016-04-14 08:36:23 UTC
Can you please attach the full logs from host that was added to the engine?

Comment 9 jidckii 2016-04-14 10:40:37 UTC
I do not speak English.
Apologies for the machine translation.
I have 2 hosts and a new installation.
The OS is CentOS 7.
Here are all the logs from the directory
/var/log/ovirt-engine/host-deploy/:
https://yadi.sk/d/PXu-YAJMqyrsd
The problem occurs when adding a new host to the Default cluster.

Comment 10 Yaniv Kaul 2016-05-04 18:12:22 UTC
Moving to 3.6.7, as R&D did not handle it at all in 3.6.6 and PM did not ACK it either.

Comment 11 Simone Tiraboschi 2016-06-14 10:26:11 UTC
We need to provide a way to filter out all the hosted-engine references from the restored DB.

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21

Comment 12 Yaniv Lavi 2016-06-14 15:22:33 UTC
(In reply to Simone Tiraboschi from comment #11)
> We need to provide a way to filter out all the hosted-engine reference from
> the restored DB.
> 
> See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21

So does this also apply when switching the storage, not just the HE?

Comment 13 Simone Tiraboschi 2016-06-14 16:09:05 UTC
(In reply to Yaniv Dary from comment #12)
> (In reply to Simone Tiraboschi from comment #11)
> > We need to provide a way to filter out all the hosted-engine reference from
> > the restored DB.
> > 
> > See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21
> 
> So this is also in the case of switching the storage as well? Not just the
> HE.

No, in this specific case it happens only because that host was already present in the engine, since the engine DB was restored from a backup.
Basically it's just a side effect of this one:
https://bugzilla.redhat.com/show_bug.cgi?id=1065350
So in principle we could fix this specific issue on the hosted-engine-setup side by detecting the condition and avoiding it, but that opens up sub-cases, for instance when the host is the same but its address has changed, and so on.
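The detection idea above, including the changed-address sub-case, can be sketched as a classification of the host being deployed against the hosts already in the restored DB. This is a minimal sketch assuming hosts can be compared by name and address alone; `HostRecord`, `Match`, and `classify` are hypothetical names, not part of ovirt-hosted-engine-setup or the engine.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class Match(Enum):
    NOT_PRESENT = "not present"
    SAME_HOST = "same name and address"
    ADDRESS_CHANGED = "same name, different address"
    NAME_CHANGED = "same address, different name"


@dataclass(frozen=True)
class HostRecord:
    name: str     # host name as registered in the engine (hypothetical field)
    address: str  # FQDN or IP the engine uses to reach the host (hypothetical field)


def classify(new_host: HostRecord, restored: Iterable[HostRecord]) -> Match:
    """Compare a host about to be added against hosts in the restored DB.

    Exact matches are checked first so that an unrelated partial match
    earlier in the list cannot hide an exact one later on.
    """
    hosts = list(restored)

    def same_addr(h: HostRecord) -> bool:
        # Host names and FQDNs are case-insensitive.
        return h.address.lower() == new_host.address.lower()

    if any(h.name == new_host.name and same_addr(h) for h in hosts):
        return Match.SAME_HOST
    if any(h.name == new_host.name for h in hosts):
        return Match.ADDRESS_CHANGED
    if any(same_addr(h) for h in hosts):
        return Match.NAME_CHANGED
    return Match.NOT_PRESENT
```

Only `SAME_HOST` is the simple case described in this bug; the other non-trivial results are the kind of sub-cases that make a setup-side fix incomplete on its own.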

In general, we can hit many similar issues when restoring the engine DB, since the engine assumes that the external environment is exactly as it was when the backup was taken: same hosts, same storage domains, same networks...
In general the engine is robust enough to identify the missing/broken component and let the user fix the configuration.

A hosted-engine environment is a bit more delicate, since we also need to ensure that the HA agent can correctly start the engine VM, which means that:
- the hosted-engine storage domain is coherent, otherwise we can no longer edit the engine VM
- the engine VM UUID is coherent
- the hosted-engine host list is coherent
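The three conditions listed above can be read as a checklist comparing what the restored DB believes against the live hosted-engine environment. The sketch below is hypothetical: `HostedEngineState`, `coherence_problems`, and the assumption that both snapshots can be obtained are illustrative only, not how ovirt-hosted-engine-setup or the HA agent actually work.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List


@dataclass(frozen=True)
class HostedEngineState:
    """Hypothetical snapshot of the facts that must agree after a restore."""
    he_storage_domain_id: str
    engine_vm_uuid: str
    he_hosts: FrozenSet[str] = field(default_factory=frozenset)


def coherence_problems(restored: HostedEngineState, live: HostedEngineState) -> List[str]:
    """Return human-readable mismatches; an empty list means coherent."""
    problems: List[str] = []
    if restored.he_storage_domain_id != live.he_storage_domain_id:
        problems.append("hosted-engine storage domain differs: engine VM cannot be edited")
    if restored.engine_vm_uuid != live.engine_vm_uuid:
        problems.append("engine VM UUID differs")
    # Hosts running the HA agent that the restored DB does not know about.
    missing = sorted(live.he_hosts - restored.he_hosts)
    # Hosts the restored DB still lists but that are no longer hosted-engine hosts.
    stale = sorted(restored.he_hosts - live.he_hosts)
    if missing:
        problems.append(f"hosts missing from restored DB: {', '.join(missing)}")
    if stale:
        problems.append(f"stale hosts in restored DB: {', '.join(stale)}")
    return problems
```

In the scenario from the description, hosted_engine_1 would show up as a stale host once it has been redeployed elsewhere, which matches the manual force-remove step the reporter needed.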

Comment 14 Simone Tiraboschi 2016-09-29 12:45:09 UTC

*** This bug has been marked as a duplicate of bug 1235200 ***

