Description of problem:

There are some host issues that make restoring a hosted engine environment less than ideal from a usability viewpoint. The following was experienced during testing for BZ#1232136 (Comments 76, 78, and 87 may be of particular relevance), which involved placing hosted_engine_2 (of two hosted-engine hosts) in maintenance during engine-backup, and using that host to deploy the restored engine (so that its namesake could easily be dropped from the restored environment and then added anew).

1) It takes ~10 minutes for the host to become operational in the engine (end of hosted-engine --deploy):

[ INFO ] Engine replied: DB Up!Welcome to Health Status!
[ INFO ] Waiting for the host to become operational in the engine. This may take several minutes...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...
[ INFO ] Still waiting for VDSM host to become operational...

2) VDSM instead times out with an error:

[ INFO ] Still waiting for VDSM host to become operational...
[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_2 to the manager

(Though deployment is ultimately successful:

Please shutdown the VM allowing the system to launch it as a monitored service.
The system will wait until the VM is down.
[ INFO ] Enabling and starting HA services
Hosted Engine successfully set up
[ INFO ] Stage: Clean up
[ INFO ] Generating answer file '/etc/ovirt-hosted-engine/answers.conf'
[ INFO ] Stage: Pre-termination
[ INFO ] Stage: Termination
)

3) At the end of deployment, the host is in the 'Unassigned' state (issue with SPM contention?)

4) The host needs to be placed into maintenance mode before it can be activated. Placing the host in maintenance mode took ~10 minutes, and the Admin Portal didn't alert me when this had finished (the host was shown in the 'Preparing for Maintenance' state, but I saw later that there was an event in the Events tab saying it was in Maintenance mode).

5) Removing the hosted-engine host with SPM (hosted_engine_1) cannot be done from within the Admin Portal, but requires a force-remove host POST request.

6) Adding hosted_engine_1 runs into the same ~10 minute delay and error:

[ ERROR ] Timed out while waiting for host to start. Please check the logs.
[ ERROR ] Unable to add hosted_engine_1 to the manager

After which it can be manually activated from the Admin Portal.

Version-Release number of selected component (if applicable):
3.4 and 3.5 (quite possibly 3.3 as well; however, I have not tested this)

How reproducible:
100% for me

Steps to Reproduce:
1. The various procedures being used are documented here:
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Virtualization/3.4/html-single/Installation_Guide/index.html#sect-Backing_up_and_Restoring_a_Self-Hosted_Environment

Expected results:
1, 2 and 6) VDSM doesn't time out with an error when adding the host to the environment
3, 4, and 6) The added host should be 'Up'
4) The host shouldn't take ~10 mins to go into maintenance mode; the portal should inform the user when it's in maintenance
5) It should be possible to remove the host from the Administration Portal
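For illustration only, the wait-then-time-out behaviour in items 1, 2 and 6 can be modelled as a polling loop that also reports the last status it saw, so a caller could tell "still Unassigned" apart from a hard failure. This is a minimal sketch, not the hosted-engine-setup code: the function name, the get_status callable, the status strings, and the timeout and interval values are all assumptions made for the example.

```python
import time


def wait_for_host_up(get_status, timeout=600.0, interval=30.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns "up" or the timeout expires.

    get_status is a hypothetical callable returning the host status as a
    lowercase string. Returns a (reached_up, last_status) tuple.
    """
    deadline = clock() + timeout
    status = get_status()
    while status != "up":
        if clock() >= deadline:
            # Give up, but hand back the last status seen so the caller
            # can report it (for example "unassigned") instead of only
            # printing a generic timeout message.
            return False, status
        sleep(interval)
        status = get_status()
    return True, status
```

With a helper like this, a caller could print the last observed status next to the timeout message, which would have pointed directly at the 'Unassigned' state described in item 3.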
Created attachment 1050527 [details]
hosted_engine_1 deployment log

The host attempts to be added to the engine at about line 2612.
Created attachment 1050528 [details]
Additional host log

Adds to engine at about line 2202
Please check how complex the fix is prior to pushing this to 3.6.
Meital, has the existing procedure been tested with 3.6?
Hi. I have reproduced this on 3.6.

node2 hosted-engine --deploy:
http://scr.keikogi.ru/jidckii/1460464893728.png
http://scr.keikogi.ru/jidckii/1460464944999.png

log:
cat /var/log/ovirt-engine/engine.log
http://paste2.org/kDZLEvhX

tail -n 1000 /var/log/ovirt-engine/host-deploy/ovirt-host-deploy-20160412163333-node2.otv.loc-7302438e.log
http://paste2.org/IOzbe2Ew
Hello,
1. Were your hosts RHEL 7.2 or RHEV-H hosts?
2. Were you performing the deployment of HE on clean/freshly installed hosts?
3. What is your workflow: are you trying to back up an old HE and then restore it on a new host, or trying to migrate your engine from bare metal to an HE-based environment?
Can you please attach the full logs from host that was added to the engine?
I do not speak English. Apologies for the machine translation.

I have 2 hosts and a new installation. The OS I am using is CentOS 7.

Here are all the logs from the directory /var/log/ovirt-engine/host-deploy/:
https://yadi.sk/d/PXu-YAJMqyrsd

The problem occurs when adding a new host to the Default cluster.
Moving to 3.6.7, as R&D did not handle it at all in 3.6.6 and PM did not ACK it either.
We need to provide a way to filter out all the hosted-engine references from the restored DB.

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21
(In reply to Simone Tiraboschi from comment #11)
> We need to provide a way to filter out all the hosted-engine reference from
> the restored DB.
>
> See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21

So this is also in the case of switching the storage as well? Not just the HE.
(In reply to Yaniv Dary from comment #12)
> (In reply to Simone Tiraboschi from comment #11)
> > We need to provide a way to filter out all the hosted-engine reference from
> > the restored DB.
> >
> > See also: https://bugzilla.redhat.com/show_bug.cgi?id=1240466#c21
>
> So this is also in the case of switching the storage as well? Not just the
> HE.

No, in this specific case it's just because that specific host was already present in the engine, since the engine DB was restored from a backup.
Basically it's just a side effect of this one: https://bugzilla.redhat.com/show_bug.cgi?id=1065350

So in principle we can just fix this specific issue on the hosted-engine-setup side by detecting it and avoiding hitting it, but that is going to open sub-cases, for instance if the host is the same but the host address has been changed, and so on.

In general we can experience a lot of similar issues when we restore an engine DB, since the engine will assume that the external env is exactly as it was when the backup was taken: same hosts, same storage domains, same networks...
In general the engine is robust enough to identify the missing/broken component and let the user fix the configuration.
A hosted-engine env is a bit more delicate since we also need to ensure that the ha agent is able to correctly start the engine VM, and this means that:
- hosted-engine storage domain is coherent, otherwise we cannot edit the engine VM anymore
- engine VM uuid is coherent
- hosted-engine host list is coherent
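To make the three coherence conditions above concrete, here is a rough sketch of how a restored state could be compared against the actual environment. It is only an illustration under stated assumptions: HostedEngineState, its fields, and find_incoherences are hypothetical names invented for this example, not part of the engine or of hosted-engine-setup, and the real data would have to come from the restored DB and the hosted-engine configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostedEngineState:
    """Hypothetical snapshot of the hosted-engine related facts."""
    storage_domain_id: str
    engine_vm_uuid: str
    host_names: frozenset


def find_incoherences(restored, actual):
    """Return a list of human-readable mismatches between two states.

    restored: state described by the restored engine DB (assumed input).
    actual:   state of the environment being deployed (assumed input).
    An empty list means the three conditions hold.
    """
    problems = []
    if restored.storage_domain_id != actual.storage_domain_id:
        problems.append("hosted-engine storage domain differs")
    if restored.engine_vm_uuid != actual.engine_vm_uuid:
        problems.append("engine VM uuid differs")
    # Hosts known only to the restored DB are the stale references
    # that would need to be filtered out or removed.
    stale = restored.host_names - actual.host_names
    if stale:
        problems.append("hosts only in restored DB: " + ", ".join(sorted(stale)))
    unknown = actual.host_names - restored.host_names
    if unknown:
        problems.append("hosts missing from restored DB: " + ", ".join(sorted(unknown)))
    return problems
```

In the scenario from the description, the restored DB would still list both hosted_engine_1 and hosted_engine_2 while only one host is being redeployed, so a check like this would flag the other one as a stale reference.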
*** This bug has been marked as a duplicate of bug 1235200 ***