Bug 1640097

Summary: Restoring an HE env from backup fails if the power management was configured for HE hosts
Product: [oVirt] ovirt-hosted-engine-setup Reporter: Polina <pagranat>
Component: General Assignee: Simone Tiraboschi <stirabos>
Status: CLOSED CURRENTRELEASE QA Contact: Polina <pagranat>
Severity: high Docs Contact:
Priority: unspecified    
Version: 2.2.24 CC: bugs, matonb, stirabos
Target Milestone: ovirt-4.2.8 Flags: rule-engine: ovirt-4.2+
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-hosted-engine-setup-2.2.29-1.el7ev.noarch.rpm Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2019-01-22 10:23:21 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Polina 2018-10-17 11:19:50 UTC
Description of problem: Restoring a hosted-engine environment from backup fails if power management was configured for the HE hosts.

Version-Release number of selected component (if applicable):
Restoring a hosted-engine environment from backup fails if power management was configured for the HE hosts.

How reproducible: 100%

Scenario (from the doc https://docs.google.com/document/d/1Hyg7epVNfwSmPx9N8qaITH5vo2mGQm6Ie1JKYbFYBus/edit?ts=5bbcbe3e):
Node 0 -> Node 0
NFS -> NFS
Redeploy on an environment where power management is configured and all the hosts can be reached.

The 4.2 upstream HE environment has two hosts: host1 (not an HE host) and host2 (an HE host). VM1 is running on the non-HE host1; VM2 is running on the HE host2. Power management is configured on both hosts.

Steps to reproduce:
1. Create the backup file on the engine by running <engine-backup --mode=backup --file=backup_compute-he-4 --log=log_compute-he-4_backup4.2>. Copy the backup file aside (to a laptop).
2. Put the environment into global maintenance.
3. Clean up the HE storage NFS domain.
4. Reprovision the HE host. Copy the repos to /etc/yum.repos.d/, run yum update, and run <yum install ovirt-hosted-engine-setup>.
5. On the HE host, run the restore command
<hosted-engine --deploy --restore-from-file=backup_compute-he-4>.
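
For anyone scripting this reproduction, the two commands quoted in steps 1 and 5 can be assembled as argument lists, as in the minimal Python sketch below. The helper names (backup_command, restore_command, render) are hypothetical and exist only for illustration. Only the flags that appear in this report are used, and nothing is executed. Steps 2-4 are omitted because the report gives no exact commands for them.

```python
import shlex


def backup_command(backup_file, log_file):
    """Step 1: engine-backup invocation, to be run on the engine VM."""
    return [
        "engine-backup",
        "--mode=backup",
        f"--file={backup_file}",
        f"--log={log_file}",
    ]


def restore_command(backup_file):
    """Step 5: restore-deploy invocation, to be run on the reprovisioned HE host."""
    return ["hosted-engine", "--deploy", f"--restore-from-file={backup_file}"]


def render(argv):
    """Quote an argument list so it can be pasted into a shell."""
    return " ".join(shlex.quote(arg) for arg in argv)


if __name__ == "__main__":
    # File names are the ones used in this report.
    name = "backup_compute-he-4"
    print(render(backup_command(name, "log_compute-he-4_backup4.2")))
    print(render(restore_command(name)))
```

Keeping the commands as lists avoids quoting mistakes if the backup file name ever contains spaces; the lists can be handed to subprocess.run unchanged.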

Actual results:
The deploy starts OK with all the questions and then hangs for a long time (I waited for two hours). Then the host disconnects.
The last output lines are:
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Set FQDN]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Force the local VM FQDN to temporary resolve on the natted network address]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Restore sshd reverse DNS lookups]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Generate an answer file for engine-setup]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Include before engine-setup custom tasks files for the engine VM]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Copy the backup file to the engine VM for restore]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Run engine-backup]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Remove backup file]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [include_tasks]
[ INFO  ] ok: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Find configuration file for SCL PostgreSQL]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Check SCL PostgreSQL value]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Set SCL prefix for PostgreSQL]
[ INFO  ] ok: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Remove previous hosted-engine VM]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Remove dynamic data for VMs on the host used to redeploy]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Remove host used to redeploy]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Remove previous HE storage domain to avoid name conflicts]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Execute engine-setup]
[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]
[ INFO  ] TASK [Include after engine-setup custom tasks files for the engine VM]
[ INFO  ] TASK [Wait for the engine to reach a stable condition]

There is no SSH access to the host (and no ping). Checking power management shows that the "Host is currently off". The host could be started via power control, but the restore operation did not succeed, so we have no engine.
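
When triaging a hang like this from a saved copy of the deploy output, it can help to find the last TASK line that never received a result line. The Python sketch below does this for the line format shown in the excerpt above. The function name last_pending_task is hypothetical, and treating "skipping:" as a result marker is an assumption (only "ok:" and "changed:" appear in this report).

```python
import re

# Line shapes as they appear in the deploy output quoted in this report.
TASK_RE = re.compile(r"^\[ INFO\s+\] TASK \[(?P<name>.+)\]\s*$")
# "skipping" is an assumption; the excerpt only shows "ok" and "changed".
RESULT_RE = re.compile(r"^\[ INFO\s+\] (ok|changed|skipping): \[")


def last_pending_task(lines):
    """Return the name of the final TASK with no result line after it, or None."""
    pending = None
    for raw in lines:
        line = raw.strip()
        task = TASK_RE.match(line)
        if task:
            # A new task supersedes any earlier one (e.g. include-only tasks).
            pending = task.group("name")
        elif RESULT_RE.match(line):
            pending = None
    return pending


if __name__ == "__main__":
    tail = [
        "[ INFO  ] TASK [Execute engine-setup]",
        "[ INFO  ] changed: [compute-ge-he-4.scl.lab.tlv.redhat.com]",
        "[ INFO  ] TASK [Include after engine-setup custom tasks files for the engine VM]",
        "[ INFO  ] TASK [Wait for the engine to reach a stable condition]",
    ]
    # prints: Wait for the engine to reach a stable condition
    print(last_pending_task(tail))
```

Applied to the output in this report, the pending task is the wait for the engine to reach a stable condition, which is consistent with the host having been powered off while the deploy was waiting.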

Expected results: The restore-deploy succeeds; the engine and hosts are up.


Additional info: the hosts in the environment are AMD (cougar03.scl.lab.tlv.redhat.com, cougar04.scl.lab.tlv.redhat.com, cougar05.scl.lab.tlv.redhat.com)

Comment 1 Polina 2018-10-29 11:48:27 UTC
Verified on ovirt-hosted-engine-setup-2.2.30

Comment 2 Sandro Bonazzola 2018-11-02 14:36:04 UTC
This bugzilla is included in oVirt 4.2.7 release, published on November 2nd 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

Comment 3 Sandro Bonazzola 2018-11-02 14:40:11 UTC
Closed by mistake, moving back to qa -> verified

Comment 4 Sandro Bonazzola 2019-01-22 10:23:21 UTC
This bugzilla is included in oVirt 4.2.8 release, published on January 22nd 2019.

Since the problem described in this bug report should be resolved in oVirt 4.2.8 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.