Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1640155

Summary: [DR] RHV failover of VMs to secondary site fails
Product: [oVirt] ovirt-ansible-collection Reporter: SATHEESARAN <sasundar>
Component: disaster-recovery    Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 1.1.10    CC: bugs, ebenahar, gpulido, mkalinin, mperina, rhs-bugs, sabose, sankarshan, tnisan
Target Milestone: ovirt-4.2.7-1    Keywords: Regression, TestBlocker
Target Release: 1.1.3    Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-ansible-disaster-recovery-1.1.3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1640139    Environment: hc
Last Closed: 2018-11-13 16:12:55 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1640139    
Attachments:
Description                        Flags
ansible.log                        none
mapping file used for failover     none

Description SATHEESARAN 2018-10-17 12:44:46 UTC
Description of problem:
------------------------
While running the failover playbook (DR use case) to fail over VMs from the primary site to the secondary site, the playbook failed with an error.
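
For context, a minimal sketch of the kind of playbook being run. The role name and the two vars files match what the ansible.log excerpt in comment 1 shows being read; everything else is an assumption, and the actual failover playbook shipped with the role may differ.

# Hypothetical minimal failover playbook for the oVirt.disaster-recovery role.
# Vars file names are taken from the log in comment 1; the rest is assumed.
- hosts: localhost
  connection: local
  vars_files:
    - disaster_recovery_vars.yml   # DR mapping variables (read by the role per the log)
    - passwords.yml                # engine credentials (read by the role per the log)
  roles:
    - oVirt.disaster-recovery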

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI 2.0

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Execute the failover playbook for the Gluster-backed storage domain

Actual results:
---------------
Playbook fails

Expected results:
------------------
The playbook should succeed and the VMs should be failed over to the secondary site


Additional info:

Comment 1 SATHEESARAN 2018-10-17 12:48:32 UTC
Here is the error that was reported:

<snip>

2018-10-16 16:39:50,600 p=30924 u=root |  TASK [oVirt.disaster-recovery : Recover target engine] **************************************************
2018-10-16 16:39:50,600 p=30924 u=root |  task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/main.yml:19
2018-10-16 16:39:50,633 p=30924 u=root |  Read vars_file 'disaster_recovery_vars.yml'
2018-10-16 16:39:50,633 p=30924 u=root |  Read vars_file 'passwords.yml'
2018-10-16 16:39:50,648 p=30924 u=root |  fatal: [localhost]: FAILED! => {
    "reason": "Invalid options for include_tasks: storage\n\nThe error appears to have been in '/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml': line 42, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    # domain (which will make another storage domain as master instead).\n    - name: Add master storage domain to the setup\n      ^ here\n"
}
2018-10-16 16:39:50,650 p=30924 u=root |  	to retry, use: --limit @/usr/share/ansible/roles/oVirt.disaster-recovery/files/failover.retry

2018-10-16 16:39:50,650 p=30924 u=root |  PLAY RECAP **********************************************************************************************

</snip>
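
The failure appears to come from how an extra parameter is passed to include_tasks: Ansible 2.7 rejects a bare task keyword such as 'storage' on an include (hence "Invalid options for include_tasks: storage"), whereas earlier releases tolerated it. Below is a hedged sketch of the failing shape and of the vars:-based form that 2.7 accepts; the variable name is hypothetical and the real task in recover_engine.yml may differ.

# Shape that Ansible 2.7 rejects (hypothetical reconstruction of the task
# around recover_engine.yml line 42; 'master_storage' is an invented name):
- name: Add master storage domain to the setup
  include_tasks: recover/add_domain.yml
  storage: "{{ master_storage }}"    # parsed as an unknown task option in 2.7

# Equivalent form accepted by 2.7: pass the parameter through vars:
- name: Add master storage domain to the setup
  include_tasks: recover/add_domain.yml
  vars:
    storage: "{{ master_storage }}"

Per comment 9, staying on ansible 2.6.x avoids the stricter check until a fixed role build is available.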

Comment 2 SATHEESARAN 2018-10-17 12:59:28 UTC
ansible-2.7.0-0.4.rc4.el7ae.noarch
ovirt-ansible-disaster-recovery-1.1.2-1.el7ev.noarch

Comment 3 SATHEESARAN 2018-10-17 12:59:49 UTC
(In reply to SATHEESARAN from comment #2)
> ansible-2.7.0-0.4.rc4.el7ae.noarch
> ovirt-ansible-disaster-recovery-1.1.2-1.el7ev.noarch

Tested with the components listed above

Comment 4 SATHEESARAN 2018-10-17 13:00:17 UTC
Created attachment 1494841 [details]
ansible.log

Comment 5 SATHEESARAN 2018-10-17 13:00:59 UTC
Created attachment 1494842 [details]
mapping file used for failover

Comment 6 Sandro Bonazzola 2018-10-29 07:29:49 UTC
Which milestone is this bug targeted to?

Comment 7 Maor 2018-10-29 08:35:42 UTC
I assume the closest one is 4.2.7, so I am targeting it to that milestone.

Comment 8 Sahina Bose 2018-10-29 13:30:42 UTC
Maor, is there a workaround for this issue (until 4.2.8 is released)?

Comment 9 Maor 2018-10-31 12:21:41 UTC
You can use ansible 2.6.x until the fix is published.
Elad, I know you had doubts about whether this bug will be tested for 4.2.7; is there any chance we can still push it?

Comment 10 Elad 2018-10-31 12:31:47 UTC
Yes, we will probably get a respin for 4.2.7 for the fix here.

Comment 11 Sahina Bose 2018-10-31 12:38:19 UTC
(In reply to Maor from comment #9)
> You can use ansible 2.6.x until the fix will be published.

Downgrading to ansible 2.6.x is not an option, as all other features are tested with 2.7 (and we would always get the latest ansible from the channel anyway)

> Elad, I know that you had doubts about this bug will be tested for 4.2.7, is
> there any chance we can still push it?

Comment 13 Elad 2018-11-04 14:29:17 UTC
Failover with a Gluster domain succeeded; the domain was imported successfully to the secondary site:




TASK [oVirt.disaster-recovery : Recover target engine] ************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/main.yml:19
included: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml for localhost

TASK [oVirt.disaster-recovery : Obtain SSO token] *****************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081 `" && echo ansible-tmp-1541341086.35-217143828950081="` echo /root/.ansible/tmp/ansible-tm
p-1541341086.35-217143828950081 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/ovirt_auth.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-21392Ohs45r/tmpFWuT_9 TO /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/ /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/ > /dev/null 2>&1 && sleep 0'
ok: [localhost] => {

.
.
.



TASK [oVirt.disaster-recovery : Add storage domain if Gluster] ****************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_domain.yml:19
included: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml for localhost

TASK [oVirt.disaster-recovery : Add Gluster storage domain] *******************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'


==========================================================

Used:
ovirt-ansible-disaster-recovery-1.1.3-1.el7ev.noarch
ovirt-ansible-roles-1.1.5-2.el7ev.noarch
ansible-2.7.1-1.el7ae.noarch
ovirt-engine-4.2.7.4-0.1.el7ev.noarch



Thanks kgoldbla for your help!

Comment 17 Sandro Bonazzola 2018-11-13 16:12:55 UTC
This bug is included in the oVirt 4.2.7 Async 1 release, published on November 13th, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.