Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1640155

Summary: [DR] RHV failover of VMs to secondary site fails
Product: [oVirt] ovirt-ansible-collection Reporter: SATHEESARAN <sasundar>
Component: disaster-recovery    Assignee: Tal Nisan <tnisan>
Status: CLOSED CURRENTRELEASE QA Contact: Elad <ebenahar>
Severity: urgent Docs Contact:
Priority: unspecified    
Version: 1.1.10    CC: bugs, ebenahar, gpulido, mkalinin, mperina, rhs-bugs, sabose, sankarshan, tnisan
Target Milestone: ovirt-4.2.7-1    Keywords: Regression, TestBlocker
Target Release: 1.1.3    Flags: rule-engine: ovirt-4.2+
rule-engine: blocker+
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: ovirt-ansible-disaster-recovery-1.1.3 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1640139    Environment: hc
Last Closed: 2018-11-13 16:12:55 UTC    Type: Bug
Regression: ---    Mount Type: ---
Documentation: ---    CRM:
Verified Versions:    Category: ---
oVirt Team: Storage    RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---    Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1640139    
Attachments:
Description                        Flags
ansible.log                        none
mapping file used for failover     none

Description SATHEESARAN 2018-10-17 12:44:46 UTC
Description of problem:
------------------------
While running the failover playbook (DR use case) to fail over VMs from the primary site to the secondary site, the playbook failed with an error.
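
For context, a minimal sketch of the kind of playbook being run. The role name and the two vars files match what the ansible.log excerpt in comment 1 shows being read; everything else is an assumption, and the actual failover playbook shipped with the role may differ.

# Hypothetical minimal failover playbook for the oVirt.disaster-recovery role.
# Vars file names are taken from the log in comment 1; the rest is assumed.
- hosts: localhost
  connection: local
  vars_files:
    - disaster_recovery_vars.yml   # DR mapping variables (read by the role per the log)
    - passwords.yml                # engine credentials (read by the role per the log)
  roles:
    - oVirt.disaster-recovery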

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
RHHI 2.0

How reproducible:
-----------------
Always

Steps to Reproduce:
-------------------
1. Execute the failover playbook for the Gluster-backed storage domain

Actual results:
---------------
Playbook fails

Expected results:
------------------
The playbook should succeed and the VMs should be failed over to the secondary site


Additional info:

Comment 1 SATHEESARAN 2018-10-17 12:48:32 UTC
Here is the error that was reported:

<snip>

2018-10-16 16:39:50,600 p=30924 u=root |  TASK [oVirt.disaster-recovery : Recover target engine] **************************************************
2018-10-16 16:39:50,600 p=30924 u=root |  task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/main.yml:19
2018-10-16 16:39:50,633 p=30924 u=root |  Read vars_file 'disaster_recovery_vars.yml'
2018-10-16 16:39:50,633 p=30924 u=root |  Read vars_file 'passwords.yml'
2018-10-16 16:39:50,648 p=30924 u=root |  fatal: [localhost]: FAILED! => {
    "reason": "Invalid options for include_tasks: storage\n\nThe error appears to have been in '/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml': line 42, column 7, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n    # domain (which will make another storage domain as master instead).\n    - name: Add master storage domain to the setup\n      ^ here\n"
}
2018-10-16 16:39:50,650 p=30924 u=root |  	to retry, use: --limit @/usr/share/ansible/roles/oVirt.disaster-recovery/files/failover.retry

2018-10-16 16:39:50,650 p=30924 u=root |  PLAY RECAP **********************************************************************************************

</snip>
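
The failure appears to come from how an extra parameter is passed to include_tasks: Ansible 2.7 rejects a bare task keyword such as 'storage' on an include (hence "Invalid options for include_tasks: storage"), whereas earlier releases tolerated it. Below is a hedged sketch of the failing shape and of the vars:-based form that 2.7 accepts; the variable name is hypothetical and the real task in recover_engine.yml may differ.

# Shape that Ansible 2.7 rejects (hypothetical reconstruction of the task
# around recover_engine.yml line 42; 'master_storage' is an invented name):
- name: Add master storage domain to the setup
  include_tasks: recover/add_domain.yml
  storage: "{{ master_storage }}"    # parsed as an unknown task option in 2.7

# Equivalent form accepted by 2.7: pass the parameter through vars:
- name: Add master storage domain to the setup
  include_tasks: recover/add_domain.yml
  vars:
    storage: "{{ master_storage }}"

Per comment 9, staying on ansible 2.6.x avoids the stricter check until a fixed role build is available.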

Comment 2 SATHEESARAN 2018-10-17 12:59:28 UTC
ansible-2.7.0-0.4.rc4.el7ae.noarch
ovirt-ansible-disaster-recovery-1.1.2-1.el7ev.noarch

Comment 3 SATHEESARAN 2018-10-17 12:59:49 UTC
(In reply to SATHEESARAN from comment #2)
> ansible-2.7.0-0.4.rc4.el7ae.noarch
> ovirt-ansible-disaster-recovery-1.1.2-1.el7ev.noarch

Tested with the components listed above

Comment 4 SATHEESARAN 2018-10-17 13:00:17 UTC
Created attachment 1494841 [details]
ansible.log

Comment 5 SATHEESARAN 2018-10-17 13:00:59 UTC
Created attachment 1494842 [details]
mapping file used for failover

Comment 6 Sandro Bonazzola 2018-10-29 07:29:49 UTC
Which milestone is this bug targeted to?

Comment 7 Maor 2018-10-29 08:35:42 UTC
I assume the closest one is 4.2.7, so I am targeting it to that milestone.

Comment 8 Sahina Bose 2018-10-29 13:30:42 UTC
Maor, is there a workaround for this issue (until 4.2.8 is released)?

Comment 9 Maor 2018-10-31 12:21:41 UTC
You can use ansible 2.6.x until the fix is published.
Elad, I know you had doubts about whether this bug will be tested for 4.2.7; is there any chance we can still push it?

Comment 10 Elad 2018-10-31 12:31:47 UTC
Yes, we will probably get a respin for 4.2.7 for the fix here.

Comment 11 Sahina Bose 2018-10-31 12:38:19 UTC
(In reply to Maor from comment #9)
> You can use ansible 2.6.x until the fix will be published.

Downgrading to ansible 2.6.x is not an option, as all other features are tested with 2.7 (and we would always get the latest ansible from the channel anyway)

> Elad, I know that you had doubts about this bug will be tested for 4.2.7, is
> there any chance we can still push it?

Comment 13 Elad 2018-11-04 14:29:17 UTC
Failover with a Gluster domain succeeded; the domain was imported successfully to the secondary site:




TASK [oVirt.disaster-recovery : Recover target engine] ************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/main.yml:19
included: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml for localhost

TASK [oVirt.disaster-recovery : Obtain SSO token] *****************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover_engine.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081 `" && echo ansible-tmp-1541341086.35-217143828950081="` echo /root/.ansible/tmp/ansible-tm
p-1541341086.35-217143828950081 `" ) && sleep 0'
Using module file /usr/lib/python2.7/site-packages/ansible/modules/cloud/ovirt/ovirt_auth.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-21392Ohs45r/tmpFWuT_9 TO /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/ /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/bin/python2 /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/AnsiballZ_ovirt_auth.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1541341086.35-217143828950081/ > /dev/null 2>&1 && sleep 0'
ok: [localhost] => {

.
.
.



TASK [oVirt.disaster-recovery : Add storage domain if Gluster] ****************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_domain.yml:19
included: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml for localhost

TASK [oVirt.disaster-recovery : Add Gluster storage domain] *******************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'


==========================================================

Used:
ovirt-ansible-disaster-recovery-1.1.3-1.el7ev.noarch
ovirt-ansible-roles-1.1.5-2.el7ev.noarch
ansible-2.7.1-1.el7ae.noarch
ovirt-engine-4.2.7.4-0.1.el7ev.noarch



Thanks kgoldbla for your help!

Comment 17 Sandro Bonazzola 2018-11-13 16:12:55 UTC
This bug is included in the oVirt 4.2.7 Async 1 release, published on November 13th, 2018.

Since the problem described in this bug report should be resolved in the oVirt 4.2.7 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.