Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1590424

Summary: [Blocked] [DR] With the script ovirt_dr failback IOCTL fails with exception
Product: [oVirt] ovirt-ansible-collection
Reporter: Kevin Alon Goldblatt <kgoldbla>
Component: disaster-recovery
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: urgent
Docs Contact:
Priority: urgent
Version: 1.1.0
CC: kgoldbla, mlipchuk, tnisan, ylavi
Target Milestone: ovirt-4.2.6
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: DR
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-09-03 15:08:58 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
ovirt_dr.log file (flags: none)

Description Kevin Alon Goldblatt 2018-06-12 15:03:05 UTC
Created attachment 1450529 [details]
ovirt_dr.log file

Description of problem:
Running failback via the ./ovirt-dr script fails with the following IOCTL exception:

-------------------------------------------------------------------------
TASK [oVirt.disaster-recovery : Failback Replication Sync pause] *********************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/main.yml:2
[oVirt.disaster-recovery : Failback Replication Sync pause]
[Failback Replication Sync] Please press ENTER once the destination storage domains are ready to be used for the destination setup:
The full traceback is:
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 138, in run
    res = self._execute()
  File "/usr/lib/python2.7/site-packages/ansible/executor/task_executor.py", line 561, in _execute
    result = self._handler.run(task_vars=variables)
  File "/usr/lib/python2.7/site-packages/ansible/plugins/action/pause.py", line 202, in run
    tty.setraw(stdout.fileno())
  File "/usr/lib64/python2.7/tty.py", line 20, in setraw
    mode = tcgetattr(fd)
error: (25, 'Inappropriate ioctl for device')
fatal: [localhost]: FAILED! => {
    "msg": "Unexpected failure during module execution.", 
    "stdout": ""
}

NO MORE HOSTS LEFT *******************************************************************************************************************************
        to retry, use: --limit @/usr/share/doc/ovirt-ansible-disaster-recovery-1.1.0/examples/dr_play.retry

PLAY RECAP ***************************************************************************************************************************************
localhost                  : ok=1    changed=0    unreachable=0    failed=1   

 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
-------------------------------------------------------------------------------
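The traceback shows Ansible's pause action calling tty.setraw(stdout.fileno()), which fails with errno 25 ("Inappropriate ioctl for device"), presumably because the ovirt-dr wrapper runs the playbook with a stdout that is not a real terminal. A minimal sketch of that failure condition (the can_set_raw helper and the pipe standing in for the redirected stdout are illustrative, not part of ovirt-dr):

```python
import os
import termios

def can_set_raw(fd):
    """Return True when fd refers to a real terminal; tty.setraw() begins
    with the same termios.tcgetattr() call that fails in the traceback."""
    try:
        termios.tcgetattr(fd)
        return True
    except termios.error:
        # errno 25, ENOTTY: "Inappropriate ioctl for device"
        return False

# A pipe stands in for the wrapper script's redirected stdout.
r, w = os.pipe()
print(can_set_raw(w))  # prints False: a pipe is not a terminal
os.close(r)
os.close(w)
```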


Running the playbook directly with ansible-playbook works:

ansible-playbook /usr/share/doc/ovirt-ansible-disaster-recovery-1.1.0/examples/dr_play.yml -t fail_back -e @/var/lib/ovirt-ansible-disaster-recovery/mapping_vars.yml -e @ovirt_password2.yml -e 'dr_target_host=primary dr_source_map=secondary dr_report_file=report-15288125836temp.log' --vault-password-file vault_secret.sh -vvv





Version-Release number of selected component (if applicable):
ovirt-ansible-disaster-recovery-1.1.0-1.el7ev.noarch
ansible-2.5.4-1.el7ae.noarch
Python 2.7.14
ovirt-engine-4.2.4-0.1.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Run ./ovirt-dr failback; it ends with the exception above. (Invoking ansible-playbook directly works.)

Actual results:
The script fails with the exception above

Expected results:
The failback should complete successfully.

Additional info:

Comment 1 Maor 2018-07-02 19:18:49 UTC
Seems like a regression introduced by this commit: https://github.com/ansible/ansible/commit/1c20029694b647c90612b5116bb619d806bf2aae
This is blocked on issue https://github.com/ansible/ansible/issues/41717
Once that is fixed, we can verify this bug as well.

Comment 2 Maor 2018-07-17 08:16:15 UTC
(In reply to Maor from comment #1)
> Seems like a regression which was introduced in this commit
> https://github.com/ansible/ansible/commit/
> 1c20029694b647c90612b5116bb619d806bf2aae
> This is blocked on issue https://github.com/ansible/ansible/issues/41717
> Once that will be fixed we can verify this bug as well

The fix for the regression will be published in the next Ansible releases, 2.5.7 and 2.6.2 (see https://github.com/ansible/ansible/issues/41717#issuecomment-405291773).
Once it is released, this bug should be fixed.
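The upstream change amounts to guarding the raw-mode switch on whether the stream is actually a terminal. A minimal sketch of that guard pattern (set_raw_if_tty is an illustrative name, not the actual Ansible patch):

```python
import sys
import termios
import tty

def set_raw_if_tty(stream=sys.stdout):
    """Switch the stream to raw mode only when it is a real terminal.
    Returns the saved termios settings so the caller can restore them,
    or None when the stream was skipped."""
    if not stream.isatty():
        return None  # piped/redirected output: leave the stream alone
    fd = stream.fileno()
    saved = termios.tcgetattr(fd)  # keep settings for later restore
    tty.setraw(fd)
    return saved

# To restore afterwards: termios.tcsetattr(fd, termios.TCSADRAIN, saved)
```

With a guard like this, a piped stdout is simply skipped instead of raising termios.error inside tcgetattr().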

As a temporary fix, the user can perform the following procedure:
  1) Call clean_setup - # ansible-playbook dr-cleanup.yml --tags "clean_engine"
  2) Sync the target storage servers using storage replication
  3) Call fail_over

Keep in mind that running VMs will not be preserved in that way.

Both fail_over and clean_setup are documented in sections 3.3 and 3.4:
https://access.redhat.com/documentation/en-us/red_hat_virtualization/4.2/pdf/disaster_recovery_guide/Red_Hat_Virtualization-4.2-Disaster_Recovery_Guide-en-US.pdf

Comment 3 Yaniv Lavi 2018-08-19 10:57:21 UTC
Please open an urgent bug on downstream Ansible.
Please also discuss either automated testing or pinning the Ansible version, so that a critical function like DR does not break this way again.

Comment 4 Maor 2018-08-19 13:15:38 UTC
It seems like ansible 2.6.2-1.el7 was already released.
Kevin, can you please confirm? If so, we can simply verify and close this bug.

Comment 5 Maor 2018-08-19 14:09:40 UTC
Testing from my env seems to work; moving to ON_QA to verify.

Comment 6 Kevin Alon Goldblatt 2018-08-23 09:55:12 UTC
(In reply to Maor from comment #4)
> It seems like ansible 2.6.2-1.el7 was already released.
> Kevin, can you please confirm? if so we can simply verify and close this bug.

Retested and it works now.

Comment 7 Kevin Alon Goldblatt 2018-08-23 10:00:31 UTC
Verified with the following code:
---------------------------------------------
ovirt-engine-4.2.5.3-0.1.el7ev.noarch
vdsm-4.20.35-1.el7ev.x86_64
ansible-2.6.3-1.el7ae.noarch


Verified with the following scenario:
---------------------------------------------
Ran ./ovirt-dr failback
Pressed ENTER at the prompt: "Please press ENTER once the destination storage domains are ready to be used for the destination setup:"
The failback script continued successfully.


Moving to VERIFIED