Bug 1575984

Summary: fail_over script reports failed DR failover operations as successful (false positive)
Product: [oVirt] ovirt-ansible-collection
Reporter: Elad <ebenahar>
Component: disaster-recovery
Assignee: Maor <mlipchuk>
Status: CLOSED CURRENTRELEASE
QA Contact: Kevin Alon Goldblatt <kgoldbla>
Severity: high
Docs Contact:
Priority: unspecified
Version: 1.1.4
CC: mlipchuk, tnisan
Target Milestone: ovirt-4.2.4
Flags: rule-engine: ovirt-4.2+
Target Release: ---
Hardware: x86_64
OS: Unspecified
Whiteboard: DR
Fixed In Version: ovirt-ansible-disaster-recovery-1.1.0
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-26 08:40:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Storage
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1582073
Attachments:
ovirt-dr.log (Flags: none)

Description Elad 2018-05-08 13:31:26 UTC
Created attachment 1433218
ovirt-dr.log

Description of problem:
When the DR failover fails, fail_over.py still reports it as a success.

Version-Release number of selected component (if applicable):
ovirt-ansible-disaster-recovery-0.4-1.el7ev.noarch
ansible-2.5.2-1.el7ae.noarch

How reproducible:
Always

Steps to Reproduce:
1. Execute DR fail_over.py while the engine (source or target) is unreachable


Actual results:
Failover fails; /var/log/ovirt-dr/ovirt-dr.log shows:

Traceback (most recent call last):
  File "/tmp/ansible_TVKfm8/ansible_module_ovirt_auth.py", line 272, in main
    token = connection.authenticate()
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 384, in authenticate
    self.__parse_error(e)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 932, in __parse_error
    six.reraise(clazz, clazz(error_msg), sys.exc_info()[2])
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 381, in authenticate
    self._sso_token = self._get_access_token()
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 617, in _get_access_token
    sso_response = self._get_sso_response(self._sso_url, post_data)
  File "/usr/lib64/python2.7/site-packages/ovirtsdk4/__init__.py", line 694, in _get_sso_response
    curl.perform()
ConnectionError: Error while sending HTTP request: (7, 'Failed to connect to rhv-dr2.scl.lab.tlv.redhat.com port 443: Connection timed out')
fatal: [localhost]: FAILED! => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "ca_file": "/home/ebenahar/rhv-dr2-ca", 
            "compress": true, 
            "headers": null, 
            "insecure": null, 
            "kerberos": false, 
            "ovirt_auth": null, 
            "password": "VALUE_SPECIFIED_IN_NO_LOG_PARAMETER", 
            "state": "present", 
            "timeout": 0, 
            "token": null, 
            "url": "https://rhv-dr2.scl.lab.tlv.redhat.com/ovirt-engine/api", 
            "username": "admin@internal"
        }
    }, 
    "msg": "Error while sending HTTP request: (7, 'Failed to connect to rhv-dr2.scl.lab.tlv.redhat.com port 443: Connection timed out')"
}

fail_over.py output:
====================================

[Failover] Start failover operation...

[Failover] target_host: secondary 
[Failover] source_map: primary 
[Failover] var_file: /var/lib/ovirt-ansible-disaster-recovery/mapping_vars.yml 
[Failover] vault: /usr/share/ansible/roles/oVirt.disaster-recovery/ovirt_passwords.yml 
[Failover] ansible_play: ../examples/dr_play.yml 

Vault password: 
cat: ../files/report.log: No such file or directory

[Failover] Finished failover operation for oVirt ansible disaster recovery

====================================

Expected results:
fail_over.py output should contain an error message reporting that the failover failed.
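A minimal sketch of the expected behavior, assuming the wrapper launches ansible-playbook through Python's subprocess module (the "returned non-zero exit status 2" wording in the later comments matches the message format of subprocess.CalledProcessError). The helper name run_failover is hypothetical; this is not the code from fail_over.py or from the fix commit. The point is that success is decided by the playbook's exit status, not by the wrapper reaching its last line:

```python
import subprocess


def run_failover(cmd):
    """Run the failover command (a list of arguments) and return its exit status.

    Hypothetical helper for illustration only; not taken from fail_over.py.
    """
    try:
        # check_call raises CalledProcessError for any non-zero exit status,
        # so a failed playbook can no longer fall through to "Finished".
        subprocess.check_call(cmd)
    except subprocess.CalledProcessError as ex:
        print("Exception: %s" % ex)
        print("failover operation failed, please check log file "
              "for further details.")
        # Propagate the child's status so the caller can exit non-zero too.
        return ex.returncode
    print("[Failover] Finished failover operation for oVirt ansible "
          "disaster recovery")
    return 0
```

For example, run_failover([sys.executable, "-c", "import sys; sys.exit(2)"]) simulates a playbook that exits with status 2: it prints the failure message and returns 2, which the caller can pass to sys.exit() so that the invoking shell also sees the failure.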

Additional info:
ovirt-dr.log

Comment 2 Maor 2018-05-17 05:05:51 UTC
(In reply to Maor from comment #1)
> Fixed in commit
> https://github.com/oVirt/ovirt-ansible-disaster-recovery/pull/43/commits/712bb4f669eb569b742ac476d450184eac1fac3c

Here is an example of the output with that fix while the engine is unreachable:

[Failover] Please enter the vault password: 1

TASK [Gathering Facts] *******************************************************************************************************



TASK [oVirt.disaster-recovery : Recover target engine] ***********************************************************************



TASK [oVirt.disaster-recovery : Obtain SSO token] ****************************************************************************


 [WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'

Exception: Command '['ansible-playbook', '/usr/share/doc/ovirt-ansible-disaster-recovery/examples/dr_play.yml', '-t', 'fail_over', '-e', '@/var/lib/ovirt-ansible-disaster-recovery/mapping_vars.yml', '-e', '@/usr/share/doc/ovirt-ansible-disaster-recovery/examples/ovirt_passwords.yml', '-e', ' dr_target_host=secondary dr_source_map=primary dr_report_file=report-1526533446084.log', '--vault-password-file', 'vault_secret.sh', '-vvv']' returned non-zero exit status 2

failover operation failed, please check log file for further details.
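For reference, the argument list printed in the exception above can be assembled as shown below. This is only a sketch: build_failover_cmd and its parameter names are hypothetical, and the values simply mirror what the exception prints. Passing a list (rather than one shell string) to subprocess keeps each path and the whole extra-vars string as single arguments:

```python
def build_failover_cmd(play, var_file, vault_file, target_host, source_map,
                       report_file, vault_password_file="vault_secret.sh"):
    """Assemble the ansible-playbook argument list seen in the exception.

    Hypothetical helper; the parameter names are illustrative only.
    """
    extra_vars = "dr_target_host=%s dr_source_map=%s dr_report_file=%s" % (
        target_host, source_map, report_file)
    return [
        "ansible-playbook", play,
        "-t", "fail_over",            # run only tasks tagged fail_over
        "-e", "@" + var_file,         # mapping vars loaded from a YAML file
        "-e", "@" + vault_file,       # vault-encrypted passwords file
        "-e", extra_vars,             # key=value extra vars for this run
        "--vault-password-file", vault_password_file,
        "-vvv",                       # verbose output
    ]
```

The numeric part of report-1526533446084.log appears to be a Unix timestamp in milliseconds (it corresponds to 2018-05-17 05:04 UTC, shortly before comment 2 was posted); if so, a caller could generate it with "report-%d.log" % int(time.time() * 1000), but that is an assumption.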

Comment 3 Kevin Alon Goldblatt 2018-06-07 15:42:14 UTC
Verified with the following code:
-----------------------------------------
ovirt-ansible-disaster-recovery-1.1.0-1.el7ev.noarch

Verified with the following scenario:
-----------------------------------------
Ran sudo ./ovirt-dr failover

Due to an error in the password file, the DR failover failed with the following error:
Exception: Command '['ansible-playbook', '/usr/share/doc/ovirt-ansible-disaster-recovery-1.1.0/examples/dr_play.yml', '-t', 'fail_over', '-e', '@/var/lib/ovirt-ansible-disaster-recovery/mapping_vars.yml', '-e', '@/usr/share/doc/ovirt-ansible-disaster-recovery-1.1.0/examples/ovirt_passwords.yml', '-e', ' dr_target_host=secondary dr_source_map=primary dr_report_file=report-1528385666059.log', '--vault-password-file', 'vault_secret.sh', '-vvv']' returned non-zero exit status 2

failover operation failed, please check log file for further details.

Moving to VERIFIED!

Comment 4 Sandro Bonazzola 2018-06-26 08:40:42 UTC
This bugzilla is included in oVirt 4.2.4 release, published on June 26th 2018.

Since the problem described in this bug report should be
resolved in oVirt 4.2.4 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.