Created attachment 1713141 [details]
disaster_recovery_vars.yml

Description of problem:

When testing the Disaster Recovery setup described in the RHHI-V 1.8 documentation [1], the failover procedure to the secondary environment fails with this error:

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.

[1] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery

Version-Release number of selected component (if applicable):
ovirt-ansible-disaster-recovery-1.3.0-0.1.master.20200219155422.el8ev.noarch
ansible-2.9.11-1.el8ae.noarch
ovirt-engine-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Reported by a customer and reproducible in a lab.

Steps to Reproduce:
1. Set up two independent RHHI-V 1.8 3-node clusters.

   source cluster:
   jorti-rhhi18-01.nested.lab
   jorti-rhhi18-02.nested.lab
   jorti-rhhi18-03.nested.lab

   target cluster:
   jorti-rhhi18-04.nested.lab
   jorti-rhhi18-05.nested.lab
   jorti-rhhi18-06.nested.lab

2. Configure password-less SSH authentication from the primary node in the source cluster to all other hosts.

3. Create an empty Gluster volume 'dest' in the secondary site.

4. Configure geo-replication of the source volume 'data' to the target volume 'dest' in the secondary site:

   # gluster volume set all cluster.enable-shared-storage enable
   # gluster volume set data features.shard enable
   # gluster system:: execute gsec_create
   # gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest create push-pem
   # gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest config use_meta_volume true

5. Synchronize the volume in the source cluster from the RHV GUI:
   Storage -> Volumes -> data -> Geo-replication -> sync

6. Create a geo-replication schedule:
   Storage -> Domains -> data -> Remote data sync setup -> Create daily schedule

7. Create the playbooks as explained in the documentation:
   https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery#config-backup-recovery-mapping-file

8. Set the target volume to read-write:

   # gluster volume set dest features.read-only off

9. Run the failover playbook:

   # ansible-playbook dr-rhv-failover.yml --tags="fail_over" -vvvv

Actual results:
The playbook fails in the task "Add Gluster storage domain" with the error below.
~~~
TASK [oVirt.disaster-recovery : Add Gluster storage domain] ********************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 && echo ansible-tmp-1598873453.7682593-166227-46267692096306="` echo /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 `" ) && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/cloud/ovirt/ovirt_storage_domain.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-166078pyuv0sh6/tmpvj9nunyl TO /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 770, in main
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 623, in create
    **kwargs
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 26097, in add
    return self._internal_add(storage_domain, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.
fatal: [localhost]: FAILED!
=> {
    "changed": false,
    "invocation": {
        "module_args": {
            "backup": false,
            "comment": null,
            "critical_space_action_blocker": 5,
            "data_center": "Default",
            "description": null,
            "destroy": null,
            "discard_after_delete": null,
            "domain_function": "data",
            "fcp": null,
            "fetch_nested": false,
            "format": null,
            "glusterfs": {
                "address": "jorti-rhhi18-04.nested.lab",
                "path": "/dest"
            },
            "host": "jorti-rhhi18-04.nested.lab",
            "id": null,
            "iscsi": null,
            "localfs": null,
            "managed_block_storage": null,
            "name": "dest",
            "nested_attributes": [],
            "nfs": null,
            "poll_interval": 3,
            "posixfs": null,
            "state": "present",
            "timeout": 180,
            "wait": true,
            "warning_low_space": 10,
            "wipe_after_delete": false
        }
    },
    "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]\". HTTP response code is 400."
}
...ignoring
Read vars_file 'disaster_recovery_vars.yml'
Read vars_file 'passwords.yml'
~~~

Expected results:
The geo-replicated storage domain 'dest' is imported in the secondary cluster.

Additional info:
I attach the disaster_recovery_vars.yml used.

This is the task that is failing:

~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml

- name: Add Gluster storage domain
  ovirt_storage_domain:
    name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
    critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
    domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
    warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
    wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
    backup: "{{ gluster_storage['dr_backup'] }}"
    host: "{{ ovirt_hosts[0].name }}"
    data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
    auth: "{{ ovirt_auth }}"
    glusterfs:
      path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
      address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
  register: result
~~~
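For context, the gluster_storage entry that this task reads looks roughly like the sketch below. The key names are taken directly from the task above (read via gluster_storage['dr_' + dr_target_host + '_...']); the values are illustrative placeholders only, not the contents of the attached disaster_recovery_vars.yml, and the exact top-level layout of the mapping file may differ.

~~~
# Illustrative sketch only -- placeholder values, not the attached file.
# dr_target_host is typically 'secondary' during failover, so the task reads
# the dr_secondary_* keys of each entry.
dr_import_storages:
- dr_domain_type: glusterfs
  dr_storage_domain_type: data
  dr_wipe_after_delete: False
  dr_backup: False
  dr_critical_space_action_blocker: 5
  dr_warning_low_space: 10
  dr_primary_name: data
  dr_primary_dc_name: Default
  dr_primary_path: /data
  dr_primary_address: jorti-rhhi18-01.nested.lab
  dr_secondary_name: dest
  dr_secondary_dc_name: Default
  dr_secondary_path: /dest
  dr_secondary_address: jorti-rhhi18-04.nested.lab
~~~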
Ritesh, can you take a look?
~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml

- name: Add Gluster storage domain
  ovirt_storage_domain:
    name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
    critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
    domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
    warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
    wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
    backup: "{{ gluster_storage['dr_backup'] }}"
    host: "{{ ovirt_hosts[0].name }}"
    data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
    auth: "{{ ovirt_auth }}"
    glusterfs:
      path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
      address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
  register: result
~~~

Do we need state: imported here when we try to add a Gluster volume that already contains data?
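If the intent is to attach the existing geo-replicated domain rather than create a new one, one possible direction (a sketch only, not a tested fix) would be the module's imported state:

~~~
# Sketch of a possible change, not a verified fix: ovirt_storage_domain
# supports state: imported for attaching a storage domain that already exists
# on the storage, which may avoid the "storage path is not empty" failure
# when the geo-replicated volume already holds a domain.
- name: Import existing Gluster storage domain
  ovirt_storage_domain:
    state: imported
    name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
    domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
    host: "{{ ovirt_hosts[0].name }}"
    data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
    auth: "{{ ovirt_auth }}"
    glusterfs:
      path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
      address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
  register: result
~~~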
Moved to the wrong status; will move to ON_QA once the RHVH respin for the async release, which includes 3.5.3, is available.
Tested with 4.4.3.12-0.1.el8ev and glusterfs-6.0-49.el8rhgs (with the glusterfs-selinux package). Geo-replication successfully syncs the data from the primary Gluster volume to the secondary Gluster volume using rsync as the sync method. After the sync, the disaster-recovery roles work as expected and the VMs start successfully on the secondary site.