Bug 1874049 - RHHI-V 1.8 DR fails to import geo-replicated Gluster storage domain
Summary: RHHI-V 1.8 DR fails to import geo-replicated Gluster storage domain
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhiv-1.8
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: RHHI-V 1.8.z Async Update
Assignee: Prajith
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1880256 1889673
Blocks:
 
Reported: 2020-08-31 12:05 UTC by Juan Orti
Modified: 2024-03-25 16:23 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-01-11 07:09:18 UTC
Embargoed:


Attachments
disaster_recovery_vars.yml (2.54 KB, text/plain), attached 2020-08-31 12:05 UTC by Juan Orti


Links
Red Hat Knowledge Base (Solution) 5359651, last updated 2020-08-31 12:14:07 UTC

Description Juan Orti 2020-08-31 12:05:20 UTC
Created attachment 1713141 [details]
disaster_recovery_vars.yml

Description of problem:
When testing the Disaster Recovery setup described in the RHHI-V 1.8 documentation [1], the failover procedure to the secondary environment fails with this error:

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.


[1] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery

Version-Release number of selected component (if applicable):
ovirt-ansible-disaster-recovery-1.3.0-0.1.master.20200219155422.el8ev.noarch
ansible-2.9.11-1.el8ae.noarch
ovirt-engine-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Reported by a customer and reproducible in a lab.


Steps to Reproduce:
1. Set up two independent RHHI-V 1.8 3-node clusters:

source cluster:
jorti-rhhi18-01.nested.lab
jorti-rhhi18-02.nested.lab
jorti-rhhi18-03.nested.lab

target cluster:
jorti-rhhi18-04.nested.lab
jorti-rhhi18-05.nested.lab
jorti-rhhi18-06.nested.lab

2. Configure password-less SSH authentication from the primary node in the source cluster to all other hosts.
3. Create an empty Gluster volume 'dest' on the secondary site.
4. Configure geo-replication of the source volume 'data' to the target volume 'dest' on the secondary site:

# gluster volume set all cluster.enable-shared-storage enable
# gluster volume set data features.shard enable
# gluster system:: execute gsec_create
# gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest create push-pem
# gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest config use_meta_volume true

5. Synchronize the volume in the source cluster from the RHV GUI:
Storage -> Volumes -> data -> Geo-replication -> sync

6. Create a geo-replication schedule:
Storage -> Domains -> data -> Remote data sync setup -> Create daily schedule

7. Create the playbooks as explained in the documentation:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery#config-backup-recovery-mapping-file

8. Set the target volume to read-write:

# gluster volume set dest features.read-only off

9. Run the failover playbook (an optional pre-check sketch follows these steps):
# ansible-playbook dr-rhv-failover.yml --tags="fail_over" -vvvv
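
Before running the failover, the geo-replication session state (checked on the source cluster) and the read-only setting of the target volume (checked on the target cluster) can be confirmed. This is not part of the documented procedure, only an optional sanity-check sketch using standard gluster CLI commands against the volumes named above:

# gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest status
# gluster volume get dest features.read-only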

Actual results:
The playbook fails in the task "Add Gluster storage domain" with the error below.

~~~
TASK [oVirt.disaster-recovery : Add Gluster storage domain] ***********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 && echo ansible-tmp-1598873453.7682593-166227-46267692096306="` echo /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 `" ) && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/cloud/ovirt/ovirt_storage_domain.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-166078pyuv0sh6/tmpvj9nunyl TO /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 770, in main
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 623, in create
    **kwargs
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 26097, in add
    return self._internal_add(storage_domain, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.
fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "backup": false,
            "comment": null,
            "critical_space_action_blocker": 5,
            "data_center": "Default",
            "description": null,
            "destroy": null,
            "discard_after_delete": null,
            "domain_function": "data",
            "fcp": null,
            "fetch_nested": false,
            "format": null,
            "glusterfs": {
                "address": "jorti-rhhi18-04.nested.lab",
                "path": "/dest"
            },
            "host": "jorti-rhhi18-04.nested.lab",
            "id": null,
            "iscsi": null,
            "localfs": null,
            "managed_block_storage": null,
            "name": "dest",
            "nested_attributes": [],
            "nfs": null,
            "poll_interval": 3,
            "posixfs": null,
            "state": "present",
            "timeout": 180,
            "wait": true,
            "warning_low_space": 10,
            "wipe_after_delete": false
        }
    },
    "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]\". HTTP response code is 400."
}
...ignoring
Read vars_file 'disaster_recovery_vars.yml'
Read vars_file 'passwords.yml'
~~~


Expected results:
The geo-replicated storage domain 'dest' is imported into the secondary cluster.

Additional info:
I have attached the disaster_recovery_vars.yml that was used.

This is the task that is failing:

~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml


    - name: Add Gluster storage domain
      ovirt_storage_domain:
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
          wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
          backup: "{{ gluster_storage['dr_backup'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~

Comment 5 Sahina Bose 2020-10-08 13:31:34 UTC
Ritesh, can you take a look?

Comment 6 Sahina Bose 2020-10-09 05:38:24 UTC
~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml


    - name: Add Gluster storage domain
      ovirt_storage_domain:
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
          wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
          backup: "{{ gluster_storage['dr_backup'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~

Do we need state: imported here when we try to add a gluster volume that already contains data?
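
For illustration only, a minimal sketch of the task using the state suggested above. It assumes that the ovirt_storage_domain module's state: imported (which imports a storage domain that already exists on the storage, instead of the default present, which creates and formats a new one) is applicable with the role's existing parameters; this is a sketch of the suggestion in this comment, not the actual shipped fix, and the remaining parameters are unchanged from the role:

~~~
    - name: Import existing Gluster storage domain
      ovirt_storage_domain:
          # 'imported' asks the engine to import the storage domain already
          # present on the Gluster volume, rather than creating (formatting)
          # a new one as the default 'present' state does.
          state: imported
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~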

Comment 16 Gobinda Das 2020-11-18 05:16:47 UTC
Moved to the wrong status; will move to ON_QA once the respin of RHVH for the async update, which includes 3.5.3, is available.

Comment 17 SATHEESARAN 2020-12-04 10:56:55 UTC
Tested with 4.4.3.12-0.1.el8ev and glusterfs-6.0-49.el8rhgs, with the glusterfs-selinux package.

Geo-replication successfully syncs the data from the primary gluster volume to the secondary gluster volume
using rsync as the sync method.

After the sync, the disaster-recovery roles also work correctly and the VMs start successfully on the
secondary site.

