Bug 1874049

Summary: RHHI-V 1.8 DR fails to import geo-replicated Gluster storage domain
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Juan Orti <jortialc>
Component: rhhi
Assignee: Prajith <pkesavap>
Status: CLOSED CURRENTRELEASE
QA Contact: SATHEESARAN <sasundar>
Severity: urgent
Priority: urgent
Version: rhhiv-1.8
CC: eshenitz, godas, mkalinin, mnecas, pbar, pprakash, rchikatw, rhs-bugs, sabose, sasundar, sheggodu, tnisan
Target Milestone: ---
Keywords: ZStream
Target Release: RHHI-V 1.8.z Async Update
Hardware: x86_64
OS: Linux
Doc Type: No Doc Update
Last Closed: 2021-01-11 07:09:18 UTC
Type: Bug
oVirt Team: Storage
Bug Depends On: 1880256, 1889673
Attachments: disaster_recovery_vars.yml

Description Juan Orti 2020-08-31 12:05:20 UTC
Created attachment 1713141 [details]
disaster_recovery_vars.yml

Description of problem:
When testing the Disaster Recovery setup described in the RHHI-V 1.8 documentation [1], the failover procedure to the secondary environment fails with this error:

ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.


[1] https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery

Version-Release number of selected component (if applicable):
ovirt-ansible-disaster-recovery-1.3.0-0.1.master.20200219155422.el8ev.noarch
ansible-2.9.11-1.el8ae.noarch
ovirt-engine-4.4.1.10-0.1.el8ev.noarch

How reproducible:
Reported by a customer and reproducible in a lab.


Steps to Reproduce:
1. Set up two independent RHHI-V 1.8 three-node clusters.

source cluster:
jorti-rhhi18-01.nested.lab
jorti-rhhi18-02.nested.lab
jorti-rhhi18-03.nested.lab

target cluster:
jorti-rhhi18-04.nested.lab
jorti-rhhi18-05.nested.lab
jorti-rhhi18-06.nested.lab

2. Configure password-less SSH authentication from the primary node in the source cluster to all other hosts.
3. Create an empty Gluster volume 'dest' on the secondary site.
4. Configure geo-replication of the source volume 'data' to the target volume 'dest' on the secondary site:

# gluster volume set all cluster.enable-shared-storage enable
# gluster volume set data features.shard enable
# gluster system:: execute gsec_create
# gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest create push-pem
# gluster volume geo-replication data jorti-rhhi18-04.nested.lab::dest config use_meta_volume true

5. Synchronize the volume in the source cluster from the RHV GUI:
Storage -> Volumes -> data -> Geo-replication -> sync

6. Create a geo-replication schedule:
Storage -> Domains -> data -> Remote data sync setup -> Create daily schedule

7. Create the playbooks as explained in the documentation:
https://access.redhat.com/documentation/en-us/red_hat_hyperconverged_infrastructure_for_virtualization/1.8/html/maintaining_red_hat_hyperconverged_infrastructure_for_virtualization/config-backup-recovery#config-backup-recovery-mapping-file

8. Set the target volume to read-write:

# gluster volume set dest features.read-only off

9. Run the failover playbook:
# ansible-playbook dr-rhv-failover.yml --tags="fail_over" -vvvv

Actual results:
The playbook fails in the task "Add Gluster storage domain" with the error below.

~~~
TASK [oVirt.disaster-recovery : Add Gluster storage domain] ***********************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************************
task path: /usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml:2
<127.0.0.1> ESTABLISH LOCAL CONNECTION FOR USER: root
<127.0.0.1> EXEC /bin/sh -c 'echo ~root && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 && echo ansible-tmp-1598873453.7682593-166227-46267692096306="` echo /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306 `" ) && sleep 0'
Using module file /usr/lib/python3.6/site-packages/ansible/modules/cloud/ovirt/ovirt_storage_domain.py
<127.0.0.1> PUT /root/.ansible/tmp/ansible-local-166078pyuv0sh6/tmpvj9nunyl TO /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py
<127.0.0.1> EXEC /bin/sh -c 'chmod u+x /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c '/usr/libexec/platform-python /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/AnsiballZ_ovirt_storage_domain.py && sleep 0'
<127.0.0.1> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1598873453.7682593-166227-46267692096306/ > /dev/null 2>&1 && sleep 0'
The full traceback is:
Traceback (most recent call last):
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/modules/cloud/ovirt/ovirt_storage_domain.py", line 770, in main
  File "/tmp/ansible_ovirt_storage_domain_payload_i_ppik_4/ansible_ovirt_storage_domain_payload.zip/ansible/module_utils/ovirt.py", line 623, in create
    **kwargs
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/services.py", line 26097, in add
    return self._internal_add(storage_domain, headers, query, wait)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 232, in _internal_add
    return future.wait() if wait else future
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 55, in wait
    return self._code(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 229, in callback
    self._check_fault(response)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 132, in _check_fault
    self._raise_error(response, body)
  File "/usr/lib64/python3.6/site-packages/ovirtsdk4/service.py", line 118, in _raise_error
    raise error
ovirtsdk4.Error: Fault reason is "Operation Failed". Fault detail is "[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]". HTTP response code is 400.
fatal: [localhost]: FAILED! => {
    "changed": false,
    "invocation": {
        "module_args": {
            "backup": false,
            "comment": null,
            "critical_space_action_blocker": 5,
            "data_center": "Default",
            "description": null,
            "destroy": null,
            "discard_after_delete": null,
            "domain_function": "data",
            "fcp": null,
            "fetch_nested": false,
            "format": null,
            "glusterfs": {
                "address": "jorti-rhhi18-04.nested.lab",
                "path": "/dest"
            },
            "host": "jorti-rhhi18-04.nested.lab",
            "id": null,
            "iscsi": null,
            "localfs": null,
            "managed_block_storage": null,
            "name": "dest",
            "nested_attributes": [],
            "nfs": null,
            "poll_interval": 3,
            "posixfs": null,
            "state": "present",
            "timeout": 180,
            "wait": true,
            "warning_low_space": 10,
            "wipe_after_delete": false
        }
    },
    "msg": "Fault reason is \"Operation Failed\". Fault detail is \"[Error in creating a Storage Domain. The selected storage path is not empty (probably contains another Storage Domain). Either remove the existing Storage Domain from this path, or change the Storage path).]\". HTTP response code is 400."
}
...ignoring
Read vars_file 'disaster_recovery_vars.yml'
Read vars_file 'passwords.yml'
~~~


Expected results:
The geo-replicated storage domain 'dest' is imported into the secondary cluster.

Additional info:
The disaster_recovery_vars.yml file used is attached.

This is the task that is failing:

~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml


    - name: Add Gluster storage domain
      ovirt_storage_domain:
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
          wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
          backup: "{{ gluster_storage['dr_backup'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~
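
For context, a minimal sketch of what the gluster_storage mapping for this domain might look like in disaster_recovery_vars.yml, reconstructed from the keys the task reads and the module_args in the failure log. The enclosing dr_import_storages list, the primary-site values, and dr_target_host resolving to "secondary" during failover are assumptions; the actual file is attached to this bug.

~~~
# Hypothetical gluster_storage entry; only the secondary-site values below are
# confirmed by the module_args in the failure log, the rest is illustrative.
dr_import_storages:
  - dr_domain_type: glusterfs
    dr_storage_domain_type: data            # maps to domain_function
    dr_wipe_after_delete: false
    dr_backup: false
    dr_critical_space_action_blocker: 5
    dr_warning_low_space: 10
    # primary (source) site -- assumed values
    dr_primary_name: data
    dr_primary_dc_name: Default
    dr_primary_address: jorti-rhhi18-01.nested.lab
    dr_primary_path: /data
    # secondary (target) site, read when dr_target_host is "secondary"
    dr_secondary_name: dest
    dr_secondary_dc_name: Default
    dr_secondary_address: jorti-rhhi18-04.nested.lab
    dr_secondary_path: /dest
~~~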

Comment 5 Sahina Bose 2020-10-08 13:31:34 UTC
Ritesh, can you take a look?

Comment 6 Sahina Bose 2020-10-09 05:38:24 UTC
~~~
/usr/share/ansible/roles/oVirt.disaster-recovery/tasks/recover/add_glusterfs_domain.yml


    - name: Add Gluster storage domain
      ovirt_storage_domain:
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          critical_space_action_blocker: "{{ gluster_storage['dr_critical_space_action_blocker'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          warning_low_space: "{{ gluster_storage['dr_warning_low_space'] }}"
          wipe_after_delete: "{{ gluster_storage['dr_wipe_after_delete'] }}"
          backup: "{{ gluster_storage['dr_backup'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~

Do we need state: imported here when we try to add a gluster volume that already contains data?
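
If so, a minimal sketch of such a variant (untested, and assuming the module's imported state handles a pre-existing domain on the gluster path):

~~~
    # Hypothetical replacement for the "Add Gluster storage domain" task above,
    # using state: imported so the engine imports the storage domain already
    # present on the geo-replicated volume instead of trying to create a new
    # one on a non-empty path.
    - name: Import Gluster storage domain
      ovirt_storage_domain:
          state: imported
          name: "{{ gluster_storage['dr_' + dr_target_host + '_name'] }}"
          domain_function: "{{ gluster_storage['dr_storage_domain_type'] }}"
          host: "{{ ovirt_hosts[0].name }}"
          data_center: "{{ gluster_storage['dr_' + dr_target_host + '_dc_name'] }}"
          auth: "{{ ovirt_auth }}"
          glusterfs:
              path: "{{ gluster_storage['dr_' + dr_target_host + '_path'] }}"
              address: "{{ gluster_storage['dr_' + dr_target_host + '_address'] }}"
      register: result
~~~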

Comment 16 Gobinda Das 2020-11-18 05:16:47 UTC
Moved to the wrong status; will move to ON_QA once the respin of RHVH for the async update, with 3.5.3 included, is available.

Comment 17 SATHEESARAN 2020-12-04 10:56:55 UTC
Tested with 4.4.3.12-0.1.el8ev and glusterfs-6.0-49.el8rhgs, with the glusterfs-selinux package.

Geo-replication successfully syncs the data from the primary gluster volume to the secondary gluster volume using rsync as the sync method.

Also, after the sync, the disaster recovery role works well and the VMs could successfully start on the secondary site.