Bug 1575070

Summary: Unable to configure "Failback" settings in geo-replication setup between CNS master containers/pods and non-CNS slave gluster
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Neha Berry <nberry>
Component: rhgs-server-container
Assignee: Raghavendra Talur <rtalur>
Status: CLOSED WONTFIX
QA Contact: Neha Berry <nberry>
Severity: medium
Docs Contact:
Priority: unspecified
Version: cns-3.9
CC: hchiramm, madam, rhs-bugs, rtalur, sarumuga
Target Milestone: ---
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-11-11 20:16:33 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Neha Berry 2018-05-04 17:05:38 UTC
Description of problem:
---------------------------

As a user, I was able to set up Container-Native Storage volumes for geo-replication to a non-Container-Native Storage remote site. Here, the Container-Native Storage volume acted as the master volume, and we were able to replicate the master data to the slave volume.

Link Used: https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.2/html-single/administration_guide/#idm140421628903088

But I was unable to set up reverse geo-rep from a gluster node to a CNS gluster pod using the steps mentioned in the above documentation. Thus, I was not able to sync data from the original slave back to the master after recovering from a disaster.

Hence, I also wanted to confirm whether we support "Failback: Resuming Master and Slave back to their Original State" as a use case for geo-rep disaster recovery in a CNS environment.


Reasons for failure 
++++++++++++++++++++

To promote the original slave as master and the CNS gluster pod volumes as slave, we need passwordless SSH login from the slave node to the CNS gluster pod.

However, if I run ssh-copy-id against the actual master IP, passwordless SSH is established between the slave node and the master node, not the master pod.

When the -p 2222 option (the pod's sshd port) is added to the same command, it prompts for the pod/container password, which we do not know.

Please let me know if there's a workaround and whether this use case is actually supported or not.
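One possible direction for a workaround is sketched below. It is not verified, and it assumes that the sshd listening on port 2222 belongs to the gluster pod, that it honours /root/.ssh/authorized_keys inside the pod, and that oc access to the CNS project is available; the pod name is a placeholder.

On the slave node (dhcp47-83), generate a key pair if one does not already exist:
# ssh-keygen

Copy the slave node's public key to a machine with oc access to the CNS project:
# scp /root/.ssh/id_rsa.pub <oc-client>:/tmp/slave_id_rsa.pub

On the oc client, append the key to the gluster pod's authorized_keys:
# oc exec -i <gluster-pod-name> -- bash -c 'mkdir -p /root/.ssh && cat >> /root/.ssh/authorized_keys' < /tmp/slave_id_rsa.pub

Back on the slave node, check that key-based login to the pod's sshd on port 2222 now works:
# ssh -p 2222 root@10.70.46.231

Even if that login works, geo-rep itself would still have to be pointed at port 2222 rather than the default SSH port, which is part of what this bug is asking about.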


Version-Release number of selected component (if applicable):
------------------------

How reproducible:
------------------------
Always reproducible


Steps to Reproduce:
------------------------
1. To create the geo-rep session between the CNS containers (master) and the non-CNS slave, I followed the link https://access.redhat.com/solutions/2616801; a rough sketch of this forward setup is shown after the status output in step 2.
 
2. Data is replicated from the master volume to the slave volume, and the geo-rep status shows "1 Active and 2 Passive" as expected.
__________________________________________________________________________________________________

sh-4.2# gluster volume geo-replication  vol_a54efc0b70bff71d0e1974fc1dfcfa02 10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 status
 
MASTER NODE     MASTER VOL                              MASTER BRICK                                                                                               SLAVE USER    SLAVE                                                SLAVE NODE      STATUS     CRAWL STATUS       LAST_SYNCED                  
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.70.46.231    vol_a54efc0b70bff71d0e1974fc1dfcfa02    /var/lib/heketi/mounts/vg_e6a24cafe798991c586f777ef27b04dd/brick_fe0ce9d5e112d12505c82e50dc24d1b4/brick    root          10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03    10.70.46.187    Active     Changelog Crawl    2018-05-04 04:56:35          
10.70.46.191    vol_a54efc0b70bff71d0e1974fc1dfcfa02    /var/lib/heketi/mounts/vg_83085e7f37c50c7c5565a72927f14b03/brick_6f6daecc7f3892f09c37d09b3f18d8ca/brick    root          10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03    10.70.47.83     Passive    N/A                N/A                          
10.70.46.113    vol_a54efc0b70bff71d0e1974fc1dfcfa02    /var/lib/heketi/mounts/vg_9c7134bc1f334181fd725f025e33599a/brick_627ca78be9c96c22eabf3efd8ba86a9d/brick    root          10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03    10.70.46.165    Passive    N/A                N/A                          
sh-4.2# gluster volume geo-replication  vol_a54efc0b70bff71d0e1974fc1dfcfa02 10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 status
 

___________________________________________________________________________________________________
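For reference, the forward session in step 1 is managed from inside one of the gluster pods (which carry the gluster RPMs). The sketch below is a paraphrase of the referenced article, not a verbatim copy of it, and the pod name is a placeholder.

From a machine with oc access, open a shell in a gluster pod:
# oc rsh <gluster-pod-name>

Inside the pod, set up passwordless SSH to the slave node and create the session:
sh-4.2# ssh-keygen
sh-4.2# ssh-copy-id root@10.70.47.83
sh-4.2# gluster system:: execute gsec_create
sh-4.2# gluster volume geo-replication vol_a54efc0b70bff71d0e1974fc1dfcfa02 10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 create push-pem
sh-4.2# gluster volume geo-replication vol_a54efc0b70bff71d0e1974fc1dfcfa02 10.70.47.83::vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 start

The key point is that both the gluster binaries and the SSH key live inside the pod, which is exactly what is missing in the reverse (failback) direction.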

3. On inducing a disaster at the CNS side, performed "Failover: Promoting a Slave to Master" using the following commands (VOLNAME being the slave volume that is promoted, vol_ab0cbc0e4d45f0587b0418d6b7ee7f03):
# gluster volume set VOLNAME geo-replication.indexing on
# gluster volume set VOLNAME changelog on

4. For "Failback: Resuming Master and Slave back to their Original State", tried executing the following commands (as per the documentation; the documented sequence is sketched after Attempt #2 for reference), but the geo-rep session could not be established:

Attempt #1: on slave side
++++++++++++++++++++++++++

[root@dhcp47-83 ~]# ssh-copy-id root@10.70.46.231
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed

/usr/bin/ssh-copy-id: WARNING: All keys were skipped because they already exist on the remote system.
		(if you think this is a mistake, you may want to use -f option)

# ssh-keygen
# ssh-copy-id root@10.70.46.231
# gluster system:: execute gsec_create
# gluster volume geo-replication  vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 10.70.46.231::vol_a54efc0b70bff71d0e1974fc1dfcfa02 create push-pem
Gluster version mismatch between master and slave.
geo-replication command failed

Reason: This resulted in passwordless SSH between the two NODES, not between the slave node and the gluster pod. The CNS node does not have the required gluster RPMs; only the pods have them.
Even using the "force" option didn't help, and the session created thereafter was Faulty.

Attempt #2: on slave side, using the pod's SSH port (-p 2222)
++++++++++++++++++++++++++++++++++++++++++++++++++
[root@dhcp47-83 ~]# ssh-copy-id root@10.70.46.231 -p 2222
/usr/bin/ssh-copy-id: INFO: Source of key(s) to be installed: "/root/.ssh/id_rsa.pub"
/usr/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/usr/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
root@10.70.46.231's password: 
Permission denied, please try again.
root@10.70.46.231's password: 
Authentication failed.


Output: This asked for the pod's password, which is unknown to us and is also not mentioned in the gluster geo-rep docs.
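For context, the documented (non-CNS) failback flow that these attempts were trying to reproduce is roughly the following, run from the original slave side once passwordless SSH to the original master is in place. This is paraphrased from the linked admin guide, so the exact steps and option names should be taken from the guide rather than from here:

# gluster volume geo-replication vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 10.70.46.231::vol_a54efc0b70bff71d0e1974fc1dfcfa02 create push-pem force
# gluster volume geo-replication vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 10.70.46.231::vol_a54efc0b70bff71d0e1974fc1dfcfa02 start
# gluster volume geo-replication vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 10.70.46.231::vol_a54efc0b70bff71d0e1974fc1dfcfa02 config special-sync-mode recover
# gluster volume geo-replication vol_ab0cbc0e4d45f0587b0418d6b7ee7f03 10.70.46.231::vol_a54efc0b70bff71d0e1974fc1dfcfa02 config checkpoint now

In CNS this already breaks at the "create push-pem" step, because the target of the session (10.70.46.231) is the node, while the gluster binaries, the sshd on port 2222, and the authorized_keys that push-pem relies on are all inside the pod.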


Actual results:
---------------

Unable to establish a reverse geo-rep session from the standalone gluster node (original slave) to the CNS gluster pod.

Expected results:
---------------------
If failback is supported, it should be possible to establish the reverse geo-rep session; geo-rep documentation specific to CNS would make this easier to configure.

Additional info:
--------------------

The gluster versions on the slave nodes and in the gluster pods are the same.