Bug 1194574

Summary: [geo-rep]: In a mountbroker setup, set_geo_rep_pem_keys.sh fails to copy keys to its slave peers.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Sweta Anandpara <sanandpa>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: storage-qa-internal <storage-qa-internal>
Severity: unspecified
Priority: high
Version: rhgs-3.0
CC: aavati, annair, avishwan, csaba, nlevinki, nsathyan, rcyriac, vagarwal
Keywords: TestBlocker, ZStream
Target Release: RHGS 3.0.4
Hardware: Unspecified
OS: Unspecified
Fixed In Version: glusterfs-3.6.0.47-1
Doc Type: Bug Fix
Clones: 1194596 (view as bug list)
Last Closed: 2015-03-26 06:36:23 UTC
Type: Bug
Bug Blocks: 1182947, 1194596
Attachments: Detailed logs (flags: none)

Description Sweta Anandpara 2015-02-20 09:13:45 UTC
Description of problem:

While setting up geo-replication through a mountbroker, the shell script set_geo_rep_pem_keys.sh is run on one of the slave nodes to copy the pem keys to the other slave nodes in the cluster. The script errors out because it cannot resolve the global variable $GLUSTERD_WORKING_DIR, so the key is copied to the wrong destination.
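
For illustration, a minimal sketch of the failing copy; the variable and destination names come from the bash -x trace in the additional info below, while the surrounding lines are assumed for context:

    # The script derives the key location from the user's home directory and
    # then copies it into <glusterd working dir>/geo-replication/.
    home_dir=$(getent passwd geoacc | cut -d: -f6)
    cp "$home_dir/master_slave_common_secret.pem.pub" "$GLUSTERD_WORKING_DIR/geo-replication/"
    # With $GLUSTERD_WORKING_DIR empty, the destination collapses to
    # "/geo-replication/" instead of e.g. /var/lib/glusterd/geo-replication/,
    # which is what produces the cp error shown in the actual results.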

Version-Release number of selected component (if applicable): 
3.6.0.44-1

How reproducible:
Always

Steps to Reproduce:
1. On each of the slave nodes, create a group (groupadd <groupname>) and a user belonging to that group (useradd <username> -g <groupname>).
e.g., groupadd geogroup
      useradd geoacc -g geogroup
2. Create a directory on all the slave nodes, owned by root, with permissions 0711.
e.g., mkdir /var/mountbroker-root
      chmod 711 /var/mountbroker-root
3. Add the following options to the glusterd volfile at /etc/glusterfs/glusterd.vol (a filled-in example using the names from this reproduction is sketched after the steps):
    option mountbroker-root /var/mountbroker-root
    option mountbroker-geo-replication.<username> <slavevolume>
    option geo-replication-log-group geogroup
    option rpc-auth-allow-insecure on
4. Restart glusterd on all the slave nodes
e.g., service glusterd restart
5. Set up password-less SSH from the master node to the non-root user on one of the slave nodes.
e.g., ssh-keygen
      ssh-copy-id geoacc@<slavenode>
6. Create a geo-rep session from the master volume to the non-root user account on the slave
e.g., gluster volume geo-rep <mastervolume> geoacc@<slavenode>::<slavevolume> create push-pem
7. On the slave node used to create the relationship, run /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh as root, with the username, master volume and slave volume as arguments.
e.g., /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
8. Start the geo-rep session
e.g.,   gluster volume geo-rep <mastervolume> geoacc@<slavenode>::<slavevolume> start
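
For reference, this is roughly what the management section of glusterd.vol from step 3 would look like with the example names used above (user geoacc, group geogroup, slave volume "slave"); the other stock options of the default volfile are omitted, and the working directory is shown as the usual /var/lib/glusterd:

    volume management
        type mgmt/glusterd
        option working-directory /var/lib/glusterd
        option mountbroker-root /var/mountbroker-root
        option mountbroker-geo-replication.geoacc slave
        option geo-replication-log-group geogroup
        option rpc-auth-allow-insecure on
    end-volume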


Actual results:

At step 7, the script fails with the error below:
[root@dhcp42-130 ~]#  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
cp: cannot create regular file `/geo-replication/': Is a directory
Successfully copied file.
Command executed successfully.
[root@dhcp42-130 ~]#

This seems to be a regression; as per Shilpa, it used to work on an earlier build.

Expected results:
The command should pass (that is, correctly resolve the glusterd working directory and copy the key into it), and the subsequent step 8 should then succeed.

Additional info:

[root@dhcp42-130 glusterfs]# rpm -qa | grep gluster
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.44-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.44-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-api-3.6.0.44-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.44-1.el6rhs.x86_64
glusterfs-server-3.6.0.44-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.44-1.el6rhs.x86_64
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-libs-3.6.0.44-1.el6rhs.x86_64
glusterfs-cli-3.6.0.44-1.el6rhs.x86_64
[root@dhcp42-130 glusterfs]# 
[root@dhcp42-130 ~]#  /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
cp: cannot create regular file `/geo-replication/': Is a directory
Successfully copied file.
Command executed successfully.
[root@dhcp42-130 ~]# bash -x /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
+ main geoacc master slave
+ user=geoacc
+ master_vol=master
+ slave_vol=slave
+ '[' geoacc == '' ']'
+ '[' master == '' ']'
+ '[' slave == '' ']'
+ COMMON_SECRET_PEM_PUB=master_slave_common_secret.pem.pub
+ '[' geoacc == root ']'
++ getent passwd geoacc
++ cut -d : -f 6
+ home_dir=/home/geoacc
+ '[' /home/geoacc == '' ']'
+ '[' -f /home/geoacc/master_slave_common_secret.pem.pub ']'
+ cp /home/geoacc/master_slave_common_secret.pem.pub /geo-replication/
cp: cannot create regular file `/geo-replication/': Is a directory
+ gluster system:: copy file /geo-replication/master_slave_common_secret.pem.pub
Successfully copied file.
+ gluster system:: execute add_secret_pub geoacc master slave
Command executed successfully.
+ exit 0
[root@dhcp42-130 ~]#
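
For comparison, a minimal sketch of how the destination could be resolved without relying on an exported $GLUSTERD_WORKING_DIR; this is only an illustration of the idea, not the actual patch referenced in the comments below:

    # Ask glusterd for its working directory (normally /var/lib/glusterd)
    # instead of expecting the environment to provide it.
    GLUSTERD_WORKDIR=$(gluster system:: getwd)
    cp "$home_dir/$COMMON_SECRET_PEM_PUB" "$GLUSTERD_WORKDIR/geo-replication/"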

//MASTER NODE //
[root@dhcp43-154 ~]# gluster volume geo-rep master geoacc@dhcp42-130::slave start
Starting geo-replication session between master & geoacc@dhcp42-130::slave has been successful
[root@dhcp43-154 ~]# 
[root@dhcp43-154 ~]# gluster volume geo-rep master geoacc@dhcp42-130::slave status
 
MASTER NODE                          MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                       STATUS    CHECKPOINT STATUS    CRAWL STATUS       
----------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A                
[root@dhcp43-154 ~]#

Comment 1 Aravinda VK 2015-02-20 10:43:21 UTC
Upstream patch sent
http://review.gluster.org/#/c/9720

Comment 2 Aravinda VK 2015-02-25 05:54:48 UTC
Upstream needs an additional step to set up non-root geo-replication, so I will abandon the upstream patch since it is not relevant there.

This bug requires only a downstream patch:
https://code.engineering.redhat.com/gerrit/#/c/42612/

Comment 3 Sweta Anandpara 2015-02-27 11:42:54 UTC
Tested and verified the bug on build 3.6.0.47-1.

Followed all the steps for creating a geo-rep relationship between the master and a non-root user on the slave. The relationship was created successfully, though the status still goes to faulty. Debugging that further, but moving the present bug to fixed in 3.0.4.

Detailed logs are attached.

Comment 4 Sweta Anandpara 2015-02-27 11:43:29 UTC
Created attachment 995997 [details]
Detailed logs

Comment 6 errata-xmlrpc 2015-03-26 06:36:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html