Description of problem:
While setting up a geo-rep mountbroker configuration, a shell script (set_geo_rep_pem_keys.sh) is run on one of the slave nodes to copy the keys to the other slave nodes in the cluster. The script errors out because it fails to resolve the global variable $GLUSTERD_WORKING_DIR.

Version-Release number of selected component (if applicable):
3.6.0.44-1

How reproducible:
Always

Steps to Reproduce:
1. Create a mountbroker setup with the following steps. Create a group (groupadd <groupname>) and a user (useradd <username> -g <groupname>) on each of the slave nodes.
e.g.,
groupadd geogroup
useradd geoacc -g geogroup

2. Create a directory on all the slave nodes, owned by root, with permissions 0711.
e.g.,
mkdir /var/mountbroker-root
chmod 711 /var/mountbroker-root

3. Add the following options to the glusterd volfile at /etc/glusterfs/glusterd.vol:
option mountbroker-root /var/mountbroker-root
option mountbroker-geo-replication.geoaccount <slavevolume>
option geo-replication-log-group geogroup
option rpc-auth-allow-insecure on

4. Restart glusterd on all the slave nodes.
e.g.,
service glusterd restart

5. Set up password-less ssh from the master node to the non-root user on one of the slave nodes.
e.g.,
ssh-keygen
ssh-copy-id geoacc@<slavenode>

6. Create a geo-rep session from the master to the non-root user account on the slave.
e.g.,
gluster volume geo-rep <mastervolume> geoacc@<slavenode>::<slavevolume> create push-pem

7. On the slave node used to create the relationship, run /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh as root, with the username, master volume, and slave volume as arguments.
e.g.,
/usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave

8. Start the geo-rep session.
e.g.,
gluster volume geo-rep <mastervolume> geoacc@<slavenode>::<slavevolume> start

Actual results:
At step 7, the script fails with the error below:

[root@dhcp42-130 ~]# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
cp: cannot create regular file `/geo-replication/': Is a directory
Successfully copied file.
Command executed successfully.
[root@dhcp42-130 ~]#

This appears to be a regression; it used to work on an earlier build, as per Shilpa.

Expected results:
The command should have passed (had it correctly resolved the glusterd working directory), and the subsequent step 8 would have succeeded.

Additional info:
[root@dhcp42-130 glusterfs]# rpm -qa | grep gluster
vdsm-gluster-4.14.7.3-1.el6rhs.noarch
glusterfs-3.6.0.44-1.el6rhs.x86_64
glusterfs-geo-replication-3.6.0.44-1.el6rhs.x86_64
gluster-nagios-addons-0.1.14-1.el6rhs.x86_64
samba-glusterfs-3.6.509-169.4.el6rhs.x86_64
glusterfs-api-3.6.0.44-1.el6rhs.x86_64
glusterfs-fuse-3.6.0.44-1.el6rhs.x86_64
glusterfs-server-3.6.0.44-1.el6rhs.x86_64
glusterfs-rdma-3.6.0.44-1.el6rhs.x86_64
gluster-nagios-common-0.1.4-1.el6rhs.noarch
glusterfs-libs-3.6.0.44-1.el6rhs.x86_64
glusterfs-cli-3.6.0.44-1.el6rhs.x86_64
[root@dhcp42-130 glusterfs]#

[root@dhcp42-130 ~]# /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
cp: cannot create regular file `/geo-replication/': Is a directory
Successfully copied file.
Command executed successfully.
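For context, the failing step boils down to copying the non-root user's common secret pub file into glusterd's working directory before it is distributed cluster-wide. The sketch below is a minimal illustration of that copy with the working directory resolved explicitly; the fallback via gluster system:: getwd and the variable names are illustrative assumptions, not the shipped script or the actual fix.

# Illustrative sketch only -- not the shipped set_geo_rep_pem_keys.sh.
user=geoacc
master_vol=master
slave_vol=slave
COMMON_SECRET_PEM_PUB=${master_vol}_${slave_vol}_common_secret.pem.pub
home_dir=$(getent passwd "$user" | cut -d: -f6)

# Resolve glusterd's working directory (typically /var/lib/glusterd) instead
# of relying on an unset $GLUSTERD_WORKING_DIR, which expands to "" here.
workdir=$(gluster system:: getwd)

cp "$home_dir/$COMMON_SECRET_PEM_PUB" "$workdir/geo-replication/"
gluster system:: copy file "/geo-replication/$COMMON_SECRET_PEM_PUB"
gluster system:: execute add_secret_pub "$user" "$master_vol" "$slave_vol"

With the working-directory variable empty, the cp target collapses to a bare /geo-replication/, which is exactly the "Is a directory" failure above; the bash -x trace below shows the same expansion.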
[root@dhcp42-130 ~]# bash -x /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh geoacc master slave
+ main geoacc master slave
+ user=geoacc
+ master_vol=master
+ slave_vol=slave
+ '[' geoacc == '' ']'
+ '[' master == '' ']'
+ '[' slave == '' ']'
+ COMMON_SECRET_PEM_PUB=master_slave_common_secret.pem.pub
+ '[' geoacc == root ']'
++ getent passwd geoacc
++ cut -d : -f 6
+ home_dir=/home/geoacc
+ '[' /home/geoacc == '' ']'
+ '[' -f /home/geoacc/master_slave_common_secret.pem.pub ']'
+ cp /home/geoacc/master_slave_common_secret.pem.pub /geo-replication/
cp: cannot create regular file `/geo-replication/': Is a directory
+ gluster system:: copy file /geo-replication/master_slave_common_secret.pem.pub
Successfully copied file.
+ gluster system:: execute add_secret_pub geoacc master slave
Command executed successfully.
+ exit 0
[root@dhcp42-130 ~]#

// MASTER NODE //
[root@dhcp43-154 ~]# gluster volume geo-rep master geoacc@dhcp42-130::slave start
Starting geo-replication session between master & geoacc@dhcp42-130::slave has been successful
[root@dhcp43-154 ~]#
[root@dhcp43-154 ~]# gluster volume geo-rep master geoacc@dhcp42-130::slave status

MASTER NODE                          MASTER VOL    MASTER BRICK      SLAVE USER    SLAVE                       STATUS    CHECKPOINT STATUS    CRAWL STATUS
----------------------------------------------------------------------------------------------------------------------------------------------------------
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp43-154.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp42-74.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp43-72.lab.eng.blr.redhat.com     master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick1/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
dhcp42-182.lab.eng.blr.redhat.com    master        /rhs/brick2/d1    geoacc        geoacc@dhcp42-130::slave    faulty    N/A                  N/A
[root@dhcp43-154 ~]#
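As a possible interim workaround on the affected build (assuming glusterd's default working directory /var/lib/glusterd; adjust the path if glusterd is configured differently), the pub file can be placed there by hand on the slave node used for the relationship and the distribution steps rerun:

# Assumes the default working directory /var/lib/glusterd.
cp /home/geoacc/master_slave_common_secret.pem.pub /var/lib/glusterd/geo-replication/
gluster system:: copy file /geo-replication/master_slave_common_secret.pem.pub
gluster system:: execute add_secret_pub geoacc master slave

After that, the session can be stopped and started again from the master node (gluster volume geo-rep master geoacc@dhcp42-130::slave stop, then start) and the status re-checked.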
Upstream patch sent for review: http://review.gluster.org/#/c/9720
Upstream needs an additional step to set up non-root geo-replication, so I will abandon the upstream patch since it is not relevant there. This bug requires only the downstream patch: https://code.engineering.redhat.com/gerrit/#/c/42612/
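To check whether a given installed build still carries the unresolved working-directory reference (the variable name follows the description above and is an assumption about the script internals), a quick grep of the installed script is enough:

# Look for how the working directory is referenced and assigned in the script.
grep -n 'GLUSTERD_WORKING_DIR' /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh
grep -n 'geo-replication/' /usr/libexec/glusterfs/set_geo_rep_pem_keys.sh

If the variable is referenced in the cp destination but never assigned (and not provided by the environment), the copy collapses to a bare /geo-replication/ at run time, reproducing the trace shown above.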
Tested and verified the bug on build 3.6.0.47-1. Followed all the steps required to create a geo-rep relationship between the master and a non-root user on the slave. Was able to successfully create the relationship, though the status does go to faulty. Debugging that further, but moving the present bug to fixed in 3.0.4. Detailed logs are attached.
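For the faulty status that is still being debugged, the usual places to look (assuming default log locations; paths may differ if logging was reconfigured) are the geo-replication worker logs on the master nodes and the slave-side geo-replication logs on the slave, e.g.:

# On each master node (the master volume is named "master" here):
less /var/log/glusterfs/geo-replication/master/*.log

# On the slave node:
less /var/log/glusterfs/geo-replication-slaves/*.log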
Created attachment 995997 [details] Detailed logs
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html