Description of problem:
When a non-root account on the slave system is used over SSH to set up geo-replication, initialization on the slave fails because it cannot open the logfile /var/log/glusterfs/cli.log. As a result, geo-replication never becomes active and stays in the "Created" status.

Version-Release number of selected component (if applicable):
GlusterFS 5.2, manually installed from the GlusterFS source code.

How reproducible:
100%

In our tests:
Master system IP address: 10.10.60.182
Slave system DNS name: vn-sc-2.westus2.cloudapp.azure.com
Master volume: master-volume-0007
Slave volume: slave-volume-0001

Steps to Reproduce:
1. Set up an unprivileged account on the slave system (account "nasgorep" in our case) and test SSH from the master to the slave:

12592 01/04/2019 14:27:35.101560269 1546640855 command: /usr/bin/ssh nasgorep.cloudapp.azure.com /bin/pwd
12592 01/04/2019 14:27:35.839285578 1546640855 status=0 /usr/bin/ssh nasgorep.cloudapp.azure.com /bin/pwd
12640 01/04/2019 14:27:37.847050433 1546640857 command: /usr/bin/ssh -q -oConnectTimeout=5 nasgorep.cloudapp.azure.com /bin/pwd 2>&1
12640 01/04/2019 14:27:38.596877738 1546640858 status=0 /usr/bin/ssh -q -oConnectTimeout=5 nasgorep.cloudapp.azure.com /bin/pwd 2>&1

2. Set up geo-replication:

[2019-01-04 22:27:52.449174] : system:: execute gsec_create : SUCCESS
[2019-01-04 22:28:32.288278] : volume geo-replication master-volume-0007 nasgorep.cloudapp.azure.com::slave-volume-0001 create push-pem : SUCCESS
[2019-01-04 22:28:39.447194] : volume geo-replication master-volume-0007 nasgorep.cloudapp.azure.com::slave-volume-0001 start : SUCCESS

Actual results:
Checking the status of the geo-replication shows it is in the "Created" state:

[root@SC-10-10-63-182 log]# /usr/sbin/gluster volume geo-replication master-volume-0007 nasgorep.cloudapp.azure.com::slave-volume-0001 status detail

MASTER NODE     MASTER VOL            MASTER BRICK                                        SLAVE USER    SLAVE                                             SLAVE NODE    STATUS     CRAWL STATUS    LAST_SYNCED    ENTRY    DATA    META    FAILURES    CHECKPOINT TIME    CHECKPOINT COMPLETED    CHECKPOINT COMPLETION TIME
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
10.10.60.182    master-volume-0007    /exports/master-segment-0015/master-volume-0007    nasgorep      nasgorep.cloudapp.azure.com::slave-volume-0001    N/A           Created    N/A             N/A            N/A      N/A     N/A     N/A         N/A                N/A                     N/A

Expected results:
The geo-replication state should be "Active".

Additional info:
On the master system, the geo-replication logfile "gsyncd.log" contains the following error messages:

[root@SC-10-10-63-182 log]# vi /var/log/glusterfs/geo-replication/master-volume-0005_10.10.60.183_slave-volume-0001/gsyncd.log
……….
[2019-01-04 22:28:40.276280] E [syncdutils(monitor):809:errlog] Popen: command returned error   cmd=ssh -oPasswordAuthentication=no -oStrictHostKeyChecking=no -i /var/lib/glusterd/geo-replication/secret.pem -p 22 nasgorep.cloudapp.azure.com /usr/sbin/gluster --xml --remote-host=localhost volume info slave-volume-0001   error=255
[2019-01-04 22:28:40.277328] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to create logfile "/var/log/glusterfs/cli.log" (Permission denied)
[2019-01-04 22:28:40.277438] E [syncdutils(monitor):813:logerr] Popen: ssh> ERROR: failed to open logfile /var/log/glusterfs/cli.log
…….

On the slave, the file "cli.log" has the following attributes:

[root@VN-SC-2 log]# ls -l /var/log/glusterfs/cli.log
-rw------- 1 root root 28552 Jan 4 14:35 /var/log/glusterfs/cli.log

It appears the slave system does not let user "nasgorep" open or create the logfile /var/log/glusterfs/cli.log when the "gluster" command is invoked from the master system through SSH.

On the slave, the command history log shows:

[root@VN-SC-2 log]# cat /var/log/glusterfs/cmd_history.log
[2019-01-04 22:22:49.893261] : volume status : SUCCESS
[2019-01-04 22:22:59.929183] : volume geo-replication status : SUCCESS
[2019-01-04 22:24:25.105256] : volume create slave-volume-0001 transport tcp VN-SC-2.cxn2n2xii2hepedpq4siv2akfe.xx.internal.cloudapp.net:/exports/nas-segment-0015/slave-volume-0001 : SUCCESS
[2019-01-04 22:24:25.894525] : volume set slave-volume-0001 nfs.addr-namelookup off : SUCCESS
[2019-01-04 22:24:26.418826] : volume reset slave-volume-0001 network.ping-timeout : SUCCESS
[2019-01-04 22:24:27.086126] : volume set slave-volume-0001 nfs.disable on : SUCCESS
[2019-01-04 22:24:28.082211] : volume set slave-volume-0001 performance.stat-prefetch off : SUCCESS
[2019-01-04 22:24:28.873671] : volume set slave-volume-0001 performance.quick-read off : SUCCESS
[2019-01-04 22:24:29.716433] : volume set slave-volume-0001 allow-insecure on : SUCCESS
[2019-01-04 22:24:30.272508] : volume reset slave-volume-0001 nfs.rpc-auth-allow : SUCCESS
[2019-01-04 22:24:31.303757] : volume start slave-volume-0001 : SUCCESS
[2019-01-04 22:24:51.709793] : volume geo-replication status : SUCCESS
[2019-01-04 22:24:52.575909] : volume status slave-volume-0001 : SUCCESS
[2019-01-04 22:27:55.543280] : system:: uuid get : SUCCESS
[2019-01-04 22:27:55.574006] : system:: execute mountbroker.py node-add slave-volume-0001 nasgorep : SUCCESS
[2019-01-04 22:28:38.062065] : system:: copy file /geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:28:38.296566] : system:: execute add_secret_pub nasgorep geo-replication/master-volume-0007_slave-volume-0001_common_secret.pem.pub : SUCCESS
[2019-01-04 22:31:31.417565] : volume geo-replication slave-volume-0001 status : SUCCESS
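The failure can presumably be reproduced directly on the slave by running, as the unprivileged user, the same gluster command that the master issues over SSH (taken from the gsyncd.log entry above). A minimal check, assuming the user is "nasgorep" and the slave volume is "slave-volume-0001" as in this report; the chgrp/chmod lines are only a possible interim workaround (an assumption, not a verified fix: they presume a group named "nasgorep" exists and may be undone when root recreates cli.log):

[root@VN-SC-2 ~]# su - nasgorep -c '/usr/sbin/gluster --xml --remote-host=localhost volume info slave-volume-0001'
[root@VN-SC-2 ~]# chgrp nasgorep /var/log/glusterfs/cli.log   # assumption: a group "nasgorep" exists on the slave
[root@VN-SC-2 ~]# chmod 660 /var/log/glusterfs/cli.log        # make cli.log group-writable for the geo-rep user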
Hi, can you confirm whether the master and slave are on the same host? - Sunny
https://review.gluster.org/22865 should fix the issue.
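Once the patch (or a permissions workaround on the slave) is in place, re-running the create/start steps above and checking the session should show "Active" rather than "Created"; for example, with the same command used earlier in this report:

[root@SC-10-10-63-182 ~]# /usr/sbin/gluster volume geo-replication master-volume-0007 nasgorep.cloudapp.azure.com::slave-volume-0001 status detail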